Media Migration Tip in Drupal

By Kevin, July 8th, 2014

If you’re migrating media files, you will most likely be working from a list of URLs. Other times, you will have a local file system from which to pull in media. When working from just a list of URLs, though, you’re doing a somewhat ‘blind’ import.

Imported content may reference images or files in anchor or image tags, but you really don’t know whether those files still exist. If you are migrating a large site, there is a very good chance that some forgotten pages in the system make use of file URLs that were deleted long ago.

Fortunately, we can run a quick check on each record to make sure the file does in fact exist before we attempt to download and save it in the media system. By using a custom method and implementing prepareRow(), we can skip ‘bad’ rows entirely:


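/**
 * Implements prepareRow().
 *
 * @param $row
 * @return bool
 *   FALSE to skip the row, TRUE to process it.
 */
public function prepareRow($row) {
  // Let parent classes reject the row first.
  if (parent::prepareRow($row) === FALSE) {
    return FALSE;
  }
  // Skip rows whose file no longer exists at the source URL.
  if (!$this->fileExists($row->url)) {
    return FALSE;
  }
  return TRUE;
}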

/**
 * Checks that the remote file exists before we attempt to migrate it in.
 *
 * @param $url
 * @return bool
 */
private function fileExists($url) {
  $curl = curl_init($url);
  // Request headers only, so the file body is not transferred.
  curl_setopt($curl, CURLOPT_NOBODY, TRUE);
  $result = curl_exec($curl);
  // Read the HTTP status code of the response.
  $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
  curl_close($curl);
  return $result && $status == 200;
}

cURL can quickly query the supplied URL and return the status code to us. Setting CURLOPT_NOBODY makes cURL send a HEAD request, so the server returns only the headers and we don’t burden it with sending the whole file just to check that it exists.
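To make the header-only check easier to try outside of a migration class, here is a minimal standalone sketch. The function name head_status(), the $timeout parameter, and the choice to follow redirects are assumptions added for illustration and are not part of the original code; it returns 0 when the request itself fails.

```php
<?php
/**
 * Hypothetical helper (not part of Migrate): returns the HTTP status code
 * of a header-only request, or 0 if the request itself failed.
 */
function head_status($url, $timeout = 10) {
  $curl = curl_init($url);
  // Ask for headers only; cURL switches the request method to HEAD.
  curl_setopt($curl, CURLOPT_NOBODY, TRUE);
  // Return the response instead of printing it.
  curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
  // Assumption: follow redirects so the final status is what gets reported.
  curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
  // Assumption: do not let one dead host stall the whole import.
  curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
  curl_setopt($curl, CURLOPT_TIMEOUT, $timeout);

  $result = curl_exec($curl);
  $status = ($result === FALSE) ? 0 : (int) curl_getinfo($curl, CURLINFO_HTTP_CODE);
  curl_close($curl);
  return $status;
}
```

A caller could then treat the file as present only when head_status($url) === 200, which mirrors the check described above.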

If cURL reports any status other than 200 OK, fileExists() returns FALSE and prepareRow() returns FALSE in turn. Migrate skips any source row for which prepareRow() returns FALSE, which lets us import media URLs more intelligently.
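The skip behaviour can be illustrated without a Drupal install. The sketch below is a simplified stand-in for the row loop, not Migrate’s actual code: filter_rows() and the stubbed existence check are hypothetical names, and the stub replaces the cURL call so the example runs without network access.

```php
<?php
/**
 * Hypothetical stand-in for the row loop: calls $prepareRow on each row
 * and drops every row for which it returns FALSE.
 */
function filter_rows(array $rows, callable $prepareRow) {
  $kept = array();
  $skipped = array();
  foreach ($rows as $row) {
    if ($prepareRow($row) === FALSE) {
      // Remember the URL so skipped rows can be reviewed later.
      $skipped[] = $row->url;
      continue;
    }
    $kept[] = $row;
  }
  return array('kept' => $kept, 'skipped' => $skipped);
}

// Stubbed existence check: only this URL is treated as present.
$known = array('http://example.com/a.jpg' => TRUE);
$file_exists = function ($url) use ($known) {
  return isset($known[$url]);
};

$rows = array(
  (object) array('url' => 'http://example.com/a.jpg'),
  (object) array('url' => 'http://example.com/deleted.pdf'),
);

$result = filter_rows($rows, function ($row) use ($file_exists) {
  return $file_exists($row->url) ? TRUE : FALSE;
});
```

Here only the first row survives, and the URL of the second ends up in the skipped list.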

After migrating, you can use a tool like LinkChecker to comb your content and report broken file links, which you can then address at that point. Skipping missing files up front means you don’t wind up with a migration pointing to non-existent files and dirty database records in the destination.