We regularly import content from old web sites and systems. One recent client had thousands of documents that we needed to copy from the old site, so we wrote a scraping system to import the ones that fit a certain template into Drupal, and just copy the existing documents into sites/default/files.

Using the Filefield_sources module, you can associate an existing file with a filefield, either through IMCE or by referencing a file already uploaded to the file system. However, we hit a problem: if you browse to an existing file and try to reference it, Filefield returns this error:

The selected file could not be used because the file does not exist in the database.

It turns out that Drupal stores a reference to each file in its internal files table, and you cannot add filefield links to a file without it already existing there. Why doesn't Filefield_sources simply add a reference? Because the underlying Filefield might delete the file when its node gets deleted. It's a bit of a mess, discussed more fully here. On that page, there's a new file attach method that gives you a drop-down of files uploaded to a particular directory, but we've got thousands of legacy files scattered deep in a tree, and this approach didn't work for us.

So our solution was to simply load all the files on the file system into the files table, where Filefield can recognize them. Seems to work great!

To do this, I created a simple Drush script and put it in a module on the site. For example purposes, this would be in a mycustom.drush.inc file inside sites/all/modules/mycustom:

<?php
/**
 * Provide module specific drush commands.
 */
function mycustom_drush_command() {
  $items = array();
  $items['findpath'] = array(
    'description' => 'Search filesystem for files by path',
    'arguments' => array(
      'filepath' => 'Name of path to find.',
      'commit' => 'Save results to files table',
    ),
  );
  return $items;
}

/**
 * Drush command callback for "drush findpath".
 */
function drush_mycustom_findpath($filepath, $commit = FALSE) {
  // D6's file_scan_directory() takes an ereg-style mask; '.*' matches every file.
  $ar = file_scan_directory($filepath, '.*');
  foreach ($ar as $item) {
    $file = new stdClass();
    $file->filename = $item->basename;
    $file->filepath = $item->filename;

    // Look for the file in {files}, using an exact match on the path.
    $result = db_query("SELECT * FROM {files} WHERE filepath = '%s'", $file->filepath);
    if ($obj = db_fetch_object($result)) {
      drush_log('Found file: ' . $file->filename);
    }
    else {
      drush_log('File not found: ' . $file->filename);
      $file->uid = 1;
      $file->filemime = file_get_mimetype($file->filename);
      $file->status = FILE_STATUS_PERMANENT;
      $file->filesize = filesize($file->filepath);
      // Record when the row was created; otherwise the column defaults to 0.
      $file->timestamp = time();

      if ($commit) {
        drush_log('Saving file to database: ' . $file->filename);
        drupal_write_record('files', $file);
      }
    }
  }
}

With that in place, Drush now recognizes a "drush findpath" command. This isn't very robust, but it does the trick...
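One robustness detail worth knowing: in SQL, a LIKE comparison treats `_` and `%` in the pattern as wildcards, so looking a path up with LIKE can "find" a different file, while an exact = comparison cannot. The sketch below is a rough, hypothetical emulation of LIKE in plain PHP (case-sensitive, no ESCAPE clause; real MySQL LIKE is usually case-insensitive too, depending on collation), just to show the effect on a file name containing an underscore.

```php
<?php
/**
 * Rough emulation of SQL LIKE, for illustration only (hypothetical helper).
 * Case-sensitive and without ESCAPE support, unlike real MySQL LIKE.
 */
function like_matches(string $pattern, string $subject): bool {
  // Quote regex metacharacters first; preg_quote() leaves % and _ alone.
  $regex = preg_quote($pattern, '/');
  // Then translate the two LIKE wildcards into their regex equivalents.
  $regex = strtr($regex, array('%' => '.*', '_' => '.'));
  return preg_match('/^' . $regex . '$/s', $subject) === 1;
}

// Under LIKE, the '_' lets this path "match" a different file name...
var_dump(like_matches('files/annual_report.pdf', 'files/annualXreport.pdf'));
// ...whereas an exact comparison does not.
var_dump('files/annual_report.pdf' === 'files/annualXreport.pdf');
```

With thousands of legacy files, a stray match like this would make the script report a file as already present when it is not.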

To get a list of files and whether or not they are in the files table, you would change to the site root directory and run:

drush findpath --verbose sites/default/files

... this doesn't change anything; it just lists each file it finds, along with whether or not it is already in the files table. To actually commit the files to the database, run:

drush findpath --verbose sites/default/files true

... and a minute or two later, you've got all the files added to the database!
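A caveat on that second argument: Drush passes command-line arguments to the callback as strings, and the script only checks `if ($commit)`, so running `drush findpath sites/default/files false` would also commit, because the string "false" is truthy in PHP. A minimal sketch of a stricter check, using a hypothetical helper name, could look like this:

```php
<?php
/**
 * Interpret the drush "commit" argument explicitly (hypothetical helper).
 */
function mycustom_is_commit($commit): bool {
  // FILTER_VALIDATE_BOOLEAN returns TRUE for "1", "true", "on" and "yes"
  // (case-insensitive) and FALSE for anything else, including "false",
  // "0", "" and the callback's default FALSE.
  return filter_var($commit, FILTER_VALIDATE_BOOLEAN);
}
```

Inside the command callback, `if ($commit)` would then become `if (mycustom_is_commit($commit))`.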


Have you found a better way? Please let me know in the comments below...


Hehe.. that's the perfect solution I was looking for. My latest client's website needed to import his documents too, but I didn't have a solution. I got the same error:
"The selected file could not be used because the file does not exist in the database."
And now I have a solution. I'll post another comment, or expect a mail from me, if I hit another error.

Regards


That's a brilliant workaround! But since you mentioned thousands of files, how did you associate those files with filefield? My files are mostly PDFs, but there are thousands of them, so after scouring the net for import solutions, your approach seems perfect. I'm a bit of a Drupal newbie, so sorry if this is blindingly obvious.

thanks,
Kevin

Drush command works great, thanks!

But if you also use IMCE with filefield_sources to reference a file uploaded earlier via FTP, you also need to add the file to the imce_files table (file ID only).

something like:

db_query('INSERT INTO {imce_files} (fid) VALUES(%d)', $file->fid);

Or else you will get the dreadful "Referencing to the file used in the field is not allowed." error message.


Your solution works great, however I've run into the "Referencing to the file used in the field is not allowed." error as mentioned above (using filefield_sources).

Albert, where would you add your line to the existing script?

It would be great if this were built into IMCE, so it would re-scan the files directory when it's opened (and add the appropriate file IDs).


FYI - Albert's line works well when inserted after:
drupal_write_record('files',$file);

However the script doesn't insert file id references into the 'imce_files' table when the file already exists in the 'files' table.

To work around this, just empty the 'files' and 'imce_files' tables and re-run the script.
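Emptying {files} is risky on a site that already has attachments: filefield stores fids, and re-inserted rows get new fids, so existing references would point at rows that no longer exist. A gentler option, sketched here under the assumption that you can read both fid lists with db_query(), is to backfill only the fids IMCE is missing. The helper name is hypothetical.

```php
<?php
/**
 * Return the fids present in {files} but absent from {imce_files}
 * (hypothetical helper; fill both arrays from SELECT fid queries).
 */
function mycustom_missing_imce_fids(array $file_fids, array $imce_fids): array {
  // array_diff() keeps the original keys; array_values() renumbers them.
  return array_values(array_diff($file_fids, $imce_fids));
}
```

Each fid it returns could then go through Albert's `INSERT INTO {imce_files} (fid) VALUES(%d)` query, leaving existing fids untouched.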

Another solution to the "Referencing to the file used in the field is not allowed." is to have your drush script set a file status that isn't FILE_STATUS_PERMANENT.

In 'mycustom.module', add the line define('FILE_STATUS_UPLOADED', 3); (this follows D6 core's file.inc instructions on extending file statuses), then replace PERMANENT with UPLOADED in your drush command.

The result is that files recognized this way aren't subject to filefield's rule against attaching a file to a node when that file is in the database but isn't attached anywhere else. (Without that rule, one could attach the file to a node, delete the node, and the file would be deleted even though it might be in use somewhere other than a filefield.)
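One thing to keep in mind with a custom status: D6's includes/file.inc asks modules to add statuses as binary flags and treats the first 8 bits (0 through 128) as reserved, and the value 3 already contains the FILE_STATUS_PERMANENT bit. The sketch below shows that an equality test and a bitwise test disagree about such a file; I haven't checked which form filefield uses, so treat this as a reason to test the trick rather than a verdict on it.

```php
<?php
// D6 core constants (as defined in includes/file.inc).
define('FILE_STATUS_TEMPORARY', 0);
define('FILE_STATUS_PERMANENT', 1);
// The custom status suggested above.
define('FILE_STATUS_UPLOADED', 3);

$status = FILE_STATUS_UPLOADED;
// An equality test does not treat the file as permanent...
var_dump($status == FILE_STATUS_PERMANENT);
// ...but a bitwise test does, because 3 (0b11) includes the 0b01 bit.
var_dump(($status & FILE_STATUS_PERMANENT) !== 0);
```

If the reserved-bits guideline matters for your site, a flag above 128 (for example 256) would stay clear of core's range.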


Thanks for solving this. A little bit of work and clients stop arguing :)
drush findpath --verbose sites/default/files
and all the problems are gone! Hell yeah :)


Really interesting stuff.

I came here through the IMCE FTP import module at http://drupal.org/node/1260538

Any thoughts about getting this working on Drupal 7? I imagine it has a lot to do with the new file management conventions in core.

I'm running into a wall with "preg_match(): No ending delimiter '.' found" at file.inc:1996. Have you (or anyone) seen similar results?
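That warning is consistent with running the D6 script on D7: D6's file_scan_directory() matched its mask with ereg(), so a bare '.*' worked, while D7's version passes the mask to preg_match(), which expects a delimited PCRE pattern. The plain-PHP sketch below reproduces the difference outside Drupal.

```php
<?php
$name = 'report.pdf';

// PCRE takes the first character '.' as the delimiter, finds no closing '.',
// emits "No ending delimiter '.' found" and returns FALSE (suppressed here).
$bad = @preg_match('.*', $name);

// With delimiters, as in the '@.*@' mask used by the D7 port, it matches.
$good = preg_match('@.*@', $name);

var_dump($bad);
var_dump($good);
```

So in a D7 port, the call becomes something like `file_scan_directory($scandir, '@.*@')`.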

Hi, Nate,

Just beginning to do D7 projects, but have you seen the new "Attach Files" functionality in filefield_sources? I think it provides similar functionality to what's posted in that issue -- the ability to designate a directory on the server that Filefield Sources can scan/import/copy files from. And I think I saw this working in D7 already. Haven't had to put it to use recently though...


I'm having a problem with the script: drush lists the command, but trying to run it generates an error...

$ drush findpath --verbose sites/default/files
Initialized Drupal 7.8 root directory at htdocs [notice]
Initialized Drupal site default at sites/default [notice]
The drush command 'findpath sites/default/files' could not be found. [error]

Thanks so much for this. It all made sense.
I struggled for a while with filefield sources - but the "File attach allows for selecting a file from a directory on the server," settings or functionality just would not work for me - or were unfinished, or I couldn't understand the wording on the help page at all.

So I took your script above and did the necessary D7 things to it!

I placed this in my ~/.drush directory as findfiles.drush.inc (I renamed it to 'findfiles'). It behaves mostly the same as the original, though I assumed that the folder being imported is the normal public files directory. This makes the original path parameter either unnecessary or wrong - fix it if you need to.

<?php
/**
 * Bulk operation to scan your files directory and ensure that every file has a
 * corresponding entry in the 'managed files' table.
 *
 * Without it, you can't re-use files via filefield_sources as promised.
 *
 * Original D6 by John Locke on 02/23/2010
 * From http://www.freelock.com/blog/john-locke/2010-02/using-file-field-imported-files-drupal-drush-rescue
 *
 * Upgraded to D7 by dman.
 * For a less-naive solution to this problem
 * (actually scan pages and attach the right files to individual nodes)
 * @see http://drupal.org/project/file_ownage
 *
 * USAGE:
 * Trial run:
 * drush --verbose findpath sites/sitename/files
 * Real run:
 * drush findpath sites/sitename/files true
 *
 * BACKUP your DB first!
 */

/**
 * Provide module specific drush commands.
 */
function findfiles_drush_command() {
  $items = array();
  $items['findpath'] = array(
    'description' => 'Search filesystem for files by path',
    'arguments' => array(
      'filepath' => 'Name of path to find.',
      'commit' => 'Save results to files table',
    ),
  );
  return $items;
}

/**
 * Drush command callback.
 */
function drush_findfiles_findpath($scandir, $commit = FALSE) {
  global $user;
  // D7's file_scan_directory() needs a delimited PCRE mask.
  $ar = file_scan_directory($scandir, '@.*@');
  foreach ($ar as $item) {
    $local_filepath = str_replace($scandir . '/', '', $item->uri);
    // Need to think in file wrappers, from the beginning.
    $file = new stdClass();
    $file->fid = NULL;
    // DO NOT USE $item->name as it truncates the suffix.
    // Normally that would be nice but it cripples IMCE!!
    $file->filename = basename($item->uri);
    $file->uri = 'public://' . $local_filepath;
    $file->filemime = file_get_mimetype($file->uri);
    $file->uid = $user->uid;
    $file->status = FILE_STATUS_PERMANENT;

    // Look for file in {file_managed} table.
    drush_log("Checking db for {$file->uri}");
    $result = db_query("SELECT * FROM {file_managed} WHERE uri = :uri", array(':uri' => $file->uri));
    $record = NULL;
    foreach ($result as $record) {
      // Found at least one.
      drush_log("Found file: {$file->uri} fid:{$record->fid}");
    }

    if (!$record) {
      drush_log('File not found in the database yet: ' . $file->uri);

      if ($commit) {
        drush_log('Saving file to database: ' . $file->uri);
        // Get file wrapper CRUD to save it for us.
        drupal_chmod($file->uri);
        file_save($file);
        // Other modules - specifically filefield_sources -
        // may not play ball unless the file is 'in use' as well.
        // @see file_managed_file_validate()
        // @see file_usage_list($file);
        // We don't have anything useful to tell it about previous usage,
        // so just say it's managed by 'system'.
        file_usage_add($file, 'system', 'file', $file->fid);
      }
    }
  }
}
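A small caveat on how the port builds its URIs: `str_replace($scandir . '/', '', $item->uri)` only strips the directory when it is passed without a trailing slash, so `drush findpath sites/default/files/` would yield URIs like public://sites/default/files/a.pdf. A minimal sketch of a prefix-only version follows; the function name is hypothetical and, like the script, it assumes $scandir is the public files directory.

```php
<?php
/**
 * Turn a scanned path into a public:// URI, stripping $scandir only when it
 * is a leading prefix (hypothetical helper).
 */
function findfiles_to_public_uri(string $scandir, string $path): string {
  // Normalise the directory so a trailing slash doesn't matter.
  $prefix = rtrim($scandir, '/') . '/';
  if (strpos($path, $prefix) === 0) {
    $path = substr($path, strlen($prefix));
  }
  // Paths outside $scandir are left as-is after the scheme.
  return 'public://' . $path;
}
```

In the loop, `$file->uri = findfiles_to_public_uri($scandir, $item->uri);` would then replace the two str_replace/concatenation lines.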


Hi,
I encountered these warnings:

session_start(): Cannot send session cookie - headers already sent by (output started at [warning]
/nfs/http6/..../drush/commands/fileindatabase.drush.inc:1) bootstrap.inc:1165
session_start(): Cannot send session cache limiter - headers already sent (output started at [warning]
/nfs/http6/..../drush/commands/fileindatabase.drush.inc:1) bootstrap.inc:1165
Cannot modify header information - headers already sent by (output started at [warning]
/nfs/http6/..../drush/commands/fileindatabase.drush.inc:1) bootstrap.inc:729
Cannot modify header information - headers already sent by (output started at [warning]
/nfs/http6/..../drush/commands/fileindatabase.drush.inc:1) bootstrap.inc:730
Cannot modify header information - headers already sent by (output started at [warning]
/nfs/http6/.../drush/commands/fileindatabase.drush.inc:1) bootstrap.inc:731
Cannot modify header information - headers already sent by (output started at [warning]
/nfs/http6/.../drush/commands/fileindatabase.drush.inc:1) bootstrap.inc:732

Is it important to fix them?

thanks
