Dealing with large filesizes in Drupal

Storing large filesizes

I'm currently writing a module in my spare time that integrates Drupal with seedboxes; it's more a proof-of-concept module than anything else. In doing so, I've run into a couple of Drupal's limitations when it comes to storing large files.

The first limitation is storing file sizes greater than 2GB. In its hook_schema() implementation (system_schema()), the system module declares the filesize column of the file_managed table as a plain 'int':

'filesize' => array(
  'description' => 'The size of the file in bytes.',
  'type' => 'int',
  'unsigned' => TRUE,
  'not null' => TRUE,
  'default' => 0,
),

For general use this isn't much of an issue. However, a signed INT in MySQL has a maximum value of 2147483647 (just under 2GB). Because this column is declared unsigned, MySQL actually allows values up to 4294967295 (just under 4GB), but either way a large enough file overflows it and the write fails with an error. This happens whenever file_save() is called with a $file->filesize above the threshold.
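To make the thresholds concrete, here is a small standalone PHP sketch (not Drupal code) that checks whether a byte count fits in a 32-bit INT column. The function name fits_in_int_column() is hypothetical, and the sketch assumes a 64-bit PHP build so that 4294967295 is still a plain integer.

```php
<?php
// Largest values a 32-bit INT column can hold.
const INT_SIGNED_MAX = 2147483647;   // 2^31 - 1, just under 2GiB
const INT_UNSIGNED_MAX = 4294967295; // 2^32 - 1, just under 4GiB (MySQL INT UNSIGNED)

/**
 * Hypothetical helper: returns TRUE if $bytes fits in a 32-bit INT column.
 * Assumes a 64-bit PHP build, where both limits are plain integers.
 */
function fits_in_int_column(int $bytes, bool $unsigned = FALSE): bool {
  $max = $unsigned ? INT_UNSIGNED_MAX : INT_SIGNED_MAX;
  return $bytes >= 0 && $bytes <= $max;
}

// A 3GiB file overflows a signed INT but still fits an unsigned one;
// a 5GiB file fits neither.
var_dump(fits_in_int_column(3 * 1024 ** 3));        // bool(false)
var_dump(fits_in_int_column(3 * 1024 ** 3, TRUE));  // bool(true)
var_dump(fits_in_int_column(5 * 1024 ** 3, TRUE));  // bool(false)
```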

It isn't too hard to change this, but it must be done in the correct fashion so that the change doesn't get overridden in the future and the rest of the system is aware of what we've done.

Within the module I'm writing, I've added code to hook_install() and hook_uninstall(), as well as implementing hook_schema_alter():

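/**
 * Implements hook_install().
 */
function seedbox_install() {
  // Widen file_managed.filesize from INT to BIGINT. db_change_field()
  // needs the complete column spec, so restate core's other attributes.
  db_change_field('file_managed', 'filesize', 'filesize', array(
    'description' => 'The size of the file in bytes.',
    'type' => 'int',
    'size' => 'big',
    'unsigned' => TRUE,
    'not null' => TRUE,
    'default' => 0,
  ));
}

/**
 * Implements hook_uninstall().
 */
function seedbox_uninstall() {
  // Restore core's INT definition. This can fail if any stored
  // filesize no longer fits in a 32-bit column.
  db_change_field('file_managed', 'filesize', 'filesize', array(
    'description' => 'The size of the file in bytes.',
    'type' => 'int',
    'size' => 'normal',
    'unsigned' => TRUE,
    'not null' => TRUE,
    'default' => 0,
  ));
}

/**
 * Implements hook_schema_alter().
 */
function seedbox_schema_alter(&$schema) {
  // Change only the size so the rest of core's definition is kept.
  if (isset($schema['file_managed'])) {
    $schema['file_managed']['fields']['filesize']['size'] = 'big';
  }
}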

The hook_install() implementation changes the filesize column from INT to BIGINT, raising the largest storable value to around 9.2EB (2^63 - 1 bytes), which should cater for files well into the future. The field must also be changed back when the module is uninstalled, which is where hook_uninstall() comes in.
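The 9.2EB figure is simply the signed 64-bit maximum. The plain-PHP snippet below, a sketch assuming a 64-bit PHP build, checks that arithmetic. PHP_INT_MAX is also the largest integer PHP itself can hold, so on a 32-bit build (where PHP_INT_MAX is 2147483647) widening the column would not be enough on its own.

```php
<?php
// On a 64-bit PHP build PHP_INT_MAX equals the signed BIGINT maximum.
$bigint_max = PHP_INT_MAX;        // 9223372036854775807 = 2^63 - 1
$exabytes = $bigint_max / 1e18;   // decimal exabytes (10^18 bytes)
printf("%d bytes is about %.1f EB\n", $bigint_max, $exabytes);
// prints "9223372036854775807 bytes is about 9.2 EB"
```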

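The hook_schema_alter() implementation keeps Drupal's own record of the schema in line with the real table, so core and other modules that consult drupal_get_schema() see the BIGINT column too. It's simply a polite way of letting them know what we've done!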
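Because hook_schema_alter() receives the full field definition, the safest edit is to change only its 'size' key. This plain-PHP sketch, using a copy of the core filesize definition from system_schema(), shows which attributes each approach keeps; the variable names are purely illustrative.

```php
<?php
// Copy of core's file_managed.filesize definition from system_schema().
$core = array(
  'description' => 'The size of the file in bytes.',
  'type' => 'int',
  'unsigned' => TRUE,
  'not null' => TRUE,
  'default' => 0,
);

// Approach 1: replace the whole field array.
$replaced = array('type' => 'int', 'size' => 'big');

// Approach 2: change only the size.
$patched = $core;
$patched['size'] = 'big';

// Attributes from core that each approach lost.
print_r(array_keys(array_diff_key($core, $replaced))); // description, unsigned, not null, default
print_r(array_keys(array_diff_key($core, $patched)));  // prints an empty array
```

Replacing the array silently drops 'not null' and 'default', so anything reading the altered schema would see a nullable column with no default.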

Downloading large files

Unless you're lucky enough to have gigabit internet at home, or you're serving large files from Drupal over a LAN, the connection speed is likely to be limited. With a private file system, every download is served through PHP, which has a maximum execution time. I found that when downloading large test files, the download would be cancelled after a few minutes with the rather confusing error 'size mismatch'. A closer look through the logs, and through file.inc in general, suggested the most likely cause was PHP's maximum execution time being exceeded.

I didn't want to change the execution time globally, as this could have repercussions for other sites on the same server, or for the operation of the site in general. The other option was to amend file_transfer() and place a set_time_limit(0); call directly before the file is sent to the user. Since hacking core is still hacking core, I decided to find the relevant hook and put the call there instead, using Drupal's drupal_set_time_limit() wrapper.

I limited this to the stream wrapper my module implements (the seedboxdownload scheme) so that PHP's execution time limit is only removed when it actually needs to be.

/**
 * Implements hook_file_download().
 *
 * Due to the large size of some files, it is necessary to remove the
 * restriction PHP imposes on how long this request may run. The limit is
 * currently lifted entirely; this could potentially be made configurable
 * through an admin interface.
 */
function seedbox_file_download($uri) {
  // Only lift the limit for files served through this module's stream wrapper.
  if (file_uri_scheme($uri) === 'seedboxdownload') {
    drupal_set_time_limit(0);
  }
}
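If lifting the limit entirely feels too blunt, one alternative (not what this module does) is to reset the timer as each chunk is sent, since PHP's set_time_limit() restarts the timeout counter from zero whenever it is called. The standalone sketch below roughly mimics the chunked fopen()/fread() loop in Drupal 7's file_transfer(); the function name stream_file_in_chunks() and the 1MiB/30-second defaults are illustrative assumptions.

```php
<?php
/**
 * Hypothetical helper: streams $path to the output in chunks, giving each
 * chunk a fresh $seconds_per_chunk of execution time instead of lifting
 * the limit for the whole request. Returns the number of bytes sent.
 */
function stream_file_in_chunks(string $path, int $chunk_size = 1048576, int $seconds_per_chunk = 30): int {
  $fd = fopen($path, 'rb');
  if ($fd === FALSE) {
    throw new RuntimeException("Cannot open $path");
  }
  $sent = 0;
  while (!feof($fd)) {
    // set_time_limit() restarts the timeout counter from zero.
    set_time_limit($seconds_per_chunk);
    $chunk = fread($fd, $chunk_size);
    if ($chunk === FALSE) {
      break;
    }
    print $chunk;
    $sent += strlen($chunk);
  }
  fclose($fd);
  return $sent;
}
```

The trade-off is that a stalled transfer still eventually times out, whereas set_time_limit(0) lets it run indefinitely.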

I generated files using the dd command, specifically:

dd if=/dev/zero of=filename bs=1 count=1 seek=1048575
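That command seeks 1048575 bytes into the output and writes a single zero byte, producing a 1048576-byte (1MiB) file that takes almost no disk space on filesystems supporting sparse files; raising the seek value gives larger test files. The same trick can be done from PHP with fseek() and fwrite(). make_sparse_file() below is a hypothetical helper, sketched assuming a 64-bit PHP build for sizes over 2GB.

```php
<?php
/**
 * Hypothetical helper: creates a file of exactly $size bytes the same way
 * as `dd if=/dev/zero of=... bs=1 count=1 seek=<size - 1>`, by seeking to
 * the last byte and writing a single zero byte. On filesystems that support
 * sparse files the skipped region takes no real disk space.
 */
function make_sparse_file(string $path, int $size): void {
  if ($size < 1) {
    throw new InvalidArgumentException('Size must be at least 1 byte.');
  }
  $fp = fopen($path, 'wb');
  if ($fp === FALSE) {
    throw new RuntimeException("Cannot open $path");
  }
  fseek($fp, $size - 1);  // Jump to the final byte position.
  fwrite($fp, "\0");      // Writing here extends the file to $size bytes.
  fclose($fp);
}

// Equivalent of the dd command above: a 1048576-byte (1MiB) file.
// make_sparse_file('filename', 1048576);
// A 3GiB file for testing the >2GB case (needs 64-bit PHP):
// make_sparse_file('filename-3g', 3 * 1024 ** 3);
```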