This is a timeless subject and one I'm passionate about. There are many reasons you might want to offload your file storage from the local file system onto Amazon S3. In this post I will cover two implementation methods, along with some of the reasons you would want to integrate S3 with your Drupal application. I will never get sick of talking about best practices for building scalable and performant Drupal applications.
What is Amazon S3?
Amazon defines it as secure, durable, highly scalable object storage. That's important: it does one thing, and it does it well.
Amazon S3 is easy to use, with a simple web services interface to store and retrieve any amount of data from anywhere on the web.
Amazon S3 stores data as objects within resources called "buckets." You can store as many objects as you want within a bucket, and write, read, and delete objects in your bucket. Objects can be up to 5 terabytes in size. You can control access to the bucket (who can create, delete, and retrieve objects in the bucket for example), view access logs for the bucket and its objects, and choose the AWS region where a bucket is stored to optimize for latency, minimize costs, or address regulatory requirements.
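The bucket-and-object model described above maps directly onto a handful of simple commands. Here is a sketch using the s3cmd command-line tool (the bucket name is an example; you need AWS credentials configured for any of this to actually run):

```shell
# Create a bucket, then write, read, list, and delete an object in it.
# Run `s3cmd --configure` first to store your AWS credentials.
s3cmd mb s3://my-example-bucket                             # make the bucket
s3cmd put notes.txt s3://my-example-bucket/notes.txt        # write an object
s3cmd get s3://my-example-bucket/notes.txt notes-copy.txt   # read it back
s3cmd ls s3://my-example-bucket                             # list the bucket
s3cmd del s3://my-example-bucket/notes.txt                  # delete the object
```

That is the whole mental model: buckets hold objects, and you put, get, list, and delete them.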
Now that you have a sense of what S3 does, let's look at why this is such a great thing.
Why you might use Amazon S3 with Drupal:
Not having enough file storage or data transfer available from your hosting provider is a good reason. Perhaps you need to store and share very large files, or maybe your hosting provider charges a premium for overages past a certain amount of file storage and data transfer. On the other hand, you may have so many files on local storage that traversing them has become a cumbersome operation for your application. File systems do not handle enormous numbers of files well, especially if they are not organized well or if you are performing large numbers of file operations.
I should mention that Amazon S3 is one of the most stable services on the AWS platform, as Amazon themselves boast, and monthly rates for data storage with Amazon S3 are incredibly affordable: $0.15 per gigabyte per month, with no limit on overall storage. So, if your data needs are high enough, it might be more cost-effective to host your files on S3.
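To make that concrete, here is a back-of-envelope monthly cost at the $0.15-per-gigabyte rate quoted above (historic pricing; check the current S3 pricing page before budgeting, and the 500 GB figure is just an example volume):

```shell
# Rough monthly S3 cost for an example data volume at $0.15/GB-month.
gb=500   # example data volume in gigabytes
cost=$(awk "BEGIN { printf \"%.2f\", $gb * 0.15 }")
echo "\$$cost per month for $gb GB"
```

Compare that with what your host charges for the same half-terabyte of storage and the math usually speaks for itself.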
Another reason: you might want separate file storage areas for different environments, such as production, staging, load testing, and development. It is a reasonable concern not to want non-prod environments serving from the production bucket, but at the same time there may be no good reason not to share a bucket across those environments; after all, S3 is reliable and robust. Regardless of your implementation needs, there are a couple of options to discuss. Bottom line: getting files off your local file system is going to save you headaches one way or another.
There are two popular modules you can use to integrate S3 with your Drupal application for public file storage.
If you are running on Drupal 7 you can take advantage of the AmazonS3 module to integrate S3 storage with your Drupal application. This module has been around for a long time and has helped many to integrate S3. As per the module description:
The AmazonS3 module allows the local file system to be replaced with S3. Uploads are saved into the Drupal file table using D7's file/stream wrapper system.
This is an important technical point: the file/stream wrapper layer is where any module working with files should operate. If you look at a module and it's doing something else, you are going to have a bad time. It's a well put together module, actively maintained, and it has come a long way in recent years. Read the instructions and you will note that there are dependencies on the Drupal Libraries API and the AWS SDK for PHP. Check the issue queue if you find something isn't working as it should.
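One nice consequence of the stream wrapper approach is that once the module registers the s3:// scheme, ordinary PHP file functions work against the bucket. A quick smoke test from the command line, assuming a bootstrapped Drupal 7 site with the module configured (the file path is an example):

```shell
# drush php-eval runs PHP inside a full Drupal bootstrap, so the s3://
# stream wrapper registered by the AmazonS3 module is available.
drush php-eval "file_put_contents('s3://test.txt', 'hello from Drupal');"
drush php-eval "print file_get_contents('s3://test.txt');"
```

If the second command prints the string back, uploads from your site will land in the bucket too.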
Module configuration process:
- After installing the necessary modules and SDK, you need to input your security credentials from Amazon. You'll need both your Access Key ID and your Secret Access Key. This will enable the AmazonS3 module to access your information in S3. After logging in you can find them at the Security Credentials page.
- Set the default S3 bucket in which to upload files. You will need to have created this bucket already in AWS, using the Amazon AWS dashboard.
- Configure the File System settings in your Drupal administration dashboard to push uploads to S3. You'll select the option to have file uploads go to the S3 bucket, as opposed to the local default/files directory. That menu is found at example.com/admin/config/media/file-system
- Check your content types and the fields to ensure they are going to end up in the right place, that right place being S3.
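If you prefer to script the steps above, drush can set the same values. file_default_scheme is a Drupal 7 core variable; the bucket variable name below is an assumption based on the module's conventions, so check the module's README for the exact key:

```shell
# Point Drupal's default file scheme at S3 and set the target bucket.
drush vset file_default_scheme s3              # 's3' is registered by the module
drush vset amazons3_bucket my-drupal-bucket    # variable name is an assumption
drush cc all                                   # clear caches so it takes effect
```

Either way, verify the result on the File System settings page afterwards.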
Keep in mind:
Existing files will need to be migrated over to S3; right now they are on your disk, which isn't very helpful. All new content should be fine, but you will need to plan to migrate the existing files. There are helper tools to accomplish this, specifically s3cmd. Once a file has been copied over, you can update its reference in the database to point to the s3:// scheme. This is a little more advanced, but you are a grown-up and can handle this task responsibly, right?
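A sketch of that migration, with the bucket name as an example and the SQL shown for Drupal 7's file_managed table (back up your database first):

```shell
# Step 1: copy the existing public files tree into the bucket, e.g.:
#   s3cmd sync sites/default/files/ s3://my-drupal-bucket/
#
# Step 2: rewrite the URIs Drupal stores in file_managed. The uri column
# holds stream-wrapper URIs, so the change is a simple prefix swap,
# demonstrated here on a sample value:
old_uri="public://images/photo.jpg"
new_uri="s3://${old_uri#public://}"
echo "$new_uri"
#
# The equivalent bulk change in SQL:
#   UPDATE file_managed SET uri = REPLACE(uri, 'public://', 's3://')
#   WHERE uri LIKE 'public://%';
```

Run the sync first and spot-check a few objects in the bucket before touching the database.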
Take a look at the Storage API module to integrate S3 storage with your Drupal application. I personally really like this module, as it provides a clever layer that abstracts the storage of files for your site while making it easy to shuttle images and the like off to different containers for specific content types. This module doesn't cater just to S3; it supports many storage integrations, and you can even define your own! I've pretty much reiterated what they state on the module page:
Storage API is a low-level framework for managed file storage and serving. Module and all the core functions will remain agnostic of other modules in order to provide this low-level functionality.
While this module appears to be a little more complex, it's actually not. It has incredible flexibility around where you store your files and the order in which you serve them. It will also keep all of the destinations in sync in the background, so you can focus on making your website succeed, getting a promotion, and enjoying longer vacations.
One thing this module excels at is allowing for flexible workflows. Picture this: you have a new feature to roll out, and the code is pushed to staging. You don't want to sync your production bucket with your staging bucket, nor do you want to write to production while testing the new functionality. With Storage API you can configure your site to use the production bucket to serve existing assets while new content goes to the staging bucket or the local file system. When the page is rendered, assets are served from their respective locations. In this scenario you have leveraged your production assets while preserving their integrity. You have tested the new functionality, seen that it works, and can confidently push it live.
Module configuration process:
- You will need to create a new storage container, then create a storage class and assign them as appropriate to your content types.
- Create a new storage container, and set it to Amazon S3. You will need to provide an S3 bucket name and your AWS security access ID and secret key. After logging in you can find them at the Security Credentials page.
- Create a new storage class, set the initial storage location (keep it local to reduce lag for users) and then set the final storage location, which in this case will be that new S3 storage container we just set up.
- Be sure to go back and update your content types so that their public download destination is assigned to the appropriate storage class, in this case the one for S3.
The reason you want the initial storage location to be local is that there is a slight lag while files are uploaded, image-cached, and then transferred to S3. If lots of users upload images at the same time, this can be processing-intensive. By keeping the initial upload local you avoid this; when cron runs later, Storage API automatically transfers the files to S3 and updates the associations.
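While testing this flow you don't have to wait for the scheduled run; you can trigger cron by hand and watch the transfer happen (assuming drush on a bootstrapped site):

```shell
# Files land in the initial (local) container immediately; the move to
# the final S3 container happens during cron. Kick off a run manually:
drush cron
```

After the run, the file should be served from its S3 URL rather than the local path.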
As you can see, there is no reason to be afraid of present technologies such as Amazon S3; in fact, it should feel natural in this day and age to get stuck in. There are plenty of mature methods of implementing this feature in your Drupal application, so there is no reason whatsoever that you should have to write a module yourself. Seriously. Don't even think about it.
Be kind to the community: give back your patches and make the experience better for everyone. Both of the modules referenced in this post have issues in their queues that need solving. If you fix something you will get the credit you deserve; that's how the Drupal community rolls.