Managing Django Media & Static Files on Heroku with Bucketeer

Daniel Starner - Jan 13 '22 - Dev Community

This article will walk through how we correctly persist static & media files for a Django application hosted on Heroku. As a bonus, it will also explain how we can satisfy the additional constraint of specifying private versus public media files based on model definitions.

Before I begin, this post builds on this TestDriven.io article that was written a while back. I refer to it often when setting up my projects, and I have built some extra functionality on top of it over the years. After helping an individual on StackOverflow, I decided to write a more focused post that covers Heroku & Bucketeer along with these extra features.

So without further ado, let's first dive into what static & media files are and how Heroku dynos manage their filesystem.

What are Media & Static Files

If you are working with a Django project, then you inevitably have all of your Python application code written around a bunch of .py files. These are the code paths of your application, and the end-user - hopefully - never actually sees these files or their contents.

Outside of these business-logic files, it is common to serve files to users directly from your server's file system. For these static files, Django doesn't need to run any code; the framework looks up the file and returns its contents for the requesting user to view.

Some examples of static files include:

  • Non-templated HTML
  • CSS & JavaScript files to make your page look nice
  • User profile pictures
  • Generated PDFs

Media files in Django are a particular variant of static files. Media files are read from the server's file system as well. Unlike static files, though, they are usually uploaded by users or generated by your application, and they are associated with a model's FileField or ImageField. In the examples above, user profile pictures and generated PDFs are typical examples of media files.

Django with Media & Static Files

When a new media file is uploaded to a Django web application, the framework looks at the DEFAULT_FILE_STORAGE settings configuration to determine how to store that file. By default, it uses the django.core.files.storage.FileSystemStorage class, which is how most projects start out. This implementation reads the MEDIA_ROOT setting defined in the settings.py file and copies the uploaded file contents to a deterministically created file path under that MEDIA_ROOT.

For example, if the MEDIA_ROOT is set as /var/www/media, all uploaded files will be copied and written to a location under /var/www/media/.
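As a rough illustration of that path resolution, the sketch below joins a MEDIA_ROOT with the relative name a file field would produce. The `avatars` folder and the file name are hypothetical, and the function only mimics the path arithmetic with the standard library; it does not call Django's FileSystemStorage.

```python
from pathlib import PurePosixPath

MEDIA_ROOT = PurePosixPath("/var/www/media")


def stored_path(upload_to: str, filename: str) -> PurePosixPath:
    """Return where a file would land under MEDIA_ROOT (path arithmetic only)."""
    return MEDIA_ROOT / upload_to / filename


print(stored_path("avatars", "jane.png"))
```

Whatever the field's upload path looks like, the result stays under the configured MEDIA_ROOT.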

Heroku with Media & Static Files

Storing these static files on your server's disk file system is okay until you start to work with a containerization platform such as Heroku. To explain why this is the case, it helps to take a step back.

When downloading files on your personal computer, it's okay that these get written to the file system - usually under ~/Downloads or somewhere similar. This works because you expect your computer's file system to persist across restarts and shutdowns; if you download a file and restart your computer, that downloaded file should still be there once the computer has finished restarting.

Heroku uses containerization to execute customer workloads. One fact of this environment is that the associated file systems do not persist across restarts and reschedules. Heroku dynos are ephemeral, and they can be destroyed, restarted, and moved without any warning, which replaces the associated filesystem. This means that any uploaded files referenced by FileFields and ImageFields are deleted without a trace every time the dyno is restarted, moved, or scaled.
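To make the consequence concrete, here is a small analogy using only the standard library. A temporary directory stands in for the dyno's disk, and leaving the `with` block stands in for a restart. This is an assumption-laden simulation, not how Heroku is implemented; the file name is made up.

```python
import tempfile
from pathlib import Path


def upload_then_restart() -> bool:
    """Write an 'upload' to a throwaway directory, then discard the directory.

    Returns whether the file still exists afterwards.
    """
    with tempfile.TemporaryDirectory() as dyno_disk:
        upload = Path(dyno_disk) / "media" / "report.pdf"
        upload.parent.mkdir(parents=True)
        upload.write_bytes(b"%PDF-1.4 placeholder")
        assert upload.exists()  # readable while the "dyno" is alive
    # Leaving the block stands in for a dyno restart: the disk is gone.
    return upload.exists()


print(upload_then_restart())
```

The database row that points at the file would survive such a restart, which is why the symptom is usually a broken link rather than an error at upload time.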


Complete Example Codebase

I will be stepping through the process of configuring the Django application for Heroku & S3-compatible storage, but feel free to reference the repository below for the complete code to browse through.

GitHub repository: dstarner / django-heroku-static-file-example

Used in my blog post of detailing private & public static files for a Heroku-served Django application

Note: This does include a $5.00 / month Bucketeer add-on as a part of the one-click deployment.


Bootstrapping Django on Heroku

This tutorial aims to help you retrofit an existing Django project with S3-compatible storage, but I'll quickly go through the steps I used to set up the example Django application. It may help those new to Django & Heroku or those who encounter bugs following the rest of the setup process.

You can view the tagged project before the storage change at commit 299bbe2.

  • Bootstrapped a Django project example
    • Uses poetry for dependency management
    • All of the Django code is under the example package, and the manage.py file is in the root. I've always found this structure cleaner than the Django apps defined in the project root.
  • Configured the project for Heroku
    • django-heroku package to automatically configure ALLOWED_HOSTS, DATABASE_URL, and more. This reduces the headache of deploying Django on Heroku considerably
    • A Procfile that runs a gunicorn process for managing the WSGI application
    • An app.json is defined with some fundamental configuration values and resources defined for the project to work
    • A release process definition in the Procfile and an associated scripts/release.sh script that runs staticfile collection and database migrations

Introducing Heroku's Bucketeer Add-On

Before we can start managing static and media files, the Django application needs a persistent place to store the files. Again, we can look to Heroku's extensive list of Add-Ons for S3-compatible storage. Our choice will be one called Bucketeer.

Heroku's Bucketeer add-on provides an AWS S3 storage bucket to upload and download files for our application. The Django application will use this configured bucket to store files uploaded to the server and download them from S3 when a user requests them.

If you'd like to learn more about AWS S3, the widely-popular data storage solution that Bucketeer is built upon, you can read the S3 user documentation.

It is worth mentioning that the base plan for Bucketeer - Hobbyist - is $5 per month. If you plan on spinning up the one-click example posted above, it should only cost a few cents if you proactively destroy the application when you are done using it.

Including the Bucketeer Add-On

To include the Bucketeer add-on in our application, we can configure it through the Heroku CLI, web dashboard, or via the project's app.json file. We will use the third method of including the add-on in an app.json file.

If the project does not have one already, we can create the basic structure listed below, with the critical part being the addition of the "addons" configuration. This array defines the "bucketeer:hobbyist" resource that our application will use, and Heroku will install the add-on into our application if it does not already exist. We also include the "as" keyword, which prefixes the associated configuration variables with the term BUCKETEER. This prefix keeps the generated configuration variable names deterministic because, by default, Heroku will generate the prefix as a random color.



{
    // ... rest above
    "addons": [
        // ...other addons...
        {
            "plan": "bucketeer:hobbyist",
            "as": "BUCKETEER"
        }
    ]
}
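As a quick sanity check on the "as" prefix, the following sketch parses an app.json-style document and lists the configuration variable names the prefix implies. The variable suffixes are the ones read by the settings later in this article; the helper name `expected_config_vars` is hypothetical. Note that the sample is strict JSON, so it carries no `//` comments.

```python
import json

APP_JSON = """
{
    "addons": [
        {"plan": "bucketeer:hobbyist", "as": "BUCKETEER"}
    ]
}
"""

# Suffixes taken from the settings used later in this article.
BUCKETEER_SUFFIXES = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "BUCKET_NAME",
    "AWS_REGION",
)


def expected_config_vars(app_json_text: str) -> list[str]:
    """List the config var names implied by the Bucketeer add-on's "as" prefix."""
    manifest = json.loads(app_json_text)
    for addon in manifest.get("addons", []):
        # Add-ons may also be declared as plain strings; skip those here.
        if isinstance(addon, dict) and addon.get("plan", "").startswith("bucketeer:"):
            prefix = addon.get("as")
            if prefix:
                return [f"{prefix}_{suffix}" for suffix in BUCKETEER_SUFFIXES]
    return []


print(expected_config_vars(APP_JSON))
```

If the "as" key is missing, the sketch returns an empty list because the prefix can no longer be predicted from the file alone.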



With the required resources being defined, we can start integrating with our storage add-on.

Implementing Our Storage Solution

The django-storages package is a collection of custom, reusable storage backends for Django. It aids immensely in saving static and media files to different cloud & storage provider options. One of the supported storage providers is S3, which our Bucketeer add-on is built on. We will leverage the S3 django-storages backend to handle different file types.

Installing django-storages

Begin by installing the django-storages package and the related boto3 package used to interface with AWS's S3. We will also lock our dependencies to ensure poetry and our Heroku deployment continue to work as expected.



poetry add django-storages boto3 && poetry lock



Then, just like most Django-related packages, django-storages will need to be added to INSTALLED_APPS in the project's settings.py file. This will allow Django to load the appropriate code as the application starts up.



# example/config/settings.py
INSTALLED_APPS = [
    # ... django.X.Y apps above
    'storages',
    # ... custom project apps below
]



Implementing Static, Public & Private Storage Backends

We will return to the settings.py file later to configure the usage of django-storages, but before that can be done, we will implement three custom storage backends:

  • A storage backend for static files - CSS, JavaScript, and publicly accessible images - that will be stored in version control - aka git - and shipped with the application
  • A public storage backend for dynamic media files that are not stored in version control, such as uploaded files and attachments
  • A private storage backend for dynamic media files that are not stored in version control and that require extra access to be viewed, such as per-user reports and potentially profile images. Files managed by this backend require an access key and will block access to those without a valid key.

We can extend django-storages' S3Boto3Storage storage backend to create these. The following code can be copied and pasted directly into your project. The different settings attributes read in the module will be written shortly, so do not expect this code to work if you import it right now.



# FILE: example/utils/storage_backends.py

from django.conf import settings
from storages.backends.s3boto3 import S3Boto3Storage


class StaticStorage(S3Boto3Storage):
    """Used to manage static files for the web server"""
    location = settings.STATIC_LOCATION
    default_acl = settings.STATIC_DEFAULT_ACL


class PublicMediaStorage(S3Boto3Storage):
    """Used to store & serve dynamic media files with no access expiration"""
    location = settings.PUBLIC_MEDIA_LOCATION
    default_acl = settings.PUBLIC_MEDIA_DEFAULT_ACL
    file_overwrite = False


class PrivateMediaStorage(S3Boto3Storage):
    """
    Used to store & serve dynamic media files using access keys
    and short-lived expirations to ensure more privacy control
    """
    location = settings.PRIVATE_MEDIA_LOCATION
    default_acl = settings.PRIVATE_MEDIA_DEFAULT_ACL
    file_overwrite = False
    custom_domain = False



The attributes listed in each storage backend class perform the following:

  • location: This dictates the parent directory used in the S3 bucket for associated files. It is concatenated with the path generated from a FileField's or ImageField's upload_to argument.
  • default_acl: This dictates the access policy required for reading the files, using the values None, public-read, and private. django-storages and the S3Boto3Storage parent class will translate these into object policies.
  • file_overwrite: In most cases, it's better not to overwrite existing files if we update a specific path. With this set to False, a unique suffix will be appended to the path to prevent naming collisions.
  • custom_domain: Disabled here, but you can enable it if you want to use AWS's CloudFront and have django-storages serve from it.
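The effect of location and file_overwrite can be pictured with a few lines of standard-library Python. This is a simplified simulation rather than the django-storages implementation: the `logos` upload folder is hypothetical, and the 7-character random suffix is an assumption modeled on Django's default naming behavior.

```python
import posixpath
import secrets
import string


def object_key(location: str, upload_to: str, filename: str) -> str:
    """Join the backend's location with the field's upload_to path."""
    return posixpath.join(location, upload_to, filename)


def available_key(key: str, existing: set[str]) -> str:
    """Mimic file_overwrite = False: add a random suffix while the key is taken."""
    root, ext = posixpath.splitext(key)
    alphabet = string.ascii_letters + string.digits
    while key in existing:
        suffix = "".join(secrets.choice(alphabet) for _ in range(7))
        key = f"{root}_{suffix}{ext}"
    return key


existing_keys = {"media/public/logos/acme.png"}
first = object_key("media/public", "logos", "acme.png")
print(first)
print(available_key(first, existing_keys))
```

The first line printed is the key the upload asked for; the second is a suffixed alternative chosen because that key is already taken.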

Configure Settings to Use the Storage Backends

With our storage backends defined, we can configure them to be used in different situations via the settings.py file. However, it is challenging to use S3 and these cloud storage backends during development, and I've always been a proponent of keeping all resources and files "local" to the development machine. So we will create a logic path that will:

  1. Use the local filesystem to store static and media files for convenience. The Django server will be responsible for serving these files directly.
  2. Use the custom S3 storage backends when an environment variable is enabled. We will use the S3_ENABLED variable to control this, enabling it in our Heroku configuration variables.

First, we will assume that you have a relatively vanilla settings.py file concerning the static- & media-related variables. For reference, a new project should have a block that looks similar to the following:



# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/4.0/howto/static-files/

STATIC_URL = 'static/'

STATIC_ROOT = BASE_DIR / 'collected-static'



We will design a slightly advanced control flow that will seamlessly handle the two cases defined above. In addition, it will provide enough control to override each part of the configuration as needed.

Since there are already default values for the static file usage, we can add default values for media file usage. These will be used when serving files locally from the server while in development mode.



STATIC_URL = '/static/'
STATIC_ROOT = BASE_DIR / 'collected-static'

MEDIA_URL = '/media/'
MEDIA_ROOT = BASE_DIR / 'collected-media'



To begin the process of including S3, let's create the controls that decide whether static & media files are served from the local server or through the S3 storage backend. We will create three variables:

  • S3_ENABLED: controls whether media & static files should use S3 storage by default
  • LOCAL_SERVE_MEDIA_FILES: controls whether media files are served from the local filesystem instead of S3. Defaults to the negated S3_ENABLED value
  • LOCAL_SERVE_STATIC_FILES: controls whether static files are served from the local filesystem instead of S3. Defaults to the negated S3_ENABLED value


from decouple import config  # import explained below

# ...STATIC and MEDIA settings here...

# The following configs determine if files get served from the server or an S3 storage
S3_ENABLED = config('S3_ENABLED', cast=bool, default=False)
LOCAL_SERVE_MEDIA_FILES = config('LOCAL_SERVE_MEDIA_FILES', cast=bool, default=not S3_ENABLED)
LOCAL_SERVE_STATIC_FILES = config('LOCAL_SERVE_STATIC_FILES', cast=bool, default=not S3_ENABLED)

if (not LOCAL_SERVE_MEDIA_FILES or not LOCAL_SERVE_STATIC_FILES) and not S3_ENABLED:
    raise ValueError('S3_ENABLED must be true if either media or static files are not served locally')



In the example above, we are using the python-decouple package to make it easier to read and cast environment variables to Python variables. I highly recommend this package when working with settings.py configurations. We also include a value check to ensure consistency across these three variables. If all three variables are defined in the environment but conflict with one another, the program will throw an error.
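To see how the three flags interact, the sketch below reproduces the same decision logic as a plain function over a dictionary of environment values. The `as_bool` helper is a simplified stand-in for python-decouple's boolean cast (an assumption: it treats unknown strings as False instead of raising), and `resolve_flags` is a hypothetical name.

```python
TRUTHY = {"1", "true", "yes", "on"}


def as_bool(raw: str) -> bool:
    """Interpret an environment-style string as a boolean."""
    return raw.strip().lower() in TRUTHY


def resolve_flags(env: dict[str, str]) -> tuple[bool, bool, bool]:
    """Return (S3_ENABLED, LOCAL_SERVE_MEDIA_FILES, LOCAL_SERVE_STATIC_FILES)."""
    s3_enabled = as_bool(env.get("S3_ENABLED", "False"))
    default_local = str(not s3_enabled)
    local_media = as_bool(env.get("LOCAL_SERVE_MEDIA_FILES", default_local))
    local_static = as_bool(env.get("LOCAL_SERVE_STATIC_FILES", default_local))
    if (not local_media or not local_static) and not s3_enabled:
        raise ValueError(
            "S3_ENABLED must be true if either media or static files are not served locally"
        )
    return s3_enabled, local_media, local_static


print(resolve_flags({}))
print(resolve_flags({"S3_ENABLED": "True"}))
print(resolve_flags({"S3_ENABLED": "True", "LOCAL_SERVE_STATIC_FILES": "True"}))
```

The third call shows the mixed case the overrides allow: media files go to S3 while static files are still served locally. Turning off local serving without enabling S3 raises the ValueError.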

We can now start configuring the different configuration variables required by our file storage backends based on those control variables' value(s). We begin by including some S3 configurations required whether we are serving static, media, or both types of files.



if S3_ENABLED:
    AWS_ACCESS_KEY_ID = config('BUCKETEER_AWS_ACCESS_KEY_ID')
    AWS_SECRET_ACCESS_KEY = config('BUCKETEER_AWS_SECRET_ACCESS_KEY')
    AWS_STORAGE_BUCKET_NAME = config('BUCKETEER_BUCKET_NAME')
    AWS_S3_REGION_NAME = config('BUCKETEER_AWS_REGION')
    AWS_DEFAULT_ACL = None
    AWS_S3_SIGNATURE_VERSION = config('S3_SIGNATURE_VERSION', default='s3v4')
    AWS_S3_ENDPOINT_URL = f'https://{AWS_STORAGE_BUCKET_NAME}.s3.amazonaws.com'
    AWS_S3_OBJECT_PARAMETERS = {'CacheControl': 'max-age=86400'}



The above defines some of the variables required by the django-storages S3 backend and sets the values to environment configurations that are provided by the Bucketeer add-on. As previously mentioned, all of the add-on environment variables are prefixed with BUCKETEER_. The S3_SIGNATURE_VERSION environment variable is not required and most likely does not need to be included.
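Because a missing variable only surfaces when the settings module is loaded, it can help to check for all of them at once. The following is a minimal sketch using only the standard library; the helper name `missing_bucketeer_vars` and the sample values are hypothetical, while the variable names are the ones read in the settings above.

```python
import os

REQUIRED_BUCKETEER_VARS = (
    "BUCKETEER_AWS_ACCESS_KEY_ID",
    "BUCKETEER_AWS_SECRET_ACCESS_KEY",
    "BUCKETEER_BUCKET_NAME",
    "BUCKETEER_AWS_REGION",
)


def missing_bucketeer_vars(env=None) -> list[str]:
    """Return the required Bucketeer variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_BUCKETEER_VARS if not env.get(name)]


sample_env = {
    "BUCKETEER_AWS_ACCESS_KEY_ID": "example-key-id",
    "BUCKETEER_BUCKET_NAME": "example-bucket",
}
print(missing_bucketeer_vars(sample_env))
```

Called without arguments, the function inspects the real process environment, which makes it usable from a release script or a one-off shell session.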

With the S3 configuration together, we can reference the LOCAL_SERVE_MEDIA_FILES and LOCAL_SERVE_STATIC_FILES control variables to override the default static and media file settings if they are desired to be served via S3.



if not LOCAL_SERVE_STATIC_FILES:
    STATIC_DEFAULT_ACL = 'public-read'
    STATIC_LOCATION = 'static'
    STATIC_URL = f'{AWS_S3_ENDPOINT_URL}/{STATIC_LOCATION}/'
    STATICFILES_STORAGE = 'example.utils.storage_backends.StaticStorage'



Notice the last line where STATICFILES_STORAGE is set to the custom backend we created. That ensures it follows the location & ACL (Access Control List) policies that we configured initially. With this configuration, all static files will be placed under static/ in the bucket, but feel free to update STATIC_LOCATION if desired.

We can configure a very similar situation for media files.



if not LOCAL_SERVE_MEDIA_FILES:
    PUBLIC_MEDIA_DEFAULT_ACL = 'public-read'
    PUBLIC_MEDIA_LOCATION = 'media/public'

    MEDIA_URL = f'{AWS_S3_ENDPOINT_URL}/{PUBLIC_MEDIA_LOCATION}/'
    DEFAULT_FILE_STORAGE = 'example.utils.storage_backends.PublicMediaStorage'

    PRIVATE_MEDIA_DEFAULT_ACL = 'private'
    PRIVATE_MEDIA_LOCATION = 'media/private'
    PRIVATE_FILE_STORAGE = 'example.utils.storage_backends.PrivateMediaStorage'



The big difference here is that we have configured two different storage backends for media files: one for publicly accessible objects and one for objects that require an access token. When a private file is requested, this token is generated internally by django-storages, so you do not have to worry about anonymous public access.
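The idea behind a short-lived access token can be sketched with an HMAC-signed URL that carries its own expiry. To be clear about the assumptions: this is a toy scheme for illustration only, it is not the AWS signing algorithm that S3 and django-storages actually use, and the secret, path, and function names are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical; stands in for the bucket's secret key


def sign_url(path: str, expires_in: int, now: Optional[float] = None) -> str:
    """Append an expiry timestamp and an HMAC signature to a path."""
    expires = int((time.time() if now is None else now) + expires_in)
    digest = hmac.new(SECRET, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires, 'signature': digest})}"


def is_valid(url: str, now: Optional[float] = None) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    expected = hmac.new(
        SECRET, f"{parsed.path}:{expires}".encode(), hashlib.sha256
    ).hexdigest()
    current = time.time() if now is None else now
    return hmac.compare_digest(signature, expected) and current < expires


url = sign_url("/media/private/expense.pdf", expires_in=3600, now=1_000)
print(is_valid(url, now=2_000))   # within the lifetime
print(is_valid(url, now=10_000))  # after expiry
print(is_valid("/media/private/expense.pdf", now=2_000))  # no signature at all
```

The property that matters for private media is that a link copied out of the application stops working once it expires, and a bare object path without a signature is never accepted.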

Local Development Serving

Since we will have S3_ENABLED set to False in our local development environment, it will serve static and media files locally through the Django server instead of from S3. We will need to configure the URL routing to handle this scenario. We can configure our urls.py file to serve the appropriate files like so:



from django.conf import settings
from django.conf.urls.static import static
from django.contrib import admin
from django.urls import path


urlpatterns = [
    path('admin/', admin.site.urls),
]

if settings.LOCAL_SERVE_STATIC_FILES:
    urlpatterns += static(settings.STATIC_URL, document_root=settings.STATIC_ROOT)

if settings.LOCAL_SERVE_MEDIA_FILES:
    urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)



This will locally serve the static or media files based on the values of the LOCAL_SERVE_STATIC_FILES and LOCAL_SERVE_MEDIA_FILES settings variables we defined.

Enabling S3 Storage

We can enable these storages and our add-on in the app.json file to start using these storage backends. This will effectively disable LOCAL_SERVE_STATIC_FILES and LOCAL_SERVE_MEDIA_FILES to start serving both via S3 when deployed to Heroku.



{
  // ...rest of configs...
  "env": {
    // ...rest of envs...
    "S3_ENABLED": {
      "description": "Enable to upload & serve static and media files from S3",
      "value": "True"
    }
  }
}



Using the Private Storage

By default, Django will use the PublicMediaStorage class for uploading media files, meaning the contents will be publicly accessible to anyone with the link. However, a model can utilize the PrivateMediaStorage backend when desired, which will create short-lived access tokens that prevent the public from viewing the associated object.

The below is an example of using public and private media files on the same model.



from django.db import models

from example.utils.storage_backends import PrivateMediaStorage


class Organization(models.Model):
    """A sample Organization model with public and private file field usage
    """

    logo = models.ImageField(help_text='A publicly accessible company logo')

    expense_report = models.FileField(
        help_text='The private expense report requires a short-lived access token',
        storage=PrivateMediaStorage()  # will create private files
    )



You can see the code for this complete example at commit 265becc. This configuration will allow your Django project to scale efficiently on Heroku with Bucketeer.

In a future post, we will discuss how to upload and set these files using vanilla Django & Django REST Framework.

As always, if you find any bugs, issues, or unclear explanations, please reach out to me so I can improve the tutorial & experience for future readers.

Take care, everyone.
