How Django uses deferred imports to scale

Django's smart use of Python's importlib

This post aims to dispel ambiguities introduced by Django for those learning it. These things initially perplexed me. I found them hard to grasp and articulate, and for my first years in Python, they were akin to magic.

The idea of using strings in Django settings and having them resolve to Python code seemed very indirect. How did that work? Was it a feature of Python itself? Or was it some Rails-like metaprogramming sorcery? Is the notion from The Zen of Python that "Explicit is better than implicit" being challenged?

The concept of resolving strings into evaluated source code reaches beyond Django. Sphinx uses it to enable extensions, for instance. Even standard Python exposes it on the command line, through $ python -m (PEP 338) and unittest's CLI.

This article tries to document where these conditions arise, so we know how to recognize where and when we see them, especially in Django. Finally, we'll look into how it works under the hood in terms of the broader Python language, demonstrate something useful with it, and release it as open source.

"Import strings" have a lot of useful applications. I'd call them a necessity in a framework like Django, or else there'd be race conditions and circular dependencies. Django's loading of settings, applications and models is actually rather intricate, and in my opinion, well-executed.

We've all seen INSTALLED_APPS, where we declare Python string literals that later load applications. To make Django's extensive usage of import strings concrete, let's document some examples.

Django's string imports

There are also other settings in Django that load modules, classes, and functions via strings:

DJANGO_SETTINGS_MODULE

The first and most famous import string in Django is DJANGO_SETTINGS_MODULE. This is imported via importlib.import_module() in django/conf/__init__.py.

The string you use for it loads a Python module, which equates to a file. If your project directory is on Python's import path (see [site-packages/](https://docs.python.org/3/library/site.html)), and your settings are at project/settings/local.py, then DJANGO_SETTINGS_MODULE should be set to project.settings.local.
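To see the mechanics without a Django project, here's a minimal sketch of the same kind of lookup. The variable name DEMO_SETTINGS_MODULE and the helper load_settings_module() are made up for illustration, and the stdlib module logging.config stands in for a real settings file:

```python
import importlib
import os

# Assumption: DEMO_SETTINGS_MODULE is a made-up stand-in for DJANGO_SETTINGS_MODULE,
# and "logging.config" (a stdlib module) plays the role of project.settings.local.
os.environ.setdefault("DEMO_SETTINGS_MODULE", "logging.config")


def load_settings_module(env_var="DEMO_SETTINGS_MODULE"):
    """Read a dotted module path from the environment and import it."""
    dotted_path = os.environ[env_var]
    return importlib.import_module(dotted_path)


settings_module = load_settings_module()
# The dotted path maps onto a file path, e.g. logging.config -> logging/config.py
print(settings_module.__name__, settings_module.__file__)
```

The point is only that the environment holds a string, and nothing is imported until something asks for it.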

Settings variables

These are accessible as attributes of django.conf.settings at runtime.

variable / example of import string

INSTALLED_APPS

INSTALLED_APPS = ['path.to.myapp']

ROOT_URLCONF

ROOT_URLCONF = 'myapp.urls'

WSGI_APPLICATION

WSGI_APPLICATION = 'develtech.wsgi.application'

STATICFILES_FINDERS

STATICFILES_FINDERS = ('django.contrib.staticfiles.finders.FileSystemFinder',)

AUTH_USER_MODEL

This isn't a "pure" import string: it names a model as app_label.ModelName rather than a module path, and is resolved via django.apps.apps.get_model().

AUTH_USER_MODEL = 'user.User'

MIDDLEWARE

MIDDLEWARE = (
   'django.contrib.sessions.middleware.SessionMiddleware',
)

TEMPLATES (inside BACKEND and OPTIONS['context_processors'])

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.request'
            ],
        },
    },
]

AUTHENTICATION_BACKENDS

AUTHENTICATION_BACKENDS = [
    'guardian.backends.ObjectPermissionBackend',
    'allauth.account.auth_backends.AuthenticationBackend',
]

STATICFILES_STORAGE

STATICFILES_STORAGE = 'django.core.files.storage.FileSystemStorage'

See source code of FileSystemStorage

EMAIL_BACKEND

EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend'

URL Routes

You've probably seen that ROOT_URLCONF is itself an import string, but the code inside urls.py files also uses them.

First, let's do a real object example:

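from django.urls import re_path
from django.contrib.auth.views import LogoutView

# The function-based logout view is gone from modern Django;
# the class-based LogoutView takes its place.
urlpatterns = [
    re_path(r'^logout/', LogoutView.as_view(next_page='/'), name='logout'),
]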

Django's routing system also accepts import strings via include(), which takes the dotted path of a URLconf module (a Python file with a urlpatterns list inside it).

from django.urls import include, re_path

urlpatterns = [
    re_path(r'^accounts/', include('allauth.urls')),
]

Where allauth.urls is allauth/urls.py

The same goes for declaring error pages in urls.py, such as django.conf.urls.handler404:

handler404 = 'based.django.views.errors.page_not_found'

Models

The next place you'll see string references to objects is in relational models, such as django.db.models.ForeignKey. Here is an excerpt taken directly from Django's documentation:

from django.db import models

class Car(models.Model):
    manufacturer = models.ForeignKey(
        'Manufacturer',
        on_delete=models.CASCADE,
    )

This establishes a relationship with a class Manufacturer, which does not need to be defined or imported at the point where Car is declared.
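How can a string stand in for a class that may not exist yet? Here's a toy sketch of one way to do it: park a callback until the named class is registered. Everything in it (Registry, when_ready, this ForeignKey) is hypothetical and far simpler than Django's real app registry:

```python
class Registry:
    """Hypothetical miniature model registry, for illustration only."""

    def __init__(self):
        self.models = {}
        self.pending = {}  # model name -> callbacks waiting for that model

    def register(self, name, model):
        self.models[name] = model
        # Wake up everything that was waiting for this name.
        for callback in self.pending.pop(name, []):
            callback(model)

    def when_ready(self, name, callback):
        if name in self.models:
            callback(self.models[name])
        else:
            self.pending.setdefault(name, []).append(callback)


registry = Registry()


class ForeignKey:
    def __init__(self, to):
        self.to = to  # stays a string until the target is registered
        if isinstance(to, str):
            registry.when_ready(to, self._resolve)

    def _resolve(self, model):
        self.to = model


class Car:
    manufacturer = ForeignKey("Manufacturer")  # Manufacturer isn't defined yet


class Manufacturer:
    pass


registry.register("Manufacturer", Manufacturer)
print(Car.manufacturer.to is Manufacturer)
```

Because the reference is only a name until registration time, the order in which the two classes are declared stops mattering.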

Template engine

Templates are probably the most intricate usage of import strings.

https://docs.djangoproject.com/en/4.0/ref/templates/api/#loader-types

Template tags

Here is an example taken from the django documentation (concatenated so you can see both import string cases):

Engine(
    libraries={
        'myapp_tags': 'path.to.myapp.tags',
        'admin.urls': 'django.contrib.admin.templatetags.admin_urls',
    },
    builtins=['myapp.builtins'],
)

This is what it looks like when a template engine is instantiated directly. Libraries listed under builtins, e.g. myapp.builtins, are available automatically in every template that engine renders. More often, though, engines are configured declaratively via a settings file:

TEMPLATES = [{
    'BACKEND': 'django.template.backends.django.DjangoTemplates',
    'OPTIONS': {
        'libraries':{
            'myapp_tags': 'path.to.myapp.tags',
            'admin.urls': 'django.contrib.admin.templatetags.admin_urls',
        },
        'builtins': ['myapp.builtins'],
    },
}]

Despite the opportunity to have custom tags built in like Django's default tags (added in django/template/engine.py), it's more common for Django developers to opt in to template tags via the {% load libraryname %} tag, which picks up files inside the templatetags/ directory of applications, or the key used in OPTIONS['libraries'].

Dig deeper into Django's template engine

To dig deeper into Django's templates, I recommend django/template/base.py.

Check out the {% load %} template tag on GitHub. This adds the library to a registry; further down the line, it's loaded via import_library.

Context processors

Context processors allow information to be added to the template context on every request. For the Node.js programmers out there, these are sort of like contextual information passed along through Express middleware.

In Django settings:

TEMPLATES = [{
    'BACKEND': 'django.template.backends.django.DjangoTemplates',
    'OPTIONS': {
        'context_processors': [
            'django.template.context_processors.request'
        ],
    },
}]
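To make that concrete, here's a rough, Django-free sketch of what happens to such a list: each string is imported, called with the request, and the returned dict is merged into the context. The module demo_processors, its two functions, and build_context() are all invented for this example, and the "request" is just a dict:

```python
import sys
import types
from importlib import import_module


def site_name(request):
    return {"site_name": "Example"}


def current_path(request):
    return {"path": request["path"]}


# Register a throwaway in-memory module so the dotted paths below resolve.
demo = types.ModuleType("demo_processors")
demo.site_name = site_name
demo.current_path = current_path
sys.modules["demo_processors"] = demo

CONTEXT_PROCESSORS = [
    "demo_processors.site_name",
    "demo_processors.current_path",
]


def build_context(request, processor_paths):
    """Import each processor by dotted path and merge its dict into the context."""
    context = {}
    for dotted_path in processor_paths:
        module_path, func_name = dotted_path.rsplit(".", 1)
        processor = getattr(import_module(module_path), func_name)
        context.update(processor(request))
    return context


context = build_context({"path": "/about/"}, CONTEXT_PROCESSORS)
print(context)
```

Later processors win on key collisions here, since each one updates the same dict in list order.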

In Django plugins

One of the reasons import strings are used is that they make third-party extensions easier to implement.

Wherever an import string is used, Django settings can also point at third-party plugins that fit the interface/class. For a first one, EMAIL_BACKEND supports third-party extensions, like anymail/django-anymail (which I recommend!)

variable / project + example of import string

EMAIL_BACKEND

anymail/django-anymail

EMAIL_BACKEND = 'anymail.backends.mandrill.EmailBackend'

STATICFILES_STORAGE

jschneier/django-storages

STATICFILES_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'

TEMPLATES

nigma/django-easy-pjax (example taken from README):

TEMPLATES=[
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": [...],
        "APP_DIRS": True,
        "OPTIONS": {
            "builtins": [
                "easy_pjax.templatetags.pjax_tags"
            ],
            "context_processors": [
                "django.template.context_processors.request",
                "django.template.context_processors.static",
                # ...
            ]
        }
    }
]

In addition, third-party extensions have their own variables:

variable / project + example of import string

GUARDIAN_GET_INIT_ANONYMOUS_USER

django-guardian/django-guardian

GUARDIAN_GET_INIT_ANONYMOUS_USER = 'app.models.get_anonymous_user_instance'

ACCOUNT_FORMS

pennersr/django-allauth

ACCOUNT_FORMS = {
    'login': 'myapp.app.user.forms.account.LoginForm',
}

DEBUG_TOOLBAR_PANELS

jazzband/django-debug-toolbar (hey cool, django jazzband!)

DEBUG_TOOLBAR_PANELS = (
   'debug_toolbar.panels.versions.VersionsPanel',
   # .. and so on
)

You can even plug in dmclain/django-debug-toolbar-line-profiler:

if 'debug_toolbar_line_profiler' in INSTALLED_APPS:
    DEBUG_TOOLBAR_PANELS += (
        'debug_toolbar_line_profiler.panel.ProfilingPanel',
    )
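The plug-in idea is easy to try with nothing but the standard library, because json and pickle happen to share a dumps()/loads() interface. In this sketch the setting name SERIALIZER_BACKEND and the helper get_serializer() are hypothetical. Swapping the string swaps the implementation:

```python
from importlib import import_module

# Hypothetical setting: any module exposing dumps() and loads() fits the interface.
SERIALIZER_BACKEND = "json"


def get_serializer(path=None):
    """Import the configured backend and check that it fits the expected interface."""
    module = import_module(path or SERIALIZER_BACKEND)
    for required in ("dumps", "loads"):
        if not callable(getattr(module, required, None)):
            raise TypeError(f"{module.__name__} does not provide {required}()")
    return module


data = {"answer": 42}
for backend in ("json", "pickle"):
    serializer = get_serializer(backend)
    # Both backends round-trip the same data through the same two calls.
    print(backend, serializer.loads(serializer.dumps(data)) == data)
```

The calling code never names a concrete backend, which is the same property that lets a package like django-anymail slot into EMAIL_BACKEND.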

Machinery behind string imports

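Django makes use of two general functions. The first is Django's own django.utils.module_loading.import_string(); the second is the standard library's importlib.import_module(), which it wraps. Here is import_string() as it appeared in django/utils/module_loading.py while Django still supported Python 2 (hence the six.reraise calls):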

def import_string(dotted_path):
    """
    Import a dotted module path and return the attribute/class designated by the
    last name in the path. Raise ImportError if the import failed.
    """
    try:
        module_path, class_name = dotted_path.rsplit('.', 1)
    except ValueError:
        msg = "%s doesn't look like a module path" % dotted_path
        six.reraise(ImportError, ImportError(msg), sys.exc_info()[2])

    module = import_module(module_path)

    try:
        return getattr(module, class_name)
    except AttributeError:
        msg = 'Module "%s" does not define a "%s" attribute/class' % (
            module_path, class_name)
        six.reraise(ImportError, ImportError(msg), sys.exc_info()[2])

In the end, it's a friendly wrapper around Python standard library's import_module() that handles errors better and allows accessing variables, functions, and classes in the module (file) loaded.
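On Python 3 alone, the same behavior needs no compatibility helpers. The following is my own minimal, stdlib-only sketch of the idea, not Django's current source:

```python
from importlib import import_module


def import_string(dotted_path):
    """Import a dotted path and return the attribute named by its last segment."""
    try:
        module_path, attr_name = dotted_path.rsplit(".", 1)
    except ValueError as err:
        # No dot at all, so there is no module/attribute split to make.
        raise ImportError(f"{dotted_path!r} doesn't look like a module path") from err

    module = import_module(module_path)

    try:
        return getattr(module, attr_name)
    except AttributeError as err:
        raise ImportError(
            f"Module {module_path!r} does not define a {attr_name!r} attribute/class"
        ) from err


join = import_string("os.path.join")
OrderedDict = import_string("collections.OrderedDict")
print(join("a", "b"), OrderedDict)
```

Turning every failure into ImportError means callers only have one exception type to handle, whichever half of the lookup went wrong.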

For a deeper dive, take a look at Lib/importlib/__init__.py and the rest of Lib/importlib/.

More fun browsing source code

For that matter, why stop there? The source of the official Python implementation is available to read at python/cpython. Fun times!

Use tags to browse specific releases, such as v3.6.3.

Branches for different release streams of Python are available:

Plain old python/cpython (the default branch) is the next upcoming release (3.7 as of 2017-11-24).
Branch 2.7 is where Python 2 is being maintained, with a tentative end-of-life of 2020-01-01 (the Python docs also mention this thread on the python-dev list).
And other branches like:
- 3.4 (end-of-life 2019-03-16)
- 3.5 (end-of-life 2020-09-13)
- 3.6 (end-of-life 2021-12-23)

Outside of Django

Older examples

My earliest exposure to superb usage of string import was from my favorite Python programmers.

First, Armin Ronacher's usage of it in Flask, before the project switched from plain-old unittest to pytest (which is fine, because pytest is awesome). It's viewable in flask/testsuite/__init__.py of Flask 0.10. This would move through Flask's test modules and collect the available test suites.

This next one took some digging to find:

In the early days of pypa/warehouse, before it switched from pallets/werkzeug to Pylons/pyramid, there was a great central Warehouse object by Donald Stufft that would scour and load up models. Remnants of it remain in my fork at warehouse/application.py.

A lot of these were phased out one way or another by using libraries that encouraged more conventionality. So those days of clever Python sorcery, while fondly remembered, are more and more often getting usurped by libraries like pytest over unittest, and Pyramid over plain-old Werkzeug.

More current examples

In modern Flask, configuration can be loaded from an import string, e.g.

app.config.from_object('yourapplication.default_settings')
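Flask only picks up the uppercase names from the object it imports. A small Flask-free sketch of that behavior follows. The module demo_default_settings and the helper config_from_object() are made up here, and this is not Flask's own code:

```python
import sys
import types
from importlib import import_module

# A throwaway in-memory "settings module" so the example runs on its own.
defaults = types.ModuleType("demo_default_settings")
defaults.DEBUG = False
defaults.SECRET_KEY = "change-me"
defaults.helper = "lowercase names are not treated as configuration"
sys.modules["demo_default_settings"] = defaults


def config_from_object(dotted_path):
    """Import a module by dotted path and keep only its UPPERCASE attributes."""
    module = import_module(dotted_path)
    return {key: getattr(module, key) for key in dir(module) if key.isupper()}


config = config_from_object("demo_default_settings")
print(config)
```

The uppercase convention is what lets a settings module hold helper functions and imports without them leaking into the configuration.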

tensorflow/tensorflow uses delayed imports "to avoid pulling in large dependencies ... and allows [them] only to be loaded when they are used". Here is TensorFlow's LazyLoader class:

import importlib
import types


class LazyLoader(types.ModuleType):
    """Lazily import a module, mainly to avoid pulling in large dependencies.
    `contrib`, and `ffmpeg` are examples of modules that are large and not always
    needed, and this allows them to only be loaded when they are used.
    """

    # The lint error here is incorrect.
    def __init__(self, local_name, parent_module_globals, name):  # pylint: disable=super-on-old-class
        self._local_name = local_name
        self._parent_module_globals = parent_module_globals

        super(LazyLoader, self).__init__(name)

    def _load(self):
        # Import the target module and insert it into the parent's namespace
        module = importlib.import_module(self.__name__)
        self._parent_module_globals[self._local_name] = module

        # Update this object's dict so that if someone keeps a reference to the
        #   LazyLoader, lookups are efficient (__getattr__ is only called on lookups
        #   that fail).
        self.__dict__.update(module.__dict__)

        return module

    def __getattr__(self, item):
        module = self._load()
        return getattr(module, item)

    def __dir__(self):
        module = self._load()
        return dir(module)

And its usage in the main tensorflow module:

from tensorflow.python.util.lazy_loader import LazyLoader
contrib = LazyLoader('contrib', globals(), 'tensorflow.contrib')
del LazyLoader

That is a clever way to make a friendly API that balances features while staying performant.
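You can watch the deferral happen with a stripped-down variant of the same pattern pointed at a stdlib module. LazyModule and its _loaded flag are my own simplification for demonstration, not TensorFlow's class:

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Stand-in for a module that is imported on first attribute access."""

    def __init__(self, name):
        super().__init__(name)
        self._loaded = False  # demo-only flag so the deferral is observable

    def _load(self):
        module = importlib.import_module(self.__name__)
        # Copy the real namespace so later lookups bypass __getattr__ entirely.
        self.__dict__.update(module.__dict__)
        self._loaded = True
        return module

    def __getattr__(self, item):
        # Only reached when normal lookup fails, i.e. before the first load.
        module = self._load()
        return getattr(module, item)


lazy_json = LazyModule("json")
print(lazy_json._loaded)            # False: nothing imported through the proxy yet
print(lazy_json.dumps({"a": 1}))    # first attribute access triggers the import
print(lazy_json._loaded)            # True
```

Creating the proxy is cheap. The cost of the real import is only paid by code paths that actually touch the module.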

Sphinx has string-level module resolution peppered everywhere. For instance, when resolving a module or function with Sphinx autodoc, there's a need to resolve the Noodle in .. autoclass:: Noodle.

Another prime example in Sphinx is the extensions variable in your conf.py:
```
extensions = [
    'sphinx.ext.autodoc', 'sphinx.ext.doctest', 'sphinx.ext.intersphinx',
    'sphinx.ext.todo', 'sphinx.ext.viewcode', 'alagitpull'
]
```
These strings end up being resolved in load_extension().

Putting it into practice

Finally, you can use import strings with your own libraries as a way to make your code more reusable. For instance, Django 4.0 has a slugify function. On each Django website I build, there are special cases where the default behavior is unsatisfactory. Often, the rules on how you'd handle slugification are dependent on the niche of the website.

For devel.tech, the default behavior for slugifying "C++" is to remove the plus signs. So it shows up as "c", which collides with the "C" programming language. "C#" is also trimmed down to "c". Auto-generating slug fields then append numbers to tell them apart: "c-2", "c-3".

What if we could make it so Django slugifies "C++" as "cpp", and "C#" as "c-sharp"?

Term	`django.utils.text.slugify`	Better
C	c (correct)	N/A
C++	c	cpp
C#	c	c-sharp

There are more generic cases, such as $ becoming blank with Django's stock django.utils.text.slugify. The right answer could depend on the region of the website, since many nations have their own dollar (e.g. USD, AUD, CAD). Should US$ become usd, AU$ become aud, and so on?

Not just that, but when slugifying URLs, we are space-sensitive and may prefer abbreviations/short names. For instance, New York City being nyc instead of new-york-city. What would a person on a smartphone type into Google?

Term	`django.utils.text.slugify`	What you (may) want
New York City	new-york-city	nyc
Y Combinator	y-combinator	yc
Portland	portland	pdx
Texas	texas	tx
$	'' (empty)	usd, aud, etc?
US$	us	usd
A$	a	aud
bitcoin	bitcoin	btc
United States	united-states	usa
League of Legends	league-of-legends	league
Apple® iPod Touch	apple-ipod-touch	ipod-touch
GNU/Linux	gnulinux	GNU/Linux[1]

So there are two problems. First, almost universally, the default slugify utilities in Django can lose valuable context information. Second, there's a need to handle custom cases depending on the needs of the website. One-size-fits-all solutions are possible to attempt, but an Australian website doesn't want to print $ as USD without asking. A gaming website may want to slugify League of Legends as lol, which is ambiguous with Laugh Out Loud, and better summarized as league.

So we know that this isn't unique to just me; it would apply to many Django developers. Yay, an opportunity to make an open source project!

So let's make the system that handles slugification into a list of filters. Remember context_processors? We can use import strings as a way to "plug-in" callback functions to handle slugification cases. In our settings:

SLUGIFY_PROCESSORS = [
     'myproject.myapp.slugify.slugify_programming_languages',
     'myproject.myapp.slugify.slugify_geo',
]

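Here's an example of what slugify_programming_languages and slugify_geo could look like in myproject/myapp/slugify.py:

import re

def slugify_programming_languages(value):
    # Processors run before Django's slugify() lowercases the text,
    # so match case-insensitively ("C++" as well as "c++").
    value = re.sub(r'c\+\+', 'cpp', value, flags=re.IGNORECASE)
    return value

def slugify_geo(value):
    # Matches the "United States" -> "usa" row in the table above.
    value = value.replace('United States', 'usa')
    return value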

Let's sweep in the SLUGIFY_PROCESSORS with a customized slugify() function that falls back on Django's (4.0+) default behavior:

from django.conf import settings
from django.utils.module_loading import import_string
from django.utils.text import slugify as django_slugify

def slugify(value, allow_unicode=False):
    if hasattr(settings, 'SLUGIFY_PROCESSORS'):
        for slugify_fn_str in settings.SLUGIFY_PROCESSORS:
            slugify_fn_ = import_string(slugify_fn_str)
            value = slugify_fn_(value)

    return django_slugify(value, allow_unicode)

This could be used as a custom slug function for django-extensions or django-autoslug. We can also make it available as a template filter:

from django import template
from django.template.defaultfilters import stringfilter

from ..text import slugify as _slugify

register = template.Library()


@register.filter(is_safe=True)
@stringfilter
def slugify(value):
    return _slugify(value)

To demonstrate the above code, I split off part of devel.tech's slugification code into tony/django-slugify-processor (pypi). The README has instructions on how to configure and implement it.
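If you want to poke at the processor chain without a Django project, here's a dependency-free sketch. The module name demo_slugify is invented, SLUGIFY_PROCESSORS is a plain list instead of a Django setting, and basic_slugify() is only a crude stand-in for django.utils.text.slugify:

```python
import re
import sys
import types
from importlib import import_module


def slugify_programming_languages(value):
    value = re.sub(r"c\+\+", "cpp", value, flags=re.IGNORECASE)
    value = re.sub(r"c#", "c-sharp", value, flags=re.IGNORECASE)
    return value


# Register a throwaway in-memory module so the import string resolves.
demo = types.ModuleType("demo_slugify")
demo.slugify_programming_languages = slugify_programming_languages
sys.modules["demo_slugify"] = demo

SLUGIFY_PROCESSORS = ["demo_slugify.slugify_programming_languages"]


def basic_slugify(value):
    """Crude stand-in for Django's slugify: drop symbols, hyphenate whitespace."""
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def slugify(value):
    # Run every configured processor first, then fall back to the default rules.
    for dotted_path in SLUGIFY_PROCESSORS:
        module_path, func_name = dotted_path.rsplit(".", 1)
        value = getattr(import_module(module_path), func_name)(value)
    return basic_slugify(value)


for term in ("C", "C++", "C#"):
    print(term, "->", basic_slugify(term), "vs", slugify(term))
```

Running the processors before the fallback matters: once the default rules have stripped the plus signs, there is nothing left to tell "C++" from "C".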

[1] Just an easter egg to see if anyone's reading :)

Wrapping up

The delayed resolution of imports via strings plays an instrumental role in the Django framework, but also a host of other python applications. The machinery behind it is part of Python's standard library.

There is a value equation to strike when using import strings for your project. If you value reusability and customization, have a recurring pattern that's package-worthy, and don't want to pull in the whole kitchen sink by default, import strings are the solution in the end, one way or another.

Using Django as an obvious example of its success in the field, import strings also allow codebases to scale big while avoiding race conditions caused by circular dependencies.

Finally, we closed with a homemade example of using import strings. You can check out the source code, package, and docs.

Changes

April 28th, 2022:
- Update django links from 4.0 to 4.0
- Update python.org links from 2 to 3