How Django uses deferred imports to scale
Django's smart use of Python's importlib
This post aims to dispel ambiguities introduced by Django for those learning it. These things initially perplexed me: I found them hard to grasp and articulate, and for my first years in Python, they were akin to magic.
The idea of using strings in Django settings and having them resolve to Python code seemed very indirect. How did that work? Was it a feature of Python itself? Or was it some Rails-like metaprogramming sorcery? Was the Zen of Python's "Explicit is better than implicit" being challenged?
The concept of resolving strings into live Python objects reaches beyond Django. Sphinx, for instance, enables extensions this way. So do command line interfaces in standard Python, such as python -m from PEP 338, or unittest's CLI.
This article documents where these cases arise, so we know how to recognize them, especially in Django. Finally, we'll look at how it works under the hood in terms of the broader Python language, demonstrate something useful with it, and release it as open source.
"Import strings" have a lot of useful applications. I'd call them a necessity in a framework like Django, or else there'd be race conditions and circular dependencies. Django's loading of settings, applications and models is actually rather intricate, and in my opinion, well-executed.
We've all seen INSTALLED_APPS, where we declare Python string literals that are later used to load applications. To make Django's extensive use of import strings concrete, let's document some examples.
Django's string imports
Beyond INSTALLED_APPS, Django loads modules, classes, and functions via strings in plenty of other places:
DJANGO_SETTINGS_MODULE
The first and most famous import string in Django is DJANGO_SETTINGS_MODULE. It is imported via importlib.import_module() in django/conf/__init__.py.
The string you use for it names a Python module, which equates to a file. If the current directory is on your Python import path (see [site](https://docs.python.org/3/library/site.html)), and your settings are at project/settings/local.py, then you should set DJANGO_SETTINGS_MODULE=project.settings.local.
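Under the hood, the lookup amounts to importing the named module and reading its uppercase names. Below is a minimal sketch of that idea, not Django's actual Settings class: load_settings() is a hypothetical helper, and the project/settings/local.py package is generated in a temporary directory so the example runs anywhere.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path


def load_settings(env_var="DJANGO_SETTINGS_MODULE"):
    """Import the module named in env_var and keep only its UPPERCASE names."""
    module = importlib.import_module(os.environ[env_var])
    return {name: getattr(module, name) for name in dir(module) if name.isupper()}


# Build a throwaway project/settings/local.py so the example is self-contained.
root = Path(tempfile.mkdtemp())
settings_dir = root / "project" / "settings"
settings_dir.mkdir(parents=True)
(root / "project" / "__init__.py").write_text("")
(settings_dir / "__init__.py").write_text("")
(settings_dir / "local.py").write_text("DEBUG = True\nROOT_URLCONF = 'project.urls'\n")

sys.path.insert(0, str(root))  # put the project root on the import path
importlib.invalidate_caches()  # let the import system notice the new files
os.environ["DJANGO_SETTINGS_MODULE"] = "project.settings.local"

settings = load_settings()
print(settings)  # {'DEBUG': True, 'ROOT_URLCONF': 'project.urls'}
```

Django additionally layers these values over its defaults in django.conf.global_settings, but the string-to-module step is the same import_module() call.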
Settings variables
These are accessible at runtime as attributes of django.conf.settings.
Each variable below is followed by an example import string:
INSTALLED_APPS
INSTALLED_APPS = ['path.to.myapp']
ROOT_URLCONF
ROOT_URLCONF = 'myapp.urls'
WSGI_APPLICATION
WSGI_APPLICATION = 'develtech.wsgi.application'
STATICFILES_FINDERS
STATICFILES_FINDERS = ('django.contrib.staticfiles.finders.FileSystemFinder',)
AUTH_USER_MODEL
This isn't a "pure" import string: it takes the form app_label.ModelName and is resolved via django.apps.apps.get_model().
AUTH_USER_MODEL = 'user.User'
MIDDLEWARE
MIDDLEWARE = (
'django.contrib.sessions.middleware.SessionMiddleware',
)
TEMPLATES
Import strings appear inside BACKEND and OPTIONS['context_processors']:
TEMPLATES = [
{
'BACKEND': 'django.template.backends.django.DjangoTemplates',
'OPTIONS': {
'context_processors': [
'django.template.context_processors.request'
],
},
},
]
AUTHENTICATION_BACKENDS
AUTHENTICATION_BACKENDS = [
'guardian.backends.ObjectPermissionBackend',
'allauth.account.auth_backends.AuthenticationBackend',
]
STATICFILES_STORAGE
STATICFILES_STORAGE = 'django.core.files.storage.FileSystemStorage'
See the source code of FileSystemStorage.
EMAIL_BACKEND
EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend'
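Most of the list-valued settings above follow the same pattern: import each dotted path in order, then call or instantiate whatever comes back. Here is a minimal sketch, assuming a simplified import_string() stand-in (Django's real one has friendlier error messages) and a hypothetical load_backends() helper, with standard-library classes playing the role of backends:

```python
import importlib


def import_string(dotted_path):
    """Simplified stand-in for django.utils.module_loading.import_string."""
    module_path, _, name = dotted_path.rpartition(".")
    if not module_path:
        raise ImportError(f"{dotted_path!r} doesn't look like a module path")
    return getattr(importlib.import_module(module_path), name)


def load_backends(paths):
    """Hypothetical helper: instantiate each class named in a list setting, in order."""
    return [import_string(path)() for path in paths]


# Standard-library classes play the role of backend classes here.
BACKENDS = ["collections.OrderedDict", "collections.Counter"]
backends = load_backends(BACKENDS)
print([type(b).__name__ for b in backends])  # ['OrderedDict', 'Counter']
```

This is roughly how django.contrib.auth turns AUTHENTICATION_BACKENDS into backend instances that it then tries one after another.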
URL Routes
You've probably seen that ROOT_URLCONF is itself an import string, but the code inside urls.py files also uses them.
First, here's an example that passes a real view object rather than a string:
from django.contrib.auth.views import LogoutView
from django.urls import re_path

urlpatterns = [
    re_path(r'^logout/', LogoutView.as_view(next_page='/'), name='logout'),
]
Django's URL routing also accepts import strings via include(), which takes the dotted path of a URLconf module (a Python file with a urlpatterns list inside it):
from django.urls import include, re_path
urlpatterns = [
re_path(r'^accounts/', include('allauth.urls')),
]
Here, allauth.urls refers to allauth/urls.py.
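Conceptually, a string passed to include() is imported as a module, and its urlpatterns list is read when URLs are resolved. A simplified sketch of those two steps follows; toy_include(), url_patterns() and the fake myapp_urls module are hypothetical, and Django's real include() also handles namespaces and (patterns, app_name) tuples.

```python
import importlib
import sys
import types


def toy_include(urlconf):
    """Accept a module or its dotted path and return the module (hypothetical helper)."""
    if isinstance(urlconf, str):
        urlconf = importlib.import_module(urlconf)
    return urlconf


def url_patterns(urlconf_module):
    """Read urlpatterns, falling back to the object itself if it is already a list."""
    return getattr(urlconf_module, "urlpatterns", urlconf_module)


# Register a fake URLconf module so the example needs no files on disk.
fake = types.ModuleType("myapp_urls")
fake.urlpatterns = ["accounts/login/", "accounts/logout/"]
sys.modules["myapp_urls"] = fake

print(url_patterns(toy_include("myapp_urls")))  # ['accounts/login/', 'accounts/logout/']
```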
Error handlers declared in your root urls.py, such as django.conf.urls.handler404, can also be given as import strings:
handler404 = 'based.django.views.errors.page_not_found'
Models
The next place you'll see string references to objects is in relational model fields, such as django.db.models.ForeignKey. Here is an excerpt taken directly from Django's documentation:
from django.db import models


class Car(models.Model):
    manufacturer = models.ForeignKey(
        'Manufacturer',
        on_delete=models.CASCADE,
    )
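This establishes a relationship with a Manufacturer model, even one defined further down the file; a model in another app can be referenced as 'app_label.Manufacturer'.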
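Why does a string help here? When Car is defined, Manufacturer may not exist yet, so the string is recorded and resolved once every model has been registered. Here is a toy sketch of that deferral; model(), resolve_relations() and the relations attribute are hypothetical and far simpler than Django's app registry.

```python
registry = {}  # model name -> class
pending = []   # (owner class, attribute name, target model name)


def model(cls):
    """Register a class and queue its string relations for later resolution."""
    registry[cls.__name__] = cls
    for attr, target in getattr(cls, "relations", {}).items():
        pending.append((cls, attr, target))
    return cls


def resolve_relations():
    """Swap each string target for the real class once everything is registered."""
    while pending:
        cls, attr, target = pending.pop()
        setattr(cls, attr, registry[target])


@model
class Car:
    relations = {"manufacturer": "Manufacturer"}  # Manufacturer isn't defined yet


@model
class Manufacturer:
    pass


resolve_relations()
print(Car.manufacturer.__name__)  # Manufacturer
```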
Template engine
Templates are probably the most intricate usage of import strings. See the loader types in Django's documentation:
https://docs.djangoproject.com/en/4.0/ref/templates/api/#loader-types
Template tags
Here is an example taken from the Django documentation (combined so you can see both import string cases):
from django.template import Engine

Engine(
    libraries={
        'myapp_tags': 'path.to.myapp.tags',
        'admin.urls': 'django.contrib.admin.templatetags.admin_urls',
    },
    builtins=['myapp.builtins'],
)
This is how a template engine is configured programmatically. The libraries dict maps a label you can {% load %} (e.g. myapp_tags) to the import string of a tag module (e.g. path.to.myapp.tags), while modules listed in builtins are available in every template without a {% load %}. More often, though, engines are configured declaratively via a settings file:
TEMPLATES = [{
'BACKEND': 'django.template.backends.django.DjangoTemplates',
'OPTIONS': {
'libraries':{
'myapp_tags': 'path.to.myapp.tags',
'admin.urls': 'django.contrib.admin.templatetags.admin_urls',
},
'builtins': ['myapp.builtins'],
},
}]
Despite the opportunity to make custom tags built-in like Django's default tags (added in django/template/engine.py), it's more common for Django developers to opt in to loading template tags via the {% load libraryname %} tag, which picks up modules inside an application's templatetags/ directory, or the key used in OPTIONS['libraries'].
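Here is a rough sketch of that bookkeeping, assuming a hypothetical ToyEngine class: in this sketch each libraries entry maps a label to an import string that is imported when the engine is built, {% load label %} then only looks the label up, and builtins are always available. Standard-library modules stand in for tag libraries.

```python
import importlib


class ToyEngine:
    """Hypothetical stand-in for how a template engine tracks tag libraries."""

    def __init__(self, libraries=None, builtins=None):
        # label -> module, imported up front from its dotted path
        self.libraries = {
            label: importlib.import_module(path)
            for label, path in (libraries or {}).items()
        }
        # builtins are available to every template without {% load %}
        self.builtins = [importlib.import_module(path) for path in builtins or []]

    def load(self, label):
        """Roughly what {% load label %} does: look the label up, fail loudly if unknown."""
        try:
            return self.libraries[label]
        except KeyError:
            raise LookupError(f"{label!r} is not a registered tag library") from None


engine = ToyEngine(libraries={"wrap": "textwrap"}, builtins=["string"])
print(engine.load("wrap").dedent("  hi"))  # hi
```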
Dig deeper into Django's template engine
To dig deeper into Django's templates, I recommend django/template/base.py.
Check out the {% load %} template tag on GitHub. It looks up a library by name and adds its tags and filters to the template parser; the library modules themselves are imported via import_library.
Context processors
Context processors add variables to the context of every template rendered with a request. For the Node.js programmers out there, they are sort of like contextual information passed along through Express middleware.
In Django settings:
TEMPLATES = [{
'BACKEND': 'django.template.backends.django.DjangoTemplates',
'OPTIONS': {
'context_processors': [
'django.template.context_processors.request'
],
},
}]
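At render time, each string in context_processors is resolved to a function, called with the request, and the returned dicts are merged into the template context. A minimal sketch of that loop follows; the toy_processors module, its two functions, build_context(), and the plain dict standing in for a request are all hypothetical.

```python
import importlib
import sys
import types


def resolve(dotted_path):
    module_path, _, name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_path), name)


def build_context(request, processor_paths, extra=None):
    """Merge each processor's dict into one context; later processors win on clashes."""
    context = {}
    for path in processor_paths:
        context.update(resolve(path)(request))
    # Values passed explicitly by the view come last, so they take precedence.
    context.update(extra or {})
    return context


procs = types.ModuleType("toy_processors")
procs.user = lambda request: {"user": request.get("user", "anonymous")}
procs.site = lambda request: {"site_name": "Example"}
sys.modules["toy_processors"] = procs

context = build_context(
    {"user": "tony"},
    ["toy_processors.user", "toy_processors.site"],
    extra={"title": "Home"},
)
print(context)  # {'user': 'tony', 'site_name': 'Example', 'title': 'Home'}
```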
In Django plugins
One reason import strings are used is that they make third-party extensions easier to plug in.
Wherever an import string is used, Django settings can also point to third-party plugins that fit the expected interface or class. For a first one, EMAIL_BACKEND supports third-party extensions like anymail/django-anymail (which I recommend!)
EMAIL_BACKEND
EMAIL_BACKEND = 'anymail.backends.mandrill.EmailBackend'
STATICFILES_STORAGE
STATICFILES_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
TEMPLATES
nigma/django-easy-pjax (example taken from README):
TEMPLATES=[
{
"BACKEND": "django.template.backends.django.DjangoTemplates",
"DIRS": [...],
"APP_DIRS": True,
"OPTIONS": {
"builtins": [
"easy_pjax.templatetags.pjax_tags"
],
"context_processors": [
"django.template.context_processors.request",
"django.template.context_processors.static",
# ...
]
}
}
]
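What makes these plugins swappable is that they share an interface, so only the string changes. Here is a sketch in the spirit of django.core.mail.get_connection(), simplified and assuming hypothetical ConsoleBackend and MemoryBackend classes registered under a fake toy_mail_backends module; each exposes send_messages(messages) and returns how many were sent.

```python
import importlib
import sys
import types


def get_connection(backend_path, **kwargs):
    """Simplified: resolve an EMAIL_BACKEND-style string and instantiate it."""
    module_path, _, class_name = backend_path.rpartition(".")
    backend_cls = getattr(importlib.import_module(module_path), class_name)
    return backend_cls(**kwargs)


class ConsoleBackend:
    def send_messages(self, messages):
        for message in messages:
            print(message)
        return len(messages)


class MemoryBackend:
    outbox = []  # shared by all instances, so tests can inspect it

    def send_messages(self, messages):
        self.outbox.extend(messages)
        return len(messages)


plugins = types.ModuleType("toy_mail_backends")
plugins.ConsoleBackend = ConsoleBackend
plugins.MemoryBackend = MemoryBackend
sys.modules["toy_mail_backends"] = plugins

# Swapping providers is a one-string settings change; calling code stays the same.
EMAIL_BACKEND = "toy_mail_backends.MemoryBackend"
sent = get_connection(EMAIL_BACKEND).send_messages(["hello", "world"])
print(sent, MemoryBackend.outbox)  # 2 ['hello', 'world']
```

Pointing EMAIL_BACKEND at "toy_mail_backends.ConsoleBackend" instead would print the messages without touching the calling code.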
In addition, third-party extensions have their own settings that take import strings. Each is listed below with its project and an example:
GUARDIAN_GET_INIT_ANONYMOUS_USER
django-guardian/django-guardian
GUARDIAN_GET_INIT_ANONYMOUS_USER = 'app.models.get_anonymous_user_instance'
ACCOUNT_FORMS
pennersr/django-allauth
ACCOUNT_FORMS = {
    'login': 'myapp.app.user.forms.account.LoginForm',
}
DEBUG_TOOLBAR_PANELS
jazzband/django-debug-toolbar (hey cool, django jazzband!)
DEBUG_TOOLBAR_PANELS = (
'debug_toolbar.panels.versions.VersionsPanel',
# .. and so on
)
You can even plug in dmclain/django-debug-toolbar-line-profiler:
if 'debug_toolbar_line_profiler' in INSTALLED_APPS:
    DEBUG_TOOLBAR_PANELS += (
        'debug_toolbar_line_profiler.panel.ProfilingPanel',
    )
Machinery behind string imports
Django makes use of two general-purpose functions. The first is Django's own django.utils.module_loading.import_string(). Here is its source from django/utils/module_loading.py in its Python 3 form (older releases used six.reraise for Python 2 compatibility):
from importlib import import_module


def import_string(dotted_path):
    """
    Import a dotted module path and return the attribute/class designated by the
    last name in the path. Raise ImportError if the import failed.
    """
    try:
        module_path, class_name = dotted_path.rsplit('.', 1)
    except ValueError as err:
        raise ImportError("%s doesn't look like a module path" % dotted_path) from err

    module = import_module(module_path)

    try:
        return getattr(module, class_name)
    except AttributeError as err:
        raise ImportError('Module "%s" does not define a "%s" attribute/class' % (
            module_path, class_name)
        ) from err
In the end, it's a friendly wrapper around the Python standard library's importlib.import_module() that gives clearer errors and lets you access the variables, functions, and classes inside the loaded module (file).
For a deeper dive, take a look at Lib/importlib/__init__.py and the rest of Lib/importlib/.
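Why not the builtin __import__()? For a dotted name it returns the top-level package rather than the submodule, which is why the importlib docs steer programmatic imports toward import_module(). A short standard-library-only demonstration:

```python
import importlib

# __import__("xml.dom") imports xml.dom but hands back the top-level package...
top = __import__("xml.dom")
# ...whereas import_module returns the module the string actually names.
sub = importlib.import_module("xml.dom")
print(top.__name__, sub.__name__)  # xml xml.dom

# Relative names work too, given an anchor package.
same = importlib.import_module(".dom", package="xml")
print(same is sub)  # True

# import_string-style attribute access is then just getattr on the module.
dumps = getattr(importlib.import_module("json"), "dumps")
print(dumps({"a": 1}))  # {"a": 1}
```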
More fun browsing source code
For that matter, why stop there? The source of the official Python implementation is available to read at python/cpython. Fun times!
Use tags to browse specific releases, such as v3.6.3.
Branches for different release streams of Python are available:
- The default branch of python/cpython is the next upcoming release (3.7 as of 2017-11-24)
- Branch 2.7 is where Python 2 was maintained until its end of life on 2020-01-01 (the Python docs also mention this thread on the python-dev list).
- And other branches like:
Outside of Django
Older examples
My earliest exposure to superb usage of string imports was from my favorite Python programmers.
First, Armin Ronacher's usage of it in Flask, before the project switched from plain-old unittest to pytest (which is fine, because pytest is awesome). It's viewable in flask/testsuite/__init__.py of Flask 0.10. It would walk through Flask's test modules and collect the available test suites.
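The underlying trick is easy to reproduce: list a package's submodules as dotted strings, import each by name, and collect whatever hook it exposes. The sketch below uses pkgutil with hypothetical find_modules() and iter_suites() helpers; it is not Flask's code, and the standard json package stands in for a test package.

```python
import importlib
import pkgutil


def find_modules(package_name):
    """Yield dotted names of a package's public, direct submodules."""
    package = importlib.import_module(package_name)
    for info in pkgutil.iter_modules(package.__path__):
        if not info.name.startswith("_"):  # skip private modules such as __main__
            yield f"{package_name}.{info.name}"


def iter_suites(package_name, hook="suite"):
    """Import each submodule by its string name and call its hook, if it has one."""
    for dotted in find_modules(package_name):
        module = importlib.import_module(dotted)
        if hasattr(module, hook):
            yield getattr(module, hook)()


print(sorted(find_modules("json")))
```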
This next one took some digging to find:
In the early days of pypa/warehouse, before it switched from pallets/werkzeug to Pylons/pyramid, there was a great central Warehouse object by Donald Stufft that would scour and load up models. Remnants of it are in my fork at warehouse/application.py.
A lot of these were phased out one way or another by libraries that encourage more conventional patterns. So those days of clever Python sorcery, while fondly remembered, are increasingly usurped by libraries like pytest over unittest, and Pyramid over plain-old Werkzeug.
More current examples
In modern Flask, configuration can be loaded from an import string, e.g.
app.config.from_object('yourapplication.default_settings')
tensorflow/tensorflow uses delayed imports "to avoid pulling in large dependencies ... and allows [them] only to be loaded when they are used". Here is TensorFlow's LazyLoader class (with its imports added so it stands alone):
import importlib
import types


class LazyLoader(types.ModuleType):
  """Lazily import a module, mainly to avoid pulling in large dependencies.

  `contrib`, and `ffmpeg` are examples of modules that are large and not always
  needed, and this allows them to only be loaded when they are used.
  """

  # The lint error here is incorrect.
  def __init__(self, local_name, parent_module_globals, name):  # pylint: disable=super-on-old-class
    self._local_name = local_name
    self._parent_module_globals = parent_module_globals

    super(LazyLoader, self).__init__(name)

  def _load(self):
    # Import the target module and insert it into the parent's namespace
    module = importlib.import_module(self.__name__)
    self._parent_module_globals[self._local_name] = module

    # Update this object's dict so that if someone keeps a reference to the
    #   LazyLoader, lookups are efficient (__getattr__ is only called on lookups
    #   that fail).
    self.__dict__.update(module.__dict__)

    return module

  def __getattr__(self, item):
    module = self._load()
    return getattr(module, item)

  def __dir__(self):
    module = self._load()
    return dir(module)
And the implementation in the main tensorflow module:
from tensorflow.python.util.lazy_loader import LazyLoader
contrib = LazyLoader('contrib', globals(), 'tensorflow.contrib')
del LazyLoader
That is a clever way to make a friendly API that balances features while staying performant.
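The standard library offers a similar mechanism: importlib.util.LazyLoader, which defers executing a module until its first attribute access. The helper below follows the lazy-import recipe in the importlib documentation; lazy_import is our own name, and colorsys is just a small stand-in module.

```python
import importlib.util
import sys


def lazy_import(name):
    """Create the module object now, but only execute its code on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module


colorsys = lazy_import("colorsys")  # nothing in colorsys has run yet
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))  # (0.0, 1.0, 1.0), first access runs the module
```

Unlike TensorFlow's class, this keeps the real module object and only postpones running its code, so no namespace patching is needed.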
Sphinx has string-level module resolution peppered everywhere. For instance, when documenting a module or function with sphinx autodoc, Sphinx needs to resolve Noodle in .. autoclass:: Noodle.
Another prime example in Sphinx is the extensions variable in your conf.py:
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.doctest',
    'sphinx.ext.intersphinx',
    'sphinx.ext.todo',
    'sphinx.ext.viewcode',
    'alagitpull',
]
These strings end up being resolved in load_extension().
Putting it into practice
Finally, you can use import strings in your own libraries to make your code more reusable. For instance, Django has a slugify function, django.utils.text.slugify. For each Django website, I have special cases where the default behavior is unsatisfactory. Often, the rules on how you'd handle slugification depend on the niche of the website.
For devel.tech, the default behavior when slugifying "C++" is to remove the plus signs, so it shows up as "c", which collides with the C programming language. "C#" is also trimmed down to "c". Auto-slugging model fields then append numbers to them, "c-2", "c-3", when autogenerating them.
What if Django could slugify "C++" as "cpp", and "C#" as "c-sharp"?
Term | django.utils.text.slugify | Better |
---|---|---|
C | c (correct) | N/A |
C++ | c | cpp |
C# | c | c-sharp |
There are more generic cases, such as $ being dropped entirely by Django's stock django.utils.text.slugify. What it should become could depend on the region of the website, since many nations have their own dollar (e.g. USD, AUD, CAD). US$ to USD, AU$ to AUD, and so on?
Not just that, but when slugifying URLs, space is at a premium, and we may prefer abbreviations or short names. For instance, New York City as nyc instead of new-york-city. What would a person on a smartphone type into Google?
Term | django.utils.text.slugify | What you (may) want |
---|---|---|
New York City | new-york-city | nyc |
Y Combinator | y-combinator | yc |
Portland | portland | pdx |
Texas | texas | tx |
$ | '' (empty) | usd, aud, etc? |
US$ | us | usd |
A$ | a | aud |
bitcoin | bitcoin | btc |
United States | united-states | usa |
League of Legends | league-of-legends | league |
Apple® iPod Touch | apple-ipod-touch | ipod-touch |
GNU/Linux | gnulinux | GNU/Linux[1] |
So there are two problems. First, Django's default slugify utility can lose valuable context. Second, there's a need to handle custom cases depending on the needs of the website. One-size-fits-all solutions are possible to attempt, but an Australian website doesn't want to print $ as USD without asking. A gaming website may want to slugify League of Legends as lol, which is ambiguous with Laugh Out Loud, and is better summarized as league.
Since this isn't unique to me, it likely applies to many Django developers. Yay, an opportunity to make an open source project!
So let's turn the system that handles slugification into a list of filters. Remember context_processors? We can use import strings as a way to "plug in" callback functions that handle slugification cases. In our settings:
SLUGIFY_PROCESSORS = [
'myproject.myapp.slugify.slugify_programming_languages',
'myproject.myapp.slugify.slugify_geo',
]
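Here's an example of what slugify_programming_languages and slugify_geo could look like in myproject/myapp/slugify.py:
import re


def slugify_programming_languages(value):
    # Match case-insensitively: processors run before Django lowercases the text.
    value = re.sub(r'c\+\+', 'cpp', value, flags=re.IGNORECASE)
    value = re.sub(r'c#', 'c-sharp', value, flags=re.IGNORECASE)
    return value


def slugify_geo(value):
    value = value.replace('United States', 'us')
    return value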
Let's sweep in the SLUGIFY_PROCESSORS with a customized slugify() function that falls back on Django's default behavior:
from django.conf import settings
from django.utils.module_loading import import_string
from django.utils.text import slugify as django_slugify


def slugify(value, allow_unicode=False):
    # Run each configured processor, in order, before Django's own slugify.
    if hasattr(settings, 'SLUGIFY_PROCESSORS'):
        for slugify_fn_str in settings.SLUGIFY_PROCESSORS:
            slugify_fn_ = import_string(slugify_fn_str)
            value = slugify_fn_(value)
    return django_slugify(value, allow_unicode)
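To try the processor chain without a Django project, here is a standalone sketch. base_slugify() is our own approximation of django.utils.text.slugify (ASCII mode only), and the demo_slugify module with its processor is hypothetical, registered in sys.modules so no files are needed.

```python
import importlib
import re
import sys
import types
import unicodedata
from functools import lru_cache


def base_slugify(value):
    """Rough approximation of django.utils.text.slugify (ASCII mode only)."""
    value = unicodedata.normalize("NFKD", str(value)).encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


@lru_cache(maxsize=None)
def resolve(dotted_path):
    """Turn 'module.func' into the function object; cached after the first lookup."""
    module_path, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)


def slugify(value, processors=()):
    for path in processors:
        value = resolve(path)(value)
    return base_slugify(value)


demo = types.ModuleType("demo_slugify")
demo.programming_languages = lambda v: re.sub(r"c\+\+", "cpp", v, flags=re.IGNORECASE)
sys.modules["demo_slugify"] = demo

print(slugify("C++"))  # c
print(slugify("C++", processors=("demo_slugify.programming_languages",)))  # cpp
```

Caching resolved callables is a design choice here: import_module() already returns modules from sys.modules, so the cache mainly skips the string parsing and getattr on repeated calls.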
This could be used as a custom slug function for django-extensions or django-autoslug. We can then make it available as a template filter, too:
from django import template
from django.template.defaultfilters import stringfilter

from ..text import slugify as _slugify

register = template.Library()


@register.filter(is_safe=True)
@stringfilter
def slugify(value):
    return _slugify(value)
To demonstrate the above code, I forked off part of devel.tech's slugification code into tony/django-slugify-processor (pypi). The README has instructions on how to configure and implement it.
[1] Just an easter egg to see if anyone's reading :)
Wrapping up
The delayed resolution of imports via strings plays an instrumental role in the Django framework, and in a host of other Python applications. The machinery behind it is part of Python's standard library.
There is a value equation to strike when using import strings in your project. If you value reusability and customization, have a recurring pattern that's package-worthy, and don't want to pull in the whole kitchen sink by default, import strings are the solution in the end, one way or another.
Using Django as an obvious example of its success in the field, import strings also allow codebases to scale big while avoiding race conditions caused by circular dependencies.
Finally, we closed with a homemade example of using import strings. You can check out the source code, package, and docs.
Changes
- April 28th, 2022:
  - Update django links from 4.0 to 4.0
  - Update python.org links from 2 to 3