django-treebeard vs django-mptt
Handling hierarchical / tree-like DB schemas
My background
I don't remember whether I used django-treebeard or django-mptt first; it was years ago. In the early stages, I didn't know much about the underlying mechanics of either. So this article covers both plugins: the SQL techniques they employ (in general and in depth), their public APIs, their internal Python code, their communities, and my own experience using them in production.
As for hierarchical data and SQL in general:
I mostly rely on ORMs, but I do study plain old SQL in my spare time. My favorite book for it is Joe Celko's SQL for Smarties.
Eventually, I wanted to develop a faceted search system for my ideas without having to pull in something heavy-duty (like Elasticsearch). This would require a system that could "drill down" deeper into categories.
I was very fortunate that Celko seemed to be one of the few at the forefront of handling hierarchical data in SQL. Apparently, he is credited with pushing the nested set model forward. He also wrote another book, Trees and Hierarchies in SQL for Smarties. It's probably the best book on nested data in SQL.
So for further reading on the background mechanics of tree data in SQL, check those out. In future updates I hope to also dig into new stuff like matthiask/django-cte-forest.
Hierarchical data in SQL
django-mptt
community / open source
disclaimer on numbers
I don't advise using contributor count, stars, releases, and commits as the sole factor. They help describe how entrenched a plugin is, but they don't account for a plugin having lived outside GitHub for years, or for the quality and depth of its commits.
mptt, as of 2017-12-07:
- 93 contributors
- ~1600 stars
- 28 releases
- 994 commits
Methodology
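Modified Preorder Tree Traversal (MPTT) is supposed to combine several techniques: django-mptt keeps nested-set left/right values on every node alongside a parent foreign key (the adjacency-list part), plus a tree id and a level.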
I don't know whether it's "more" complicated to balance, since balancing anything in SQL is already tedious, especially if nested-set stuff is involved.
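To make the write cost concrete, here is a minimal sketch in plain Python of how nested-set style left/right values get assigned by a preorder walk, and what happens to them when a single leaf is added. The `Node` class, the `number` and `snapshot` helpers, and the category names are all hypothetical. It renumbers the whole tree for simplicity, whereas a real library would issue targeted UPDATE statements.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list[Node] = field(default_factory=list)
    lft: int = 0
    rght: int = 0


def number(node: Node, counter: int = 1) -> int:
    """Assign lft/rght by preorder traversal; return the next free counter."""
    node.lft = counter
    counter += 1
    for child in node.children:
        counter = number(child, counter)
    node.rght = counter
    return counter + 1


def snapshot(node: Node) -> dict[str, tuple[int, int]]:
    """Collect {name: (lft, rght)} for the whole tree."""
    out = {node.name: (node.lft, node.rght)}
    for child in node.children:
        out.update(snapshot(child))
    return out


# Hypothetical category tree: root -> (books -> sql), music
root = Node("root", [Node("books", [Node("sql")]), Node("music")])
number(root)
before = snapshot(root)

# Add one leaf under "books", then renumber.
root.children[0].children.append(Node("python"))
number(root)
after = snapshot(root)

# Existing nodes whose stored values had to change because of one insert.
changed = sorted(name for name in before if before[name] != after[name])
print(changed)
```

In this toy tree, adding one leaf under "books" changes the stored values of "books", "root", and the unrelated sibling "music". On a real table that is one UPDATE touching every row to the right of the insertion point, which is why writes are the expensive side of this scheme.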
django-treebeard
community / open source
Treebeard, as of 2017-12-07:
- 30 contributors
- ~400 stars
- 18 releases
- 654 commits
While studying Django source code at scale, I saw divio/django-cms move from django-mptt to django-treebeard. In addition, wagtail/wagtail uses treebeard.
So, two industrial-sized Django CMSs use treebeard.
Comparison
This is an example of why stars and commits alone, even with a "superior" technology in the innards, may not make a library better.
MPTT is supposed to combine several of the strategies treebeard offers separately, so that reads stay fast in a wide range of situations.
The catch is that if the problem being solved doesn't require the complication, the engineering overhead incurred isn't worth it.
The internals of django-mptt are heavy with metaprogramming. On top of an already complicated SQL technique, that makes the mptt codebase feel impenetrable to read.
django-mptt was also late to support Django 2.0 when it was released. Nobody was around. The GitHub organization is a username.
Going by stars and contributors, treebeard feels like the runt of the litter.
More reading
First, read the source code of treebeard and its documentation. In particular:
Adjacency lists
Overview: https://django-treebeard.readthedocs.io/en/latest/al_tree.html
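For orientation, here is a rough sketch of the adjacency-list shape using only Python's built-in sqlite3 module: each row stores a pointer to its parent, and fetching a whole subtree needs recursion. The `category` table, its columns, and the `descendants` helper are made up for illustration, and the recursive CTE is just one way to walk the tree, not necessarily how treebeard's adjacency-list implementation does it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " parent_id INTEGER REFERENCES category(id))"
)
conn.executemany(
    "INSERT INTO category (id, name, parent_id) VALUES (?, ?, ?)",
    [
        (1, "root", None),
        (2, "books", 1),
        (3, "music", 1),
        (4, "sql", 2),
        (5, "python", 2),
    ],
)


def descendants(conn, node_id):
    """Names of all nodes below node_id, found with a recursive CTE."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(id, name) AS (
            SELECT id, name FROM category WHERE parent_id = ?
            UNION ALL
            SELECT c.id, c.name
            FROM category AS c JOIN sub ON c.parent_id = sub.id
        )
        SELECT name FROM sub ORDER BY name
        """,
        (node_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(descendants(conn, 2))  # the subtree under "books"
```

Writes are cheap here (moving a subtree is a single `parent_id` update), while reading a subtree costs a recursive query or one query per level.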
Materialized Paths
Overview: https://django-treebeard.readthedocs.io/en/latest/mp_tree.html
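As a rough illustration of the materialized-path idea, the sketch below stores each node's full path as a string and finds descendants with a prefix match. It uses only the standard-library sqlite3 module. The table, the four-character step width (`STEP`), and the helper names are assumptions for this example, not treebeard's actual schema or API.

```python
import sqlite3

STEP = 4  # characters per tree level (an assumption for this sketch)

conn = sqlite3.connect(":memory:")
# PRIMARY KEY gives the path column an index.
conn.execute("CREATE TABLE category (path TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO category (path, name) VALUES (?, ?)",
    [
        ("0001", "root"),
        ("00010001", "books"),
        ("000100010001", "sql"),
        ("000100010002", "python"),
        ("00010002", "music"),
    ],
)


def descendants(conn, path):
    """Names of all nodes below `path`, in depth-first order."""
    rows = conn.execute(
        "SELECT name FROM category WHERE path LIKE ? AND path <> ? ORDER BY path",
        (path + "%", path),
    ).fetchall()
    return [name for (name,) in rows]


def depth(path):
    """Depth of a node, with a root node at depth 1."""
    return len(path) // STEP


def parent_path(path):
    """Path of the parent node, or None for a root node."""
    return path[:-STEP] or None


print(descendants(conn, "0001"))
```

Because the wildcard only appears at the end of the pattern, a database can in principle answer the query from an index on `path`; whether it actually does depends on the engine and its collation settings. Sorting by `path` also yields the tree in depth-first order for free.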
Stuff to look at: how the materialized path uses LIKE, but with an index.
Nested Sets
Overview: https://django-treebeard.readthedocs.io/en/latest/ns_tree.html
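On the read side, nested sets answer subtree and ancestor questions with plain range comparisons and no recursion. The sketch below uses sqlite3 with hand-assigned left/right values. The table, the column names `lft` and `rgt`, and the helpers are hypothetical and only meant to show the shape of the queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)"
)
# Left/right values as a preorder walk would assign them:
# root(1,10) -> books(2,7) -> sql(3,4), python(5,6); music(8,9)
conn.executemany(
    "INSERT INTO category (name, lft, rgt) VALUES (?, ?, ?)",
    [
        ("root", 1, 10),
        ("books", 2, 7),
        ("sql", 3, 4),
        ("python", 5, 6),
        ("music", 8, 9),
    ],
)


def descendants(conn, name):
    """Nodes whose lft falls strictly inside the named node's interval."""
    rows = conn.execute(
        "SELECT child.name FROM category AS child"
        " JOIN category AS parent"
        "   ON child.lft > parent.lft AND child.lft < parent.rgt"
        " WHERE parent.name = ? ORDER BY child.lft",
        (name,),
    ).fetchall()
    return [n for (n,) in rows]


def ancestors(conn, name):
    """Nodes whose interval strictly contains the named node, root first."""
    rows = conn.execute(
        "SELECT parent.name FROM category AS parent"
        " JOIN category AS child"
        "   ON child.lft > parent.lft AND child.lft < parent.rgt"
        " WHERE child.name = ? ORDER BY parent.lft",
        (name,),
    ).fetchall()
    return [n for (n,) in rows]


def descendant_count(lft, rgt):
    """Number of descendants, derived from the interval width alone."""
    return (rgt - lft - 1) // 2


print(descendants(conn, "books"), ancestors(conn, "python"))
```

Both directions are a single non-recursive query, and the size of a subtree falls out of arithmetic on one row. That read-side convenience is what the scheme trades against its expensive writes.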
Also, two of Django's major CMSs build on treebeard; check them out:
- Django CMS by divio
- Wagtail
I don't really recommend reading django-mptt's internals. The code is very metaclass-heavy. I'll leave the link here for posterity, but it's very complicated: https://github.com/django-mptt/django-mptt/tree/master/mptt
As mentioned earlier, there are two books by Joe Celko. I prefer to stick to his since they cover the material well: SQL for Smarties for general SQL, and Trees and Hierarchies in SQL for Smarties for stuff like adjacency lists, materialized paths, and nested sets.
Another author who is authoritative on hierarchical design patterns is Vadim Tropashko, author of SQL Design Patterns.