Struggling needlessly with Django's Object Relational Mapper, I find myself agreeing more and more with Ted Neward who claimed ORMs to be the vietnam of computer science.
After reading How not to structure your database-backed web applications today I decided to go ahead and post this bit of a rant. It has been lingering in a drafts folder for ages while I delay writing another post on data modeling directly in PostgreSQL.
Class Foo(db.model):
...
Class Bar(db.model):
foo_id = ForeignKey(Foo)
condition = BooleanField(...)
Class Qux(db.model):
foo_id = ForeignKey(Foo)
For the given (abbreviated) set of model class definitions, the following table structure is defined:
foo |
---|
id |
A purely referential table in this query.
bar | |
---|---|
id | |
foo-id | FOREIGN KEY REFERENCES foo(id) |
condition | BOOLEAN |
A table we want to filter on.
qux | |
---|---|
id | |
foo-id | FOREIGN KEY REFERENCES foo(id) |
The table with a value we actual want.
Select an attribute of table qux based on a filter condition of table bar, correlating them via a shared attribute, the ID of table foo.
select * from qux
join bar using(foo_id)
where bar.condition is null;
It took me a while to work out the necessary Django-isms to accomplish something close:
Qux.objects.filter(foo__bar__condition=None)
The annoying bit for me was not just stumbling over the attempted abstraction that the ORM requires, but finding that the results weren't even as good as what I had started with. Inspecting the query output of the about ORM call reveals the following:
select * from qux
inner join foo on (qux.foo_id=foo.id)
left outer join bar on (foo.id=bar.foo_id)
where bar.condition is null;
The ORM isn't doing what seems to me to be the obvious thing and omitting
references to the unused table ("foo") and instead it does another JOIN
operation entirely to "resolve" the reference of foo_id
, which it feels like
it could identify early as they are both foreign keys to the same table.
You might be tempted to try chaining filters using an in query, but the results were worse in my case, doing what ended up looking like 2 sequential scans over the separate tables (unsurprisingly).
Qux.objects.filter(foo_id__in=(Bar.objects.filter(condition=None)))
select * from qux
where (qux.foo_id in
(select bar.id from bar where bar.condition is null))
While I've been coming around to some of the advantages that frameworks like Django provide in the case of "large" or "real" projects, I continue to chafe at the awkwardness of database interactions.
Neward states:
early successes yield a commitment to use O/R-M in places where success becomes more elusive, and over time, isn’t a success at all due to the overhead of time and energy required to support it through all possible use-cases.
Which I think captures much of my arguments against the continued use of ORMs in projects with well-established data models. I think the ORM can let you iterate quickly with different models, and in Django's case, the migrations system can save time translating those changes. But I've looked before at managing migrations, and found then that doing these sorts of things isn't insurmountable by hand. So to am I coming to consider ORMs for data modeling, by shoe-horning relational data into the semantics of Python, I end up losing much of the expressivity to declaratively state what I want, which is a real gem of using a RDMS in the first place.