I had an interesting bug to work through this week, a deadlock between multiple concurrent threads writing into a Postgres database. Here I outline the gist of the problem and a solution.
Django provides a context manager called transaction.atomic
that
allows for block-scoping a transaction as shown below:
with transaction.atomic():
__exit__
BEGIN;
COMMIT;
… intermediate states between the steps are not visible to other concurrent transactions
This means only that we won't see half-finished work. It does not imply any guards in place against concurrent writes and locks are visible across transactions.
To guard concurrent writes, we use the FOR UPDATE
clause on a SELECT
statement to induce row-level locking before any updates.
However this has the potential to deadlock in the case of concurrent writes; as demonstrated below.
The problem is the fact that row-level locks are acquired incrementally in the order of the query results, leaving open the opportunity for two or more transactions to interfere with each other.
def _update(): with transaction.atomic(): ids = (Widget .objects .select_for_update() .filter(widget_id_id__in=[1, 2, 3])) for id in ids: # update widget model # save modeldeferToThread(_update)
In this scenario, the database returns widget model rows in the order 1, 3, 2.
Lock row 1
1 | foo |
2 | foo |
3 | qux |
4 | bar |
5 | qux |
Lock row 3
1 | foo |
2 | foo |
3 | qux |
4 | bar |
5 | qux |
Row 2 is locked, await thread 2
1 | foo |
2 | foo |
3 | qux |
4 | bar |
5 | qux |
def _update(): with transaction.atomic(): ids = (Widget .objects .select_for_update() .filter(widget_id_id__in=[2, 3, 5])) for id in ids: # update widget model # save modeldeferToThread(_update)
In this scenario, the database returns widget model rows in the order 2, 3, 5.
Lock row 2
1 | foo |
2 | foo |
3 | qux |
4 | bar |
5 | qux |
Row 3 is locked, await thread 1
1 | foo |
2 | foo |
3 | qux |
4 | bar |
5 | qux |
Deadlock
1 | foo |
2 | foo |
3 | qux |
4 | bar |
5 | qux |
The solution I've opted for here is to just introduce an ORDER BY
clause on the SELECT
statement, to guarantee that rows are locked in a consistent order even between transactions.
In the above example, if the order were enforced as 1, 2, 3 and 2, 3, 5, thread 1 might acquire a lock, then thread 2, at which point thread 1 would correctly await the completion of thread 2, which is (in this example) never blocked.
This is actually addressed in the documentation:
The best defense against deadlocks is generally to avoid them by being certain that all applications using a database acquire locks on multiple objects in a consistent order. In the example above, if both transactions had updated the rows in the same order, no deadlock would have occurred.
But, of course, you won't know to look for it until you understand the problem — which can sometimes feel like a chicken and egg problem!