An Especially Annoying Python Bug

2019-01-12

Dealt with an annoying sorting bug in Python this week. Just another pitfall of a dynamically typed language I suppose.

The Problem

The problem surfaced when trying to combine the values of a dictionary into a single string. A simplified example of how this turned up looked something like this:

a_dictionary = {
    'thing_one': ...,
    'thing_two': ...,
    'thing_three': ...,
    ...
}

def make_a_string_of_it(**kwargs):
    return '-'.join(sorted(kwargs.values()))

The result of calling make_a_string_of_it with the exact same data would sometimes return a different string. More confusingly, the number of elements in the dictionary and the values themselves seemed to matter.

My co-workers and I are all well aware of how, under Python 2, dictionaries are not ordered. We understand that the values can come out in whatever order they ended up hashing into when the dictionary was created. We resolved to work around this in the first place by calling sorted on the values before creating the result string.

The Smallest Demonstrative Case

After spending a while interrogating different dictionaries and digging into CPython's dictobject.c implementation to convince myself that the .values() method was really doing nothing very interesting besides walking the internal table in a for-loop, we came up with a minimal example that didn't even require a dictionary.

>>> bar = ['foo', u'bar', ()]
>>> sorted(bar)
[(), u'bar', 'foo']

>>> foo = ['foo', (), u'bar']
>>> sorted(foo)
['foo', (), u'bar']

Two lists, of the same elements, when sorted, producing different results. This fit the bug we were seeing, the first would turn into a string ()-bar-foo while the second would become foo-()-bar.

Taken pairwise, both lists are actually "correct":

from itertools import tee

# https://docs.python.org/2/library/itertools.html#recipes
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    try:
        next(b)
    except StopIteration:
        pass
    return zip(a, b)

def compare_order(some_list):
    for left, right in [i for i in pairwise(sorted(some_list))]:
        print('{!r} < {!r}: {}'.format(left, right, (left < right)))
    
compare_order(['foo', u'bar', ()])
() < u'bar': True
u'bar' < 'foo': True

compare_order(['foo', (), u'bar'])
'foo' < (): True
() < u'bar': True

() is "less than" u'bar'
'foo' is "less than" ()
u'bar' is "less than" 'foo'

Best Guess at the Cause

Obviously, there is a lack of something like total_ordering across the different types here. I haven't entirely sussed it out, but I think the problem may lie in how Python's timsort guarantees the sorting to be stable, meaning list elements will retain their original order (sort-of, see link for more detail). Since the comparisons aren't totally unambiguous, it tries to retain the original ordering where it can and reaches a sort that suffices but is not unique.

A confounding issue in all of this was how dictionary ordering could sometimes mask the issue. If the keys are "large enough", they'll hash into different "buckets" of the hash table and have the potential to come out of the values method in a different order. Because of the opaque nature of the hashing, it isn't clear when "sort" is doing something asinine as compared to the dictionary treating the data differently.

The Future

Thankfully, this isn't an issue in Python 3. Attempting as much results in a TypeError due to unorderable types. While it is nice to know that this is fixed in later versions, I'm sure I'll be dealing with Python 2 for a while longer still. Until then, a solution for the above was to call str over the list of values before sorting.

I must say, I'm glad I've taken the time before to understand Python internals¹ a bit better. Coming at some of these things cold can be absolutely baffling.

The "Ten-Hour Codewalk" is really quite good