Formatting Python

2017-08-11

Projects like gofmt and rustfmt aim to simplify or even eliminate discussions of style during code review. Several tools purport to do the same for Python, but how well do they work, and how can you tell?

I've been advocating for standardizing a code base on PEP-8 recently and in making some progress the question arose "what do we do with existing code?". Auto-formatters are an interesting option, but presented several questions. Firstly, how can we verify the formatter does not change the semantics of the program? Secondly, how can we prevent creating a "firewall" across which diffs are effectively meaningless?

Verifying Correctness

It would seem there are a few options to guarantee programs continue to work as intended, perhaps the most obvious is through thorough unit testing, failing that though, there are bound to be at least a few options. Initially I considered comparing the generated bytecode on a per-file basis but quickly discarded the idea in favor of comparing the abstract syntax trees for each file instead. By focusing on the ASTs for each file rather than the bytecode the focus remains on the semantics of the program rather than the output of the compiler.

An Example

Using the below nonsense function as an example:

def foo ( bar = None ):
    if ( bar ):
        raise NotImplementedError ( "weird!" )

The AST which results looks like this:

Module(body=[FunctionDef(name='foo', args=arguments(args=[Name(id='bar', ctx=Param())], vararg=None, kwarg=None, defaults=[Name(id='None', ctx=Load())]), body=[If(test=Name(id='bar', ctx=Load()), body=[Raise(type=Call(func=Name(id='NotImplementedError', ctx=Load()), args=[Str(s='weird!')], keywords=[], starargs=None, kwargs=None), inst=None, tback=None)], orelse=[])], decorator_list=[])])

Which is unfortunately inscrutable. There are a number of recipes online for AST pretty-printers, which tidy things up some (at the expense of vertical space).

It turns out, introspecting and comparing the AST in Python is pretty easy. The first step was to ensure the formatter provided an interface beyond the command line, which was true for the particular library I was considering, yapf. The second step was to write a directory walker to collect the full paths in the source tree to Python files:

import fnmatch
import sys
import ast
import os

from yapf.yapflib.yapf_api import FormatCode


def collect_python_files(directory):
    for root, _, filenames in os.walk(directory):
        for filename in fnmatch.filter(filenames, "*.py"):
            # skip generated code
            if "migrations" in root:
                continue
            else:
                yield os.path.join(root, filename)

You may note the one additional bit of logic in the directory walking which is the conditional skipping of any generated code, in the above example, any Django migrations won't be auto-formatted and shouldn't be considered.

With a full filepath available, it comes to the actual verification of the ASTs before and after the transform. The yapf library exposes a top-level function FormatCode that takes a string of source code and returns a tuple of the formatted string and a boolean of whether changes were made. The verification here is then a check on the AST before the transform and after:

def comparator(filename):
    with open(filename) as f:
        source = f.read()
        original = ast.dump(ast.parse(source))
        formatted, _ = FormatCode(source)
        new = ast.dump(ast.parse(formatted))
        return (original == new, filename)

That's really all there is to it, the following is the necessary code to run the thing, giving it a single command line argument so that source code sub-trees or other repositories can be checked easily.

if __name__ == "__main__":
    source_dir = sys.argv[1]
    problematic_files = []

    print("Checking for AST compatibility")
    print("==============================")

    for filename in collect_python_files(source_dir):
        try:
            okay, _ = comparator(filename)
            if not okay:
                print(filename)
        except Exception as e:
            problematic_files.append((filename, e))

    print("Finished checking for compatibility")
    print("===================================")

    if problematic_files:
        print("Something went wrong in parsing or formatting the following:")
        for i, (filename, exception) in enumerate(problematic_files, start=1):
            print("{}. {}\n\t{}".format(i, filename, exception))

It turns out, in all the code I checked, yapf never once changed the AST of a program. This would seem obvious (clearly, you never want that to happen) but I was surprised it wasn't advertised more heavily in the README or docs. I haven't yet checked the other strong contender in Python formatters, autopep8, which appears to be less opinionated, but with the above AST check, I'm reasonably confident any conversion should be straight-forward.

Managing Diffs

There still exists the problem of how to manage diffs after such a conversion. Consider the following example:

def foo ( bar = None ):
    if ( bar ):
        raise NotImplementedError ( "weird!" )

running those three lines through yapf results in:

def foo(bar=None):
    if (bar):
        raise NotImplementedError("weird!")

which, though it looks sensible, creates havoc in the diff:

@@ -1,3 +1,3 @@
-def foo ( bar = None ):
-    if ( bar ):
-        raise NotImplementedError ( "weird!" )
+def foo(bar=None):
+    if (bar):
+        raise NotImplementedError("weird!")

the formatter has touched every line! If you scale this up in your mind to hundreds of thousands of lines, you see the potential for confusion in tracking bugs through a file's history. Running git blame will result in, at least a surface-level, indication that I am responsible for every line of code in the project. Halfway decent tools can ameliorate this by exposing a navigation through a files history, rather than only the last change made (Emacs' vc-annotate does this reasonably well). I still haven't found any real way around this, half the problem is a people problem and the other half is probably teaching people to use their tools better.

An End To Holy Wars?

While I am excited at the direction of things like yapf and the assorted *fmt tools, I can't help but think they won't entirely solve the problem without being "baked into" the language. Go gets pretty far with this, but even so manages something like 70% adoption, which means 30% is non-standard in style. I would like to think that having a tool to reformat things would mean that it is simply run on-commit, or on-save, but only time will tell. If nothing else, any style debates should be trivially resolvable with the solution being "run it through the tool", and if I never have a discussion of coding style again in my life I think that'd be just fine.