Projects like gofmt and rustfmt aim to simplify or even eliminate discussions of style during code review. Several tools purport to do the same for Python, but how well do they work, and how can you tell?
I've been advocating for standardizing a code base on PEP-8 recently and in making some progress the question arose "what do we do with existing code?". Auto-formatters are an interesting option, but presented several questions. Firstly, how can we verify the formatter does not change the semantics of the program? Secondly, how can we prevent creating a "firewall" across which diffs are effectively meaningless?
It would seem there are a few options to guarantee programs continue to work as intended, perhaps the most obvious is through thorough unit testing, failing that though, there are bound to be at least a few options. Initially I considered comparing the generated bytecode on a per-file basis but quickly discarded the idea in favor of comparing the abstract syntax trees for each file instead. By focusing on the ASTs for each file rather than the bytecode the focus remains on the semantics of the program rather than the output of the compiler.
Using the below nonsense function as an example:
def foo ( bar = None ):
if ( bar ):
raise NotImplementedError ( "weird!" )
The AST which results looks like this:
Module(body=[FunctionDef(name='foo', args=arguments(args=[Name(id='bar', ctx=Param())], vararg=None, kwarg=None, defaults=[Name(id='None', ctx=Load())]), body=[If(test=Name(id='bar', ctx=Load()), body=[Raise(type=Call(func=Name(id='NotImplementedError', ctx=Load()), args=[Str(s='weird!')], keywords=[], starargs=None, kwargs=None), inst=None, tback=None)], orelse=[])], decorator_list=[])])
Which is unfortunately inscrutable. There are a number of recipes online for AST pretty-printers, which tidy things up some (at the expense of vertical space).
It turns out, introspecting and comparing the AST in Python is pretty easy. The first step was to ensure the formatter provided an interface beyond the command line, which was true for the particular library I was considering, yapf. The second step was to write a directory walker to collect the full paths in the source tree to Python files:
import fnmatch
import sys
import ast
import os
from yapf.yapflib.yapf_api import FormatCode
def collect_python_files(directory):
for root, _, filenames in os.walk(directory):
for filename in fnmatch.filter(filenames, "*.py"):
# skip generated code
if "migrations" in root:
continue
else:
yield os.path.join(root, filename)
You may note the one additional bit of logic in the directory walking which is the conditional skipping of any generated code, in the above example, any Django migrations won't be auto-formatted and shouldn't be considered.
With a full filepath available, it comes to the actual verification of the ASTs
before and after the transform. The yapf
library exposes a top-level function
FormatCode
that takes a string of source code and returns a tuple of the
formatted string and a boolean of whether changes were made. The verification
here is then a check on the AST before the transform and after:
def comparator(filename):
with open(filename) as f:
source = f.read()
original = ast.dump(ast.parse(source))
formatted, _ = FormatCode(source)
new = ast.dump(ast.parse(formatted))
return (original == new, filename)
That's really all there is to it, the following is the necessary code to run the thing, giving it a single command line argument so that source code sub-trees or other repositories can be checked easily.
if __name__ == "__main__":
source_dir = sys.argv[1]
problematic_files = []
print("Checking for AST compatibility")
print("==============================")
for filename in collect_python_files(source_dir):
try:
okay, _ = comparator(filename)
if not okay:
print(filename)
except Exception as e:
problematic_files.append((filename, e))
print("Finished checking for compatibility")
print("===================================")
if problematic_files:
print("Something went wrong in parsing or formatting the following:")
for i, (filename, exception) in enumerate(problematic_files, start=1):
print("{}. {}\n\t{}".format(i, filename, exception))
It turns out, in all the code I checked, yapf
never once changed the AST of a
program. This would seem obvious (clearly, you never want that to happen) but I
was surprised it wasn't advertised more heavily in the README or docs. I
haven't yet checked the other strong contender in Python formatters,
autopep8, which appears to be less
opinionated, but with the above AST check, I'm reasonably confident any
conversion should be straight-forward.
There still exists the problem of how to manage diffs after such a conversion. Consider the following example:
def foo ( bar = None ):
if ( bar ):
raise NotImplementedError ( "weird!" )
running those three lines through yapf
results in:
def foo(bar=None):
if (bar):
raise NotImplementedError("weird!")
which, though it looks sensible, creates havoc in the diff:
@@ -1,3 +1,3 @@
-def foo ( bar = None ):
- if ( bar ):
- raise NotImplementedError ( "weird!" )
+def foo(bar=None):
+ if (bar):
+ raise NotImplementedError("weird!")
the formatter has touched every line! If you scale this up in your mind to
hundreds of thousands of lines, you see the potential for confusion in tracking
bugs through a file's history. Running git blame
will result in, at least a
surface-level, indication that I am responsible for every line of code in the
project. Halfway decent tools can ameliorate this by exposing a navigation
through a files history, rather than only the last change made (Emacs'
vc-annotate
does this reasonably well). I still haven't found any real way
around this, half the problem is a people problem and the other half is
probably teaching people to use their tools better.
While I am excited at the direction of things like yapf
and the assorted
*fmt
tools, I can't help but think they won't entirely solve the problem
without being "baked into" the language. Go gets pretty far with this, but even
so manages something like 70% adoption, which means 30% is non-standard in
style. I would like to think that having a tool to reformat things would mean
that it is simply run on-commit, or on-save, but only time will tell. If
nothing else, any style debates should be trivially resolvable with the
solution being "run it through the tool", and if I never have a discussion of
coding style again in my life I think that'd be just fine.