I've been on-boarding to a new project recently and have been tasked with introducing or integrating a few new tools into the development process. In doing so I've learned a few things worth remembering.
Before making any strongly worded suggestions I took the opportunity to evaluate the landscape of Python analysis tools with fresh eyes. Below are my notes on each project and the decision I came to regarding its use.
Amongst them most cursory of inspection tooling available, Pyflakes is a good candidate to integrate directly into your editor of choice. It runs very quickly and isn't overly noisy in the suggestions it makes. The most common use-case that I can see Pyflakes catching is unused imports and shadowing previously used names for variables, classes and methods.
One of the touted benefits of Pyflakes is focus on zero-configuration. This is mostly a real benefit but makes the tool rigid in how it may be used. For instance, there is no way to selectively ignore issues that it flags. I ended up monkey patching this particular feature in, which I describe below.
Pylint is, in my opinion, an excellent tool to standardize a code-base. That being said, there's little a tool can do to solve process problems. It is entirely up to a team to decide how they follow the kinds of suggestions made by Pylint which is probably why it is so configurable.
Using this totally not-great but not terrible piece of code as an example:
addrBook = {
"Tom": { "street": "123 N. Main St.", "state": "AL", "phone": "888-999-0000"},
"Richard": {"street": "999 W. East Ave.", "state": "NY", "phone": "800-123-4567"},
"Harry": {"street": "456 South St.", "state": "AK", "phone": "212-212-2121"}
}
for p in addrBook.keys():
print(p + "'s phone number is: " + addrBook[p]['phone'])
Pylint reports the following:
C: 2, 0: No space allowed after bracket
"Tom": { "street": "123 Main St.", "state": "AL", "phone": "888-999-0000"},
^ (bad-whitespace)
C: 8, 0: Unnecessary parens after 'print' keyword (superfluous-parens)
C: 1, 0: Missing module docstring (missing-docstring)
C: 1, 0: Invalid constant name "addrBook" (invalid-name)
C: 7, 9: Consider iterating the dictionary directly instead of calling .keys() (consider-iterating-dictionary)
Which, for me, highlights several issues of convention (the leading C
for
each issue) and style that I happen to agree with, even if they don't impact
the fact that the code runs as is. Pylint defines invalid-name
according to a
host of regular expression rules which may be configured to suit your personal
style. I happen to like underscore-case, which Pylint defaults to, but
camelCase is an easy change to configure.
I regularly disable warnings of missing-docstring
because I think Pylint is a
bit too eager to have me document each and every function, method and class
that I write. The issue of superfluous-parens
is interesting not because I'm
likely to remove the parentheses in this case, but because it calls out the
fact that this was run with Python 2, and it's a simple fix to add from __future__ import print_function
and take a step towards future-proofing my
code.
I rather like the fact that Pylint calls out the fact that I am not doing the
most idiomatic thing by invoking the .keys()
method on the dictionary. If I
bothered to write idiomatic code, as the linter suggests I might instead end up
with something like this:
from __future__ import print_function
address_book = {
"Tom": {
"street": "123 Main St.",
"state": "AL",
"phone": "888-999-0000"
},
"Richard": {
"street": "999 W. East Ave.",
"state": "NY",
"phone": "800-123-4567"
},
"Harry": {
"street": "456 South St.",
"state": "AK",
"phone": "212-212-2121"
}
}
for person in address_book:
print("{}'s phone number is: {phone}".format(person,
**address_book[person]))
One thing that tripped me up briefly was Pylint's hierarchical search for configuration files. I was attempting to test the default configuration and having a hard time tracking down where a few lingering settings were coming from. It turns out Pylint looks in the following order for config files:
pylintrc
in the current working directory.pylintrc
in the current working directoryPYLINTRC
/root
:
.pylintrc
in your home directory.config/pylintrc
in your home directory/etc/pylintrc
Where I had an old config file in my home directory under
.config
. Ultimately, this lets you tailor Pylint at the module level within a
project and is, I think, a good thing.
Radon is a tool to measure code complexity. While I think the results are interesting, I have some doubt about how useful they will prove. For anyone familiar with a code-base I think Radon will only serve to highlight those parts which are known to be problematic, or hairy. What I ended up suggesting was making Radon available and agreeing to a system that says: if you end up working in a poorly graded part of the code it is your responsibility to clean it up. Only time will tell if Radon proves to be a useful addition to the set of tools available.
Pychecker often makes the list of
available tools if you search for "python linter" or "python static analysis"
but for me, it was hardly in the running. I couldn't find an easy way to
install (meaning with pip
) and I can hardly recommend a tool whose only means
of installation are basically "go download this tarball from SourceFourge".
Prospector is less of a tool and more of a meta-tool, it strings together three separate tools: Pylint, pep8, and McCabe complexity; under one interface. It purports to provide better defaults and automatically detect the use of libraries that traditionally trip up linters (such as Django).
I decided against Prospector because of the particular workflow I was working towards, one where a version control hook would flag the most egregious errors and leave style critiques available at the developer's discretion. I didn't see much benefit in entangling the tools together when they wouldn't necessarily be used together.
One nagging issue with Pyflakes is the inability to white-list "issues" or flag specific lines as being ignored. Thankfully, Python is dynamic enough that we might add this feature ourselves. Chase Seibert wrote up a solution to this very issue in 2013. However, Pyflakes changed its internals at some point in the intervening years and attempting the above results in the following error:
Traceback (most recent call last):
File "/tmp/flake.py", line 17, in <module>
pyflakes.main()
File "/usr/local/lib/python2.7/site-packages/pyflakes/api.py", line 172, in main
warnings = checkRecursive(args, reporter)
File "/usr/local/lib/python2.7/site-packages/pyflakes/api.py", line 129, in checkRecursive
warnings += checkPath(sourcePath, reporter)
File "/usr/local/lib/python2.7/site-packages/pyflakes/api.py", line 96, in checkPath
return check(codestr, filename, reporter)
File "/usr/local/lib/python2.7/site-packages/pyflakes/api.py", line 57, in check
w = checker.Checker(tree, filename)
File "/usr/local/lib/python2.7/site-packages/pyflakes/checker.py", line 299, in __init__
self.runDeferred(self._deferredAssignments)
File "/usr/local/lib/python2.7/site-packages/pyflakes/checker.py", line 332, in runDeferred
handler()
File "/usr/local/lib/python2.7/site-packages/pyflakes/checker.py", line 833, in checkUnusedAssignments
self.report(messages.UnusedVariable, binding.source, name)
File "/tmp/flake.py", line 8, in report_with_bypass
text_lineno = args[0] - 1
TypeError: unsupported operand type(s) for -: 'Name' and 'int'
The necessary correction to enable a # no_lint
annotation is as follows:
#!/usr/bin/env python
from pyflakes.scripts import pyflakes
from pyflakes.checker import Checker
def report_with_bypass(self, messageClass, *args, **kwargs):
text, _ = args
with open(self.filename, 'r') as code:
# mind the zero-based index of the read lines
if code.readlines()[text.lineno-1].find('no_lint') >= 0:
return
self.messages.append(messageClass(self.filename, *args, **kwargs))
Checker.report = report_with_bypass
pyflakes.main()
Ultimately, I think the tools available to introspect Python code are a little thin compared to those available in less dynamic languages. It seems the best way to catch issues early is through a combination of both testing and a tool like Pyflakes. I'll be sure to amend my opinions as I use the above tools more regularly, for now, it's better than nothing.