[nolan@nprescott.com] $>  cat blog archive feed

Wrangling Linting Tools

2016-10-22

I've been on-boarding to a new project recently and have been tasked with introducing or integrating a few new tools into the development process. In doing so I've learned a few things worth remembering.

A Quick Audit of Linting Tools

Before making any strongly worded suggestions I took the opportunity to evaluate the landscape of Python analysis tools with fresh eyes. Below are my notes on each project and the decision I came to regarding its use.

Pyflakes

Amongst them most cursory of inspection tooling available, Pyflakes is a good candidate to integrate directly into your editor of choice. It runs very quickly and isn't overly noisy in the suggestions it makes. The most common use-case that I can see Pyflakes catching is unused imports and shadowing previously used names for variables, classes and methods.

One of the touted benefits of Pyflakes is focus on zero-configuration. This is mostly a real benefit but makes the tool rigid in how it may be used. For instance, there is no way to selectively ignore issues that it flags. I ended up monkey patching this particular feature in, which I describe below.

Pylint

Pylint is, in my opinion, an excellent tool to standardize a code-base. That being said, there's little a tool can do to solve process problems. It is entirely up to a team to decide how they follow the kinds of suggestions made by Pylint which is probably why it is so configurable.

Using this totally not-great but not terrible piece of code as an example:

addrBook = {
    "Tom": { "street": "123 N. Main St.", "state": "AL", "phone": "888-999-0000"},
    "Richard": {"street": "999 W. East Ave.", "state": "NY", "phone": "800-123-4567"},
    "Harry": {"street": "456 South St.", "state": "AK", "phone": "212-212-2121"}
}

for p in addrBook.keys():
    print(p + "'s phone number is: " + addrBook[p]['phone'])

Pylint reports the following:

C:  2, 0: No space allowed after bracket
    "Tom": { "street": "123 Main St.", "state": "AL", "phone": "888-999-0000"},
           ^ (bad-whitespace)
C:  8, 0: Unnecessary parens after 'print' keyword (superfluous-parens)
C:  1, 0: Missing module docstring (missing-docstring)
C:  1, 0: Invalid constant name "addrBook" (invalid-name)
C:  7, 9: Consider iterating the dictionary directly instead of calling .keys() (consider-iterating-dictionary)

Which, for me, highlights several issues of convention (the leading C for each issue) and style that I happen to agree with, even if they don't impact the fact that the code runs as is. Pylint defines invalid-name according to a host of regular expression rules which may be configured to suit your personal style. I happen to like underscore-case, which Pylint defaults to, but camelCase is an easy change to configure.

I regularly disable warnings of missing-docstring because I think Pylint is a bit too eager to have me document each and every function, method and class that I write. The issue of superfluous-parens is interesting not because I'm likely to remove the parentheses in this case, but because it calls out the fact that this was run with Python 2, and it's a simple fix to add from __future__ import print_function and take a step towards future-proofing my code.

I rather like the fact that Pylint calls out the fact that I am not doing the most idiomatic thing by invoking the .keys() method on the dictionary. If I bothered to write idiomatic code, as the linter suggests I might instead end up with something like this:

from __future__ import print_function

address_book = {
    "Tom": {
        "street": "123 Main St.",
        "state": "AL",
        "phone": "888-999-0000"
    },
    "Richard": {
        "street": "999 W. East Ave.",
        "state": "NY",
        "phone": "800-123-4567"
    },
    "Harry": {
        "street": "456 South St.",
        "state": "AK",
        "phone": "212-212-2121"
    }
}

for person in address_book:
    print("{}'s phone number is: {phone}".format(person,
                                                 **address_book[person]))

Pylint Config Files

One thing that tripped me up briefly was Pylint's hierarchical search for configuration files. I was attempting to test the default configuration and having a hard time tracking down where a few lingering settings were coming from. It turns out Pylint looks in the following order for config files:

  1. pylintrc in the current working directory
  2. .pylintrc in the current working directory
  3. If the current working directory is in a Python module, Pylint searches up the hierarchy of Python modules until it finds a pylintrc file.
  4. The file named by environment variable PYLINTRC
  5. If you have a home directory which isn’t /root:
  6. and finally /etc/pylintrc

Where I had an old config file in my home directory under .config. Ultimately, this lets you tailor Pylint at the module level within a project and is, I think, a good thing.

Radon

Radon is a tool to measure code complexity. While I think the results are interesting, I have some doubt about how useful they will prove. For anyone familiar with a code-base I think Radon will only serve to highlight those parts which are known to be problematic, or hairy. What I ended up suggesting was making Radon available and agreeing to a system that says: if you end up working in a poorly graded part of the code it is your responsibility to clean it up. Only time will tell if Radon proves to be a useful addition to the set of tools available.

Pychecker

Pychecker often makes the list of available tools if you search for "python linter" or "python static analysis" but for me, it was hardly in the running. I couldn't find an easy way to install (meaning with pip) and I can hardly recommend a tool whose only means of installation are basically "go download this tarball from SourceFourge".

Prospector

Prospector is less of a tool and more of a meta-tool, it strings together three separate tools: Pylint, pep8, and McCabe complexity; under one interface. It purports to provide better defaults and automatically detect the use of libraries that traditionally trip up linters (such as Django).

I decided against Prospector because of the particular workflow I was working towards, one where a version control hook would flag the most egregious errors and leave style critiques available at the developer's discretion. I didn't see much benefit in entangling the tools together when they wouldn't necessarily be used together.

Monkey Patching Pyflakes

One nagging issue with Pyflakes is the inability to white-list "issues" or flag specific lines as being ignored. Thankfully, Python is dynamic enough that we might add this feature ourselves. Chase Seibert wrote up a solution to this very issue in 2013. However, Pyflakes changed its internals at some point in the intervening years and attempting the above results in the following error:

Traceback (most recent call last):
  File "/tmp/flake.py", line 17, in <module>
    pyflakes.main()
  File "/usr/local/lib/python2.7/site-packages/pyflakes/api.py", line 172, in main
    warnings = checkRecursive(args, reporter)
  File "/usr/local/lib/python2.7/site-packages/pyflakes/api.py", line 129, in checkRecursive
    warnings += checkPath(sourcePath, reporter)
  File "/usr/local/lib/python2.7/site-packages/pyflakes/api.py", line 96, in checkPath
    return check(codestr, filename, reporter)
  File "/usr/local/lib/python2.7/site-packages/pyflakes/api.py", line 57, in check
    w = checker.Checker(tree, filename)
  File "/usr/local/lib/python2.7/site-packages/pyflakes/checker.py", line 299, in __init__
    self.runDeferred(self._deferredAssignments)
  File "/usr/local/lib/python2.7/site-packages/pyflakes/checker.py", line 332, in runDeferred
    handler()
  File "/usr/local/lib/python2.7/site-packages/pyflakes/checker.py", line 833, in checkUnusedAssignments
    self.report(messages.UnusedVariable, binding.source, name)
  File "/tmp/flake.py", line 8, in report_with_bypass
    text_lineno = args[0] - 1
TypeError: unsupported operand type(s) for -: 'Name' and 'int'

The necessary correction to enable a # no_lint annotation is as follows:

#!/usr/bin/env python

from pyflakes.scripts import pyflakes
from pyflakes.checker import Checker


def report_with_bypass(self, messageClass, *args, **kwargs):
    text, _ = args
    with open(self.filename, 'r') as code:
        # mind the zero-based index of the read lines
        if code.readlines()[text.lineno-1].find('no_lint') >= 0:
            return
    self.messages.append(messageClass(self.filename, *args, **kwargs))

Checker.report = report_with_bypass

pyflakes.main()

Ultimately, I think the tools available to introspect Python code are a little thin compared to those available in less dynamic languages. It seems the best way to catch issues early is through a combination of both testing and a tool like Pyflakes. I'll be sure to amend my opinions as I use the above tools more regularly, for now, it's better than nothing.

[nolan@nprescott.com] $> █