[nolan@nprescott.com] $>  cat blog archive feed

Code Review Metrics

2017-06-04

Software development metrics can be difficult: knowing what to track, and how to observe things without gaming them, can seem more art than science. I've been in discussions lately about how to track code review metrics meaningfully, and how to do so in the context of my absolute least favorite code review tool.

Code Review Metrics?

The intent in surfacing this information is to improve our processes as a team. I think we have all been bitten at one point or another, either by a pile of code reviews waiting to be done or by waiting for someone to respond to an open review.

Pending code reviews represent blocked threads of execution.1

The Bulk of the Work is Cleaning Data

While I find this to often be true in software development, this particular foray was a depressing highlight of that theme. I waffled over the different ways to visualize the information I have available regarding code reviews before settling on what I think of as an imitation Gantt chart. The real trouble, though, was in inferring the necessary information from what Code Collaborator makes available.

Code Collaborator

By far the most painful code review software I've had the displeasure of using, Collaborator makes it difficult both to perform reviews and to report on their status. The "JSON API" is accessed through a series of commands and arguments in the most awkward kind of RPC I've had to use in a real product.

So with the API out of the question, I instead opted for a daily pull of the reports available as a CSV file from the web UI, written into a SQLite database. While this is an easy way to pull down what information is available, I found out a little late that the reports are not so different from the API, which is to say, ugly. The CSV file is written to the database using pandas' to_sql, with a timestamp for when the write occurred (it runs as a cron job once a day). The schema is:
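The daily pull itself is the simple part; a sketch of that ingest step, with a made-up one-row CSV standing in for the report downloaded from the web UI, might look like:

```python
# A sketch of the daily ingest: append the report CSV to SQLite with a
# timestamp for this pull. The CSV contents here are fabricated.
import datetime as dt
import io
import sqlite3

import pandas as pd

csv_report = io.StringIO(
    "ID,Review Title,Review Creation Date,Phase,Author Full Name,"
    "Reviewer Full Name,Defect Count,Idle For\n"
    "42,PLC Diagnostics,2017-06-02 15:25 UTC,Inspecting,"
    "Nolan Prescott,(multiple),0,2 hours\n"
)

conn = sqlite3.connect(":memory:")  # the real job writes to reviews.db
df = pd.read_csv(csv_report)
df["date"] = dt.datetime.today()    # timestamp of when this pull occurred
df.to_sql("reviews", conn, if_exists="append")
```

Running this once a day with if_exists="append" is what produces both the schema below and the duplication dealt with later on.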

sqlite> .schema
CREATE TABLE "reviews" (
"index" INTEGER,
  "ID" INTEGER,
  "Review Title" TEXT,
  "Review Creation Date" TEXT,
  "Phase" TEXT,
  "Author Full Name" TEXT,
  "Reviewer Full Name" TEXT,
  "Defect Count" INTEGER,
  "Idle For" TEXT,
  "date" TIMESTAMP
);
CREATE INDEX "ix_reviews_index" ON "reviews" ("index");

The majority of what I have here is dedicated to cleaning up and standardizing multiple date formats, deduplicating records, and inferring things like "when was a review closed" - because why would a customer report bother with any of that?

Matplotlib Setup

While I normally use gnuplot for any kind of exploratory analysis, I've opted for matplotlib this time because what I was doing felt more... presentational(?) than what I might accomplish with my existing knowledge of gnuplot2. With matplotlib, I'm more confident in my ability to wrangle the data directly in Python to overcome any peculiarities of building the graphic. I recently found out about the addition of styles to MPL, so I couldn't resist trying one; here I'm using ggplot.

There can be reviews that span months, which would skew the graphic if we tried to accommodate the full time taken, so instead I define an area of interest, given by DAYS_PRIOR, which sets an upper bound on the date range. I also assign a color to each developer so that the graphic isn't cluttered with everyone's name repeated (and because I think color aids comprehension).

import matplotlib.dates
import matplotlib.pyplot as plt
import matplotlib

import datetime as dt
import textwrap
import sqlite3
from collections import defaultdict

plt.style.use('ggplot')
matplotlib.rcParams.update({'font.size': 22})

DAYS_PRIOR = 14
start_of_observation = (matplotlib.dates.date2num(
                        dt.datetime.today()
                        - dt.timedelta(days=DAYS_PRIOR)))

colors = (each['color'] for each in matplotlib.rcParams['axes.prop_cycle'])
devs = ("Nolan Prescott", "Coworker A", "Coworker B", "Coworker C")
color_map = dict(zip(devs, colors))

Data Munging

Because the CSV file is read in every day as an append to the database, there is a good deal of duplication. I have kept this format because there is no historical information available through Collaborator, so I manage it (crudely) myself. The first order of business, then, is de-duplicating the reviews so that there is one unique review with the latest date on which it was being worked. I accomplish this with a sub-query to get the most recent record per title, and then query the full table for a match on that date.

The final WHERE clause filters out other teams, who are also included in the CSV report, by matching my coworkers by name. This could probably be managed by tweaking the report in Collaborator, but there's no harm in capturing too much information here.

conn = sqlite3.connect('reviews.db')
c = conn.cursor()

all_reviews = (row for row in c.execute("""

SELECT t1."Review Title", t1."Review Creation Date", strftime("%Y-%m-%d", t1.date), "Author Full Name"
FROM reviews t1
WHERE t1.date = (SELECT t2.date
                 FROM reviews t2
                 WHERE t2."Review Title" = t1."Review Title"
                 ORDER BY t2.date DESC
                 LIMIT 1)
AND "Author Full Name" IN
("Nolan Prescott", "Coworker A", "Coworker B", "Coworker C")
ORDER BY "date" DESC;

"""))

At this point I have a list of tuples that looks approximately like:

Title                     | Creation Date        | Date       | Author
--------------------------|----------------------|------------|---------------
PLC Diagnostics           | 2017-06-02 15:25 UTC | 2017-06-02 | Nolan Prescott
Performance Improvements  | 2017-06-01 22:49 UTC | 2017-06-02 | Coworker A
Feature design review     | 2017-04-13 20:11 UTC | 2017-06-02 | Coworker B

which, you may notice, has dates in two formats, so the next step is some minor, ugly munging to normalize their format and cast to a numeric format that MPL understands. The more interesting step is filtering to reviews_of_interest, where reviews that completed outside the window we defined with DAYS_PRIOR as the "area of interest" are omitted.

def parse_date(date_string):
    date = dt.datetime.strptime(date_string, '%Y-%m-%d')
    mdate = matplotlib.dates.date2num(date)
    return mdate

def normalize_format(reviews_tuples):
    return [(title, parse_date(created.split()[0]), parse_date(date), author)
            for title, created, date, author in reviews_tuples]

def reviews_of_interest(normalized_reviews):
    return [(title, start_date, end_date, author)
            for title, start_date, end_date, author in normalized_reviews
            if end_date > start_of_observation]
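As a quick sanity check (with a made-up date in the CSV's creation-date format), the normalization round-trips through matplotlib's date handling:

```python
import datetime as dt

import matplotlib.dates

def parse_date(date_string):
    # same conversion as above: parse, then cast to MPL's float day number
    date = dt.datetime.strptime(date_string, '%Y-%m-%d')
    return matplotlib.dates.date2num(date)

created = "2017-06-02 15:25 UTC"      # two-format date from the report
day = parse_date(created.split()[0])  # split() drops the time and zone
print(matplotlib.dates.num2date(day).date())  # 2017-06-02
```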

Plotting

Finally, I'm ready to do the actual visualization, which is pretty straightforward once the data has been adequately pre-processed. I think there is some room for improvement in the two successive for-loops over the data, but the specifics aren't very complicated. By building horizontal bars (barh) with the left end as the greater of either the start of the window of interest or the actual start date, I avoid the problem of skewing the graphic. The width of each bar is the end date minus the start date. Finally, the developer's unique color is looked up to assign the bar's fill color.

The arbitrary numbers designating figure size are just that: a bit of trial and error landed me at a width of 1.5 times the number of reviews and a height of 80% of that width. It's not perfect, but it accommodates a variable number of reviews better than a fixed figure size.

def visualize_reviews(list_of_tuples):
    x_ratio = int(1.5 * len(list_of_tuples))
    y_ratio = int(0.8 * x_ratio)

    fig = plt.figure(figsize=(x_ratio, y_ratio))
    ax = fig.add_subplot(111)

    reviews = defaultdict(dict)

    for title, start_date, end_date, author in list_of_tuples:
        title = textwrap.fill(title, width=30)
        reviews[title]['start_date'] = max(start_date, start_of_observation)
        reviews[title]['end_date'] = end_date
        reviews[title]['color'] = color_map[author]

    for index, ticket in enumerate(reviews):
        ax.barh(index,
                reviews[ticket]['end_date'] - reviews[ticket]['start_date'],
                left=reviews[ticket]['start_date'],
                color=reviews[ticket]['color'])

    plt.yticks(range(len(reviews)), reviews.keys())
    date_format = matplotlib.dates.DateFormatter("%b %d")
    ax.xaxis.set_major_formatter(date_format)

    ax.xaxis_date()
    plt.show()

normalized_reviews = normalize_format(all_reviews)
current_reviews = reviews_of_interest(normalized_reviews)
visualize_reviews(current_reviews)

And the result

(with a bit of made-up data)

code review chart

While I'm happy to have generated something approaching what I first envisioned, I am disappointed that I have been unable to include more context in the chart. I would have liked to include which reviewers were assigned to each review, but again Collaborator rears its ugly head in reporting "(multiple)" for any review with more than one reviewer. Since that is nearly every review we do at work, I have omitted it until I can find a work-around (or figure out how to get that information from the API).

I have been impressed by matplotlib; having not used it since college, I was pleasantly surprised at how far it has come. While I know there are supposed to be better interfaces on top of it, like Seaborn, I found I could accomplish what I had imagined without too much fuss.

How do you use it?

This one is tricky: I think the graphic is interesting, but I'm not certain there is enough information in it to be actionable. It could be useful to surface information like the number of defects found relative to the number of reviewers, or time taken relative to the number of reviewers, but I lack the per-person information necessary at this time. I'll discuss it with my coworkers and see if we can't gain some insight, or inform later iterations of this project.


  1. My job's suggested reading for new hires
  2. Though gnuplot does describe how to build a Gantt chart
[nolan@nprescott.com] $> █