Software development metrics can be difficult: knowing what to track, and how to observe it without gaming the numbers, can seem more art than science. I've been in discussions lately about how to track code review metrics meaningfully, and how to do so in the context of my absolute least favorite code review tool.
The intent in surfacing this information is to improve our processes as a team. I think we have all been bitten at one point or another, either by a pile of code reviews waiting to be done or by waiting for someone to respond to an open review.
> Pending code reviews represent blocked threads of execution.1
While I find this often holds true in software development, this particular foray was a depressing highlight of that theme. I waffled on the different ways to visualize the information I have available regarding code reviews before settling on what I think of as an imitation Gantt chart, but the real trouble was in inferring the necessary information from what Code Collaborator makes available.
By far the most painful code review software I've had the displeasure of using, Collaborator makes it difficult both to perform reviews and to report on their status. The "JSON API" is accessed through a series of commands and arguments in the most awkward kind of RPC I've had to use in a real product.
So with the API out of the question, I instead opted for a daily pull of the reports available as a CSV file from the web UI, written into a SQLite database. While this is an easy way to pull down what information is available, I found out a little late that the reports are not so different from their API, which is to say, ugly. The CSV file is written to the database using pandas' `to_sql`, with a timestamp for when the write occurred (it runs on a cron job once a day). The schema is:
sqlite> .schema
CREATE TABLE "reviews" (
"index" INTEGER,
"ID" INTEGER,
"Review Title" TEXT,
"Review Creation Date" TEXT,
"Phase" TEXT,
"Author Full Name" TEXT,
"Reviewer Full Name" TEXT,
"Defect Count" INTEGER,
"Idle For" TEXT,
"date" TIMESTAMP
);
CREATE INDEX "ix_reviews_index" ON "reviews" ("index");
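The ingest itself is nothing special; a minimal sketch, assuming a reviews.csv exported from the web UI (the filename is a made-up stand-in for wherever the export lands):

```python
import datetime as dt
import sqlite3

import pandas as pd

# Hypothetical daily ingest: append today's CSV export to the database.
df = pd.read_csv('reviews.csv')
df['date'] = dt.datetime.now()  # timestamp this pull for later dedup

with sqlite3.connect('reviews.db') as conn:
    df.to_sql('reviews', conn, if_exists='append')
```

The pandas default of writing the frame's index along with the data is where the "index" column in the schema above comes from.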
The majority of what I have here is dedicated to cleaning up and standardizing multiple date formats, deduplicating records, and inferring things like "when was a review closed" - because why would a customer report bother with any of that?
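The "when was a review closed" inference deserves a word: the report only reflects current state, so the best available signal is the last daily snapshot in which a review still appears. A sketch of that idea, against the schema above:

```python
import sqlite3

conn = sqlite3.connect('reviews.db')
# A review drops out of the report once it closes, so the latest
# snapshot date it appears in approximates its close date.
last_seen = conn.execute("""
    SELECT "Review Title", MAX(strftime('%Y-%m-%d', date)) AS last_seen
    FROM reviews
    GROUP BY "Review Title";
""").fetchall()
```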
While I normally use gnuplot for any kind of exploratory analysis, I've opted for matplotlib this time because I thought what I was doing was more... presentational(?) than what I might be able to accomplish with my existing knowledge of gnuplot2. With matplotlib, I'm more confident in my ability to wrangle the data directly with Python to overcome any peculiarities necessary to build the graphic. I recently found out about the addition of styles to MPL, so I couldn't resist trying one; here I'm using ggplot.
There can be reviews that span months, which would skew the graphic if we tried to accommodate the full time taken, so instead I define an area of interest, given by `DAYS_PRIOR`, which sets an upper bound on the date range. I also assign a color to each developer, so that the graphic isn't cluttered with everyone's name repeatedly (along with the fact that I think color aids in comprehension).
import datetime as dt
import sqlite3
import textwrap
from collections import defaultdict

import matplotlib
import matplotlib.dates
import matplotlib.pyplot as plt

plt.style.use('ggplot')
matplotlib.rcParams.update({'font.size': 22})

# Only the last two weeks are of interest; older activity gets clipped.
DAYS_PRIOR = 14
start_of_observation = matplotlib.dates.date2num(
    dt.datetime.today() - dt.timedelta(days=DAYS_PRIOR))

# Pair each developer with a color from the style's cycle.
colors = (each['color'] for each in matplotlib.rcParams['axes.prop_cycle'])
devs = ("Nolan Prescott", "Coworker A", "Coworker B", "Coworker C")
color_map = dict(zip(devs, colors))
Because the CSV file is read in every day as an append to the database, there is a good deal of duplication. I have kept this format because there is no historical information available through Collaborator, so I manage it (crudely) myself. The first order of business then is de-duplicating the reviews, so that there is one row per unique review carrying the latest date on which it was worked. I accomplish this with a sub-query to get the most recent snapshot per title, then querying the full table for a match on that date.
The final `WHERE` clause filters out other teams, who are also included in the CSV report, by matching my coworkers by name. This could probably be managed by tweaking the report in Collaborator, but there's no harm in capturing too much information here.
conn = sqlite3.connect('reviews.db')
c = conn.cursor()
all_reviews = (row for row in c.execute("""
    SELECT t1."Review Title", t1."Review Creation Date",
           strftime("%Y-%m-%d", t1.date), "Author Full Name"
    FROM reviews t1
    WHERE t1.date = (SELECT t2.date
                     FROM reviews t2
                     WHERE t2."Review Title" = t1."Review Title"
                     ORDER BY t2.date DESC
                     LIMIT 1)
      AND "Author Full Name" IN
          ("Nolan Prescott", "Coworker A", "Coworker B", "Coworker C")
    ORDER BY "date" DESC;
    """))
At this point I have a list of tuples that looks approximately like:
| Title | Creation Date | Date | Author |
|---|---|---|---|
| PLC Diagnostics | 2017-06-02 15:25 UTC | 2017-06-02 | Nolan Prescott |
| Performance Improvements | 2017-06-01 22:49 UTC | 2017-06-02 | Coworker A |
| Feature design review | 2017-04-13 20:11 UTC | 2017-06-02 | Coworker B |
which you may notice has dates in two formats, so the next step is some minor, ugly munging to normalize their format and cast to a numeric representation that MPL understands. The more interesting step is the filtering in `reviews_of_interest`, where reviews that completed outside the window we defined with `DAYS_PRIOR` as the "area of interest" are omitted.
def parse_date(date_string):
    # Convert an ISO date string to matplotlib's numeric date format.
    date = dt.datetime.strptime(date_string, '%Y-%m-%d')
    return matplotlib.dates.date2num(date)


def normalize_format(reviews_tuples):
    # Creation dates look like "2017-06-02 15:25 UTC"; keep only the date.
    return [(title, parse_date(created.split()[0]), parse_date(date), author)
            for title, created, date, author in reviews_tuples]


def reviews_of_interest(normalized_reviews):
    # Drop reviews that finished before the observation window opened.
    return [(title, start_date, end_date, author)
            for title, start_date, end_date, author in normalized_reviews
            if end_date > start_of_observation]
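To make the pipeline concrete, here is roughly what the first row of the table above becomes (the exact values are matplotlib date numbers, so I've elided them):

```python
sample = [("PLC Diagnostics", "2017-06-02 15:25 UTC", "2017-06-02",
           "Nolan Prescott")]
normalize_format(sample)
# => [("PLC Diagnostics", <date number>, <date number>, "Nolan Prescott")]
```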
Finally, I'm ready to do the actual visualization, which is pretty straightforward once the data has been adequately pre-processed. I think there is some room for improvement with the two successive for-loops over the data, but the specifics aren't very complicated. By building horizontal bars (`barh`) with a left end at the greater of either the start of the window of interest or the review's actual start date, I avoid the problem of skewing the graphic. The width of the bar comes from subtracting the start date from the end date. Finally, the developer's unique color is looked up to assign the horizontal bar's fill color.
The arbitrary numbers designating figure size are just that: a bit of trial and error landed me at a width of 1.5 times the length of the list of reviews and 80% of that width for the height. It's not perfect, but it accommodates a variable number of reviews better than a fixed figure size.
def visualize_reviews(list_of_tuples):
    x_ratio = int(1.5 * len(list_of_tuples))
    y_ratio = int(0.8 * x_ratio)
    fig = plt.figure(figsize=(x_ratio, y_ratio))
    ax = fig.add_subplot(111)

    reviews = defaultdict(dict)
    for title, start_date, end_date, author in list_of_tuples:
        title = textwrap.fill(title, width=30)
        # Clip long-running reviews to the observation window.
        reviews[title]['start_date'] = max(start_date, start_of_observation)
        reviews[title]['end_date'] = end_date
        reviews[title]['color'] = color_map[author]

    for index, ticket in enumerate(reviews):
        ax.barh(index,
                reviews[ticket]['end_date'] - reviews[ticket]['start_date'],
                left=reviews[ticket]['start_date'],
                color=reviews[ticket]['color'])

    plt.yticks(range(len(reviews)), reviews.keys())
    date_format = matplotlib.dates.DateFormatter("%b %d")
    ax.xaxis.set_major_formatter(date_format)
    ax.xaxis_date()
    plt.show()

normalized_reviews = normalize_format(all_reviews)
current_reviews = reviews_of_interest(normalized_reviews)
visualize_reviews(current_reviews)
(with a bit of made-up data)
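One small aside on plt.show(): since the data pull already runs from cron, it would be easy to render the chart unattended too. A sketch, with a made-up output filename, would swap the show() call at the end of visualize_reviews for:

```python
fig.savefig('review-gantt.png', bbox_inches='tight')  # hypothetical output path
```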
While I'm happy to have generated something approaching what I first envisioned, I am disappointed that I was unable to include more context in the chart. I would have liked to include which reviewers were assigned to each review, but again Collaborator rears its ugly head, reporting "(multiple)" for any review with more than one reviewer. Since that describes nearly every review we do at work, I have omitted it until I can find a workaround (or figure out how to get that information from the API).
I have been impressed by Matplotlib; having not used it since college, I was pleasantly surprised at how far it has come. While I know there are supposed to be better interfaces to it, like Seaborn, I found I could accomplish what I had imagined without too much fuss.
This one is tricky: I think the graphic is interesting, but I'm not certain there is enough information in it to be actionable. It could be useful to surface information like the number of defects found as it relates to the number of reviewers, or time taken as it relates to the number of reviewers, but I lack the per-person level of information necessary at this time. I'll discuss it with my coworkers and see if we can't gain some insight, or inform later iterations of this project.