Explaining a Patch

2021-01-13

I just sent a patch out and thought I might try a thorough breakdown of not just the how but the why as well. I've been interested in writing up something like this as a demonstration of how I like working and documenting my work.

The Bug

I saw this bug was filed against the Mercurial hosting on sourcehut and thought I might take a look. I have a vested interest in hg.sr.ht after all. The gist of it is just that the log view for a repository skips a commit between pages.

a directed graph showing 7 nodes separated between two pages, with the fourth node omitted from both pages

An Approach

Without knowing much about the problem or code I did the most obvious thing that occurred to me, which was to just grep for /log at the top of the source repository. Immediately that narrowed things down to one of two blueprints for the available log routes within the application. It was similarly easy to discard one of those (api.py) due to the prefix on the route, which didn't match the bug report. With that out of the way, I have a starting place with ./hgsrht/blueprints/repo.py.

The route I found renders a very simple template without any real logic, just looping over "commits" to populate the HTML. At this point, I ruled out Jinja and moved back to the log method inside the blueprint. It looks like the list of commits is munged around a bit to populate data about branch heads and bookmarks, but originally they come from the method get_log.

The get_log method is interesting because it provides the number of commits per-page, which I verified against what I could see in the browser (there were in fact 20 commits shown). It also returns two arguments, the list of commits as well as a next_id. I followed the logging code backwards a bit before concluding that it didn't seem to be doing anything really novel. The second argument to get_log is interesting because it is called only rev in the caller, but the function definition calls the argument start_commit. Obviously this felt like a bit of a clue.

The specific revision is based on query arguments, but close reading showed that the pagination button on the log view (labeled "Next") populated the query argument in the case that there were more commits after those shown with the value of next_id.

the same directed graph but with the fourth node now labeled next_id

With a "From" argument provided by the query parameters of the button, it was starting to come together for me. In the case of such an argument the following assignment was made to the value that would later be passed as the start_commit:

        from_rev = request.args.get("from")
        if from_rev:
            rev = from_rev + '^'

Like git, Mercurial has a shorthand for operators to specify revisions. The above example of x^ is explained in the docs like this:

Equivalent to "x^1", the first parent of each changeset in x.

the same directed graph with the fourth node labeled next_id and the fith node labeled its parent

So in effect, we log a page, find the next commit, and then log from the parent of the next commit. This seems like a good explanation for the missing commits.

How Did We Get Here

I am always a little paranoid that I've misunderstood a bug until I can sufficiently explain why it occurred. While the explanation I've laid out made sense to me I'm never quite happy unless I can explain things from another direction.

In this case, I invoked the history of the problematic line and found it was introduced in the initial commit of the file. Visiting this revision showed the following:

        commits_per_page = 20
        commits = get_log(hg_repo, rev, path, commits_per_page + 1)
        has_more = len(commits) == (commits_per_page + 1)
        commits = [get_rev_view_data(r, owner, repo)
            for r in commits[:commits_per_page]]

        if has_more:
            query_params = {'from': commits[-1].node}

While substantially similar to the code I read, it differed in a few obvious ways. Firstly, get_log did not return two values, meaning no next_id was available for the "Next" button. Secondly, the commits per-page were 1 greater than the number shown, which is used to gauge whether more pages exist. Finally, the "from" query parameter was supplied as the last item in the list of only the commits per-page (which was a little confusing with the N+1 commits).

a directed graph showing 7 nodes separated between two pages, with the third node labeled commits[-1] and the fourth node labeled Parent, the seventh node has been paginated to a new third page

This finally put my mind at ease because I could explain why the bug occurred. When it was originally written the logic was sound because the "Next" page was logged "from" the last commit of the prior page. Once the logic changed and the next_id was supplied, things fell apart because the "parent" operator was duplicating the work of the new next_id.

Thoughts

This is all rather long-winded for such a simple bug. I tried timing myself and spent about 30 minutes on the actual patch, from reading the bug report to making the commit; this post though took around 2 hours to write. Obviously my writing process could be sped up, but I think it is a fair characterization of my work. I spend much longer reading and explaining than I do writing code.

Also, this will all look rather silly if the patch gets rejected.