I just sent a patch out and thought I might try a thorough breakdown of not just the how but the why as well. I've been interested in writing up something like this as a demonstration of how I like working and documenting my work.
I saw this bug was filed against the Mercurial hosting on sourcehut and thought I might take a look. I have a vested interest in hg.sr.ht after all. The gist of it is just that the log view for a repository skips a commit between pages.
Without knowing much about the problem or code I did the most
obvious thing that occurred to me, which was to just grep
for /log
at the top of the source
repository. Immediately that narrowed things down to one of two
blueprints for the available log routes within the
application. It was similarly easy to discard one of those
(api.py
) due to the prefix on the route, which didn't
match the bug report. With that out of the way, I have a starting
place with ./hgsrht/blueprints/repo.py
.
The route I found renders a very simple template without any real
logic, just looping over "commits" to populate the HTML. At this
point, I ruled out Jinja and moved back to
the log
method inside the blueprint. It looks like the list of commits
is munged around a bit to populate data about branch heads and
bookmarks, but originally they come from the
method get_log
.
The get_log
method is interesting because it provides
the number of commits per-page, which I verified against what I
could see in the browser (there were in fact 20 commits shown). It
also returns two arguments, the list of commits as well as
a next_id
. I followed the logging code backwards a bit
before concluding that it didn't seem to be doing anything really
novel. The second argument to get_log
is interesting
because it is called only rev
in the caller, but the
function definition calls the
argument start_commit
. Obviously this felt like a bit of a clue.
The specific revision is based on query arguments, but close reading
showed that the pagination button on the log view (labeled "Next")
populated the query argument in the case that there were more
commits after those shown with the value of next_id
.
With a "From" argument provided by the query parameters of the
button, it was starting to come together for me. In the case of such
an argument the following assignment was made to the value that
would later be passed as the start_commit
:
from_rev = request.args.get("from")
if from_rev:
rev = from_rev + '^'
Like
git, Mercurial
has a shorthand for operators to specify revisions. The above
example of x^
is explained in the docs like this:
Equivalent to "x^1", the first parent of each changeset in x.
So in effect, we log a page, find the next commit, and then log from the parent of the next commit. This seems like a good explanation for the missing commits.
I am always a little paranoid that I've misunderstood a bug until I can sufficiently explain why it occurred. While the explanation I've laid out made sense to me I'm never quite happy unless I can explain things from another direction.
In this case, I invoked the history of the problematic line and found it was introduced in the initial commit of the file. Visiting this revision showed the following:
commits_per_page = 20
commits = get_log(hg_repo, rev, path, commits_per_page + 1)
has_more = len(commits) == (commits_per_page + 1)
commits = [get_rev_view_data(r, owner, repo)
for r in commits[:commits_per_page]]
if has_more:
query_params = {'from': commits[-1].node}
While substantially similar to the code I read, it differed in a few
obvious ways. Firstly, get_log
did not return two
values, meaning no next_id
was available for the "Next"
button. Secondly, the commits per-page were 1 greater than the
number shown, which is used to gauge whether more pages
exist. Finally, the "from" query parameter was supplied as the last
item in the list of only the commits per-page (which was a little
confusing with the N+1 commits).
This finally put my mind at ease because I could
explain why the bug occurred. When it was originally
written the logic was sound because the "Next" page was logged
"from" the last commit of the prior page. Once the logic changed and
the next_id
was supplied, things fell apart because the
"parent" operator was duplicating the work of the
new next_id
.
This is all rather long-winded for such a simple bug. I tried timing myself and spent about 30 minutes on the actual patch, from reading the bug report to making the commit; this post though took around 2 hours to write. Obviously my writing process could be sped up, but I think it is a fair characterization of my work. I spend much longer reading and explaining than I do writing code.
Also, this will all look rather silly if the patch gets rejected.