A Different Kind of Optimization

2018-07-04

While attempting to analyze and profile one of my projects, I've come to appreciate an orthogonal solution to the need for speed.

How Much Speed Do You Need?

I've written previously about profiling my own static site generator. At that point I measured generating this entire weblog at about 5/10ths of a second, and then identified a fix that cut that by about 25%.

That measurement is now a bit slower because there is more content to regenerate. With about 56 posts spanning several years, the entire site now takes:

$ time -v quiescent
    Command being timed: "quiescent"
    User time (seconds): 0.33
    System time (seconds): 0.05
    Percent of CPU this job got: 61%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.64
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 93552640
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 6450
    Voluntary context switches: 0
    Involuntary context switches: 543
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

About 6/10ths of a second.

Discarded Ideas

Without thinking too deeply about the problem, I considered using Python's concurrent.futures module, specifically with either a ProcessPoolExecutor or a ThreadPoolExecutor. The idea was that multiple processes or threads could operate on "posts" independently and reduce the time spent in IO, reading and writing files.

After prototyping a test with both, I didn't see appreciable gains. A significant portion of the time is spent in file IO spread across multiple function calls, so there is no easy win to be had by simply invoking executor.map (which shouldn't really be surprising). Given that both executors also carry some overhead, I dismissed both prototypes as over-architecting a solution. I still think there is a potential solution here, possibly creating a thread pool in the Static class and using it for all IO (reading raw data and writing formatted posts) rather than just for reading posts, which is what I tried.
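The untested idea of pushing both the reads and the writes through one thread pool might look roughly like the sketch below. It is not the generator's actual code: render, build_post, build_all, the .md extension, and the flat directory layout are all hypothetical stand-ins chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def render(raw):
    # Hypothetical placeholder for the real templating step.
    return "<article>" + raw.strip() + "</article>"


def build_post(source, out_dir):
    """Read one post, render it, and write the result, all in the worker."""
    target = out_dir / (source.stem + ".html")
    html = render(source.read_text(encoding="utf-8"))
    target.write_text(html, encoding="utf-8")
    return target


def build_all(posts_dir, out_dir, workers=8):
    """Build every post in posts_dir using a shared thread pool."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumes posts are *.md files directly inside posts_dir.
    sources = sorted(posts_dir.glob("*.md"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map keeps the input order and re-raises any worker exception here.
        return list(pool.map(lambda src: build_post(src, out_dir), sources))
```

Whether this beats the serial version would still need measuring; with only a few dozen small files, the pool's startup cost may well eat the savings.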

A Novel Approach

In absolute terms, 6/10ths of a second is not very long. The issue is that feedback feels slow because of the visual "hang" that occurs after you type quiescent and hit ENTER. This is more a user-experience issue than an engineering problem. With that in mind I installed fswatch, which is like a cross-platform version of inotify. I thought that if I wasn't manually invoking the command to rebuild, I probably wouldn't care much how long it takes. It doesn't solve the problem of generation times, but it does make them invisible.

I set up a "watch" on the posts directory, which emits a notification on each file-save event. From there, a loop on read (which blocks until input arrives) rebuilds the site via the quiescent command:

$ fswatch posts/ | while read _; do
    quiescent
done
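Where fswatch isn't available, the same watch-and-rebuild loop can be approximated by polling modification times. This is a rough sketch of a swapped-in technique, not how fswatch itself works (it normally relies on the operating system's notification facilities). The names snapshot, watch, interval, and max_polls are made up for illustration.

```python
import os
import subprocess
import time


def snapshot(directory):
    """Map each file under directory to its last-modified time."""
    mtimes = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                mtimes[path] = os.stat(path).st_mtime_ns
            except FileNotFoundError:
                # The file vanished between listing and stat; skip it.
                pass
    return mtimes


def watch(directory, command, interval=0.5, max_polls=None):
    """Run command whenever a file is added, removed, or modified.

    Loops forever unless max_polls is given; returns the rebuild count.
    """
    previous = snapshot(directory)
    polls = 0
    rebuilds = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        polls += 1
        current = snapshot(directory)
        if current != previous:
            subprocess.run(command, check=False)
            rebuilds += 1
            previous = current
    return rebuilds
```

Calling watch("posts", ["quiescent"]) would then stand in for the shell loop, at the cost of waking up twice a second instead of blocking until something changes.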

A Good Enough Solution

AppleScript is, to be honest, pretty terrible, but I may have finally found a legitimate-to-me use for it (I save the following to a file named refresh-tab.script):

tell application "Google Chrome" to tell the active tab of its first window
    reload
end tell

What this does is refresh the current browser tab in Google Chrome without losing the page position or requiring the application to take window focus. My only regret is that I haven't found a similar solution for Firefox.

From the top-level of the weblog tree, I'll start a local web-server in the build directory and background it:

$ cd build; python -m http.server & cd -
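The same server can also be started from Python without changing directories, which may be handy if the pieces ever get scripted together. A minimal sketch, assuming Python 3.7 or newer (for the directory argument and ThreadingHTTPServer); the serve helper and its defaults are hypothetical:

```python
import functools
import http.server
import threading


def serve(directory, port=8000):
    """Serve directory over HTTP on localhost from a background thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # A daemon thread will not keep the interpreter alive at exit.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling serve("build") returns immediately; server.shutdown() followed by server.server_close() stops it again.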

Adding the AppleScript to the watch loop from above yields a rebuild pipeline that requires no interaction on my part:

$ fswatch posts/ | while read _; do
    quiescent; osascript refresh-tab.script;
done

What that looks like in practice:

I think a big part of why this appeals to me is that it avoids over-complicating my static site generator. I didn't incorporate automatic refreshes, a local web-server, or file-watching into my project; instead, I built up a custom solution using small, focused tools.