[nolan@nprescott.com] $>  cat blog archive feed

Static Site Generators

2016-07-20

The static site generator ecosystem is awash with hundreds of projects in what seems like as many languages. Each purports to solve one problem or another, and it can be unclear what each is suited for. I recently took a look at several of them; this is my take-away.

Just How Many Are There?

Of course, I can only take a rough stab at how many site generators there are in the wild, but StaticGen lists 147 different projects. You'll excuse me for admitting I didn't bother looking through all of them; I cherry-picked the most popular and skipped projects I was uninterested in outright (anything Ruby/PHP).

Project      Language
Pelican      Python
Nikola       Python
Blogofile    Python
Frog         Racket
Hexo         JavaScript
Hugo         Go
liquidluck   Python

These are obviously Python-centric, owing to my preference for the language. I browsed a few more, but these were the projects I considered most heavily. Ultimately, I found issue with each of them: none seemed suited to my desire to keep a very basic hierarchical structure for the posts, mirrored on the web server. For example:

.
├── 2015
│   ├── book-review-flask-web-development.md
│   ├── browsers-in-2015.md
│   ├── ...
├── 2016
│   ├── a-microcosm-of-project-management.md
│   ├── ab-testing.md
│   ├── static
│   │   ├── 56cb5e5a1781c.jpeg
│   └── ...
├── index.md
└── static
    └── style.css

I like having the same structure between written content and generated content, the same way I like encapsulating a year-local static media directory. It maps cleanly to how I think about my weblog.
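The mirroring idea can be sketched as a single path-mapping function. This is only an illustration, not the code behind this site: the function name output_path and the directory names "content" and "public" are assumptions made for the example.

```python
from pathlib import PurePosixPath


def output_path(source: str, src_root: str = "content", out_root: str = "public") -> str:
    """Map a source file to the same relative location in the output tree.

    The roots "content" and "public" are hypothetical names for this sketch.
    """
    relative = PurePosixPath(source).relative_to(src_root)
    # Markdown posts become HTML pages; everything else (images, CSS)
    # keeps its name and is assumed to be copied through unchanged.
    if relative.suffix == ".md":
        relative = relative.with_suffix(".html")
    return str(PurePosixPath(out_root) / relative)


print(output_path("content/2016/ab-testing.md"))
print(output_path("content/2016/static/56cb5e5a1781c.jpeg"))
```

Because the relative path is never rewritten, a year-local static directory ends up next to the posts that use it, which is the whole point of the layout.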

Features, or Bloat?

The next issue I found in trying to assess each of these projects was how easily they might be adapted to my use. I'm certain I could adapt each to use the directory structure above; it's just a matter of figuring out how. Along the way, I started to notice a trend in how large each project was.

Project      Lines of Code
Pelican      23701
Nikola       27158
Blogofile    4079
Frog         3233
Hexo         22285
Hugo         32408
liquidluck   3679

Surely You're Joking

I was blown away at how large each of these projects was. This doesn't include necessary libraries/dependencies for each. This is simply application code to generate static HTML from a simplified markup language (typically markdown).
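For a sense of how a figure like this can be produced, here is a rough sketch of a line counter. It is not necessarily how the numbers above were gathered. It counts every line, blank lines and comments included, in files with a given extension; the function name count_lines and its parameters are made up for the example.

```python
import os


def count_lines(root, extensions=(".py",)):
    """Count all lines in files under root whose names end in one of extensions."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                # Read as bytes so an odd encoding can't abort the count.
                with open(os.path.join(dirpath, name), "rb") as handle:
                    total += sum(1 for _ in handle)
    return total
```

Calling count_lines("path/to/checkout") on a project's source directory gives a crude total; a dedicated tool that separates code from comments and blank lines will report smaller numbers.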

I don't even know what some of these projects accomplish in more than twenty thousand lines of code. As a point of reference, the Unix V6 kernel, written in C, is just about 9,000 lines of code. Many of these generators include plug-ins or add-ons for a whole host of things I am uninterested in (Disqus commenting, cross-posting to Twitter, tag clouds, multi-author formats, etc.), but without reading through each project closely I was having difficulty telling how ingrained each "feature" was with the whole.

What Prompted the Switch

This blog was originally written using markdown and Frog, but I felt a little constrained in how I had to use it (through its own raco commands). I became enamored with how slick the document workflow was using org-mode and migrated things. It was about a year before it became obvious that org-mode wouldn't scale well in the long term. I ran into trouble exceeding the maximum open file limit and had to bump it up with ulimit -n, but what really drove me to change was the speed of generation. Org-mode is simply too slow for my needs, despite the built-in caching.

An Unintended Consequence

I began migrating my old org-mode posts to markdown[1] and only then realized how using some of the more "exotic" features crippled the ease of migration. Org-babel is a wonderful thing for interactive programming and exploratory analysis, but its output tends to turn into junk text when documents are converted with Pandoc. I briefly considered just using the direct HTML export from org-mode.

I spent a few hours converting things and cleaning up, and I think the lesson here is a valuable one that reinforced my decision regarding a static site generator. I am now intensely wary of the more magical features in each project. Markdown (and the basics of org-mode) are important to me because plain text is a nearly universal medium that is easier to work in than HTML. The minute you lose that, you seriously cripple what you can do, or increase the amount of time it takes to do it.

Decisions

Ultimately, I decided I would probably be happiest writing my own static site generator. I realize the irony in contributing one more project to an already crowded space, but in my case I won't be advertising it as anything but my own best system. Features are, right now, limited to automatic generation of an archive page and a sketchy Atom feed. It's what I'm using now to create this site and can be found on GitHub.
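To give a feel for how little an archive page needs when the directory layout already encodes the year, here is a minimal sketch. It is an illustration under assumptions, not the project's actual code: build_archive is a hypothetical name, and it assumes posts are .md files sitting directly inside a numeric year directory.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def build_archive(post_paths):
    """Group post slugs by year, newest year first."""
    by_year = defaultdict(list)
    for path in post_paths:
        parts = PurePosixPath(path).parts
        # Keep only Markdown files directly under a numeric (year) directory;
        # index.md and anything in a static/ directory are skipped.
        if len(parts) == 2 and parts[0].isdigit() and path.endswith(".md"):
            by_year[parts[0]].append(PurePosixPath(path).stem)
    return [(year, sorted(by_year[year])) for year in sorted(by_year, reverse=True)]


posts = [
    "2015/browsers-in-2015.md",
    "2016/ab-testing.md",
    "2016/a-microcosm-of-project-management.md",
    "index.md",
]
for year, slugs in build_archive(posts):
    print(year, slugs)
```

The result is a plain list of (year, slugs) pairs that a template can loop over. Sorting by slug within a year is a simplification; ordering by publication date would need the date from each post.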

I wrote it in Python and tried to limit the number of external libraries; the standard library in Python is really quite good. I am using Jinja for templating, which feels a bit like using a sledgehammer to drive nails, but it is probably the most widely used templating engine in the Python ecosystem right now. I am using Mistune for markdown parsing but was sure to encapsulate its use in the event that I need to replace it with another markdown engine in the future.
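Encapsulating the markdown engine can be as simple as routing every conversion through one function. The sketch below is one way to do that and is not taken from the project: render_markdown is a hypothetical name, and the fallback branch is an assumption added so the example still runs where Mistune is not installed.

```python
import html

try:
    import mistune  # third-party; treated as optional in this sketch
except ImportError:
    mistune = None


def render_markdown(text: str) -> str:
    """The single place the rest of the generator asks for HTML."""
    if mistune is not None:
        return mistune.markdown(text)
    # Stand-in when Mistune is missing: escaped text, not real Markdown.
    return "<pre>" + html.escape(text) + "</pre>"


print(render_markdown("Plain text is a nearly universal medium."))
```

Swapping in another engine then means changing the body of render_markdown and nothing else, since no other module imports Mistune directly.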

As of this writing, static is about 200 lines of code, and it completely regenerates my blog in about 4/10ths of a second.


  1. What to say about markdown? I, more or less, like the idea of markdown: a simplified syntax for HTML documents that fully supports inline HTML. But in converting back and forth between document types and fiddling with formatting, I feel acutely aware of some of its failings. The syntax ambiguities can be frustrating, and multi-line elements (such as headers) are a total pain to work with using some of my default text tools (shell utilities and regular expressions). I plan on limiting myself to a more stringent subset of the syntax in the future, even if I haven't yet cleaned up the mess that resulted from the Pandoc conversion process.