[nolan@nprescott.com] $  cat weblog archive feed

Back to Org?

2021-02-15

Table of Contents

As I continue to experiment with and refine my own writing setup, I have recently been trying out a quirky system not so different from the original setup for this weblog.

Ancient History

Funnily enough, the readme that I've been carrying around with this blog from the beginning has this to say about the history of things:

Its history is getting only slightly convoluted:

In particular, the migration to org-mode didn't last because I was unhappy with how rigid the structure was, particularly around publishing and the lack of things like RSS. Of course, with hindsight and experience I can see how little I understood things.

While I've been pretty happy writing plain HTML for a while, I do occasionally forget to escape a > or < sign. Rather than re-integrate Markdown or anything, I set out to extend my tools while keeping the parts that I like.

A New (Old) Alternative

I found a concise explanation for exactly the sort of thing I wanted to do and didn't have a name for. What I really needed, when I wasn't happy with the exact HTML output from Org, was to make a derived back-end for export. In this way I can pare down to just the pieces I want without having to write an entire exporter myself.

In my particular case what I want to do is export org-mode text into partial HTML documents that can be directly consumed by my existing static site generator. All this requires is a format like the following:

<h1>A Title Goes in a Leading H1 tag</h1>
<span>a date goes in a span tag</span>
<p>a leader paragraph follows</p>
...

That is enough to generate my automatic index page, archive page, and RSS feed.
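As an illustration of why so little structure suffices, here is a rough Python sketch of how a generator could pull those three fields out of a post. The site generator's real code is not shown in this post, so PostSummary and summarize_post are hypothetical names, and the regex approach assumes every fragment opens with the h1 and span shown above.

```python
import re
from typing import NamedTuple, Optional


class PostSummary(NamedTuple):
    """The fields an index, archive, or feed entry needs (hypothetical)."""
    title: str
    date: str
    leader: Optional[str]


# The fragment must open with <h1>...</h1> followed by <span>...</span>.
_HEAD = re.compile(
    r"\s*<h1(?:\s[^>]*)?>(?P<title>.*?)</h1>\s*"
    r"<span(?:\s[^>]*)?>(?P<date>.*?)</span>",
    re.DOTALL,
)
# The leader is taken to be the first <p> element after the date.
_FIRST_P = re.compile(r"<p(?:\s[^>]*)?>(?P<leader>.*?)</p>", re.DOTALL)


def summarize_post(post_html: str) -> PostSummary:
    """Extract the title, date, and leader paragraph from a post fragment."""
    head = _HEAD.match(post_html)
    if head is None:
        raise ValueError("post does not start with <h1> followed by <span>")
    para = _FIRST_P.search(post_html, head.end())
    leader = para["leader"].strip() if para else None
    return PostSummary(head["title"].strip(), head["date"].strip(), leader)


sample = (
    "<h1>Back to Org?</h1>\n"
    "<span>2021-02-15</span>\n"
    "<p>As I continue to experiment</p>\n"
    "<p>a later paragraph</p>\n"
)
summary = summarize_post(sample)
```

Regexes are only reasonable here because the fragment's opening lines are fixed by the export template; anything looser would call for a real HTML parser.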

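The derived back-end that emits this format is short:

(require 'ox-html)  ; provides the `html' back-end we derive from

(defun weblog-post-template (contents info)
  "Wrap CONTENTS in the partial HTML the site generator expects.
INFO is the export plist holding the title and date."
  (concat
   (format "<h1>%s</h1>\n"
           (org-export-data (or (plist-get info :title) "") info))
   (format "<span>%s</span>\n"
           (org-export-data (or (plist-get info :date) "") info))
   contents))

(defun org-export-to-post
    (&optional async subtreep visible-only body-only ext-plist)
  "Export the current Org buffer to a .post file."
  (interactive)
  (let ((outfile (org-export-output-file-name ".post" subtreep)))
    (org-export-to-file 'post-html outfile
      async subtreep visible-only body-only ext-plist)))

;; Inherit everything from `html' and override only the document template.
(org-export-define-derived-backend 'post-html 'html
  :menu-entry '(?z "Export to weblog post" org-export-to-post)
  :translate-alist '((template . weblog-post-template)))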

That is sufficient to register my new export back-end so that when invoking org-export-dispatch with C-c C-e the following prompt is available:


Figure 1: Org's dispatch menu means generating my post HTML is one keypress (z) away

Motivation

Part of the motivation in getting back into Org mode is the kind of interactivity that it lends itself to. Heck, even configuring this new back-end was basically a breeze. While I don't mind writing HTML I have found that I end up basically transcribing my own notes at least twice. Org mode is more like a lab notebook while my previous straight-HTML format was like the finished product. I'm looking forward to keeping things easy so I can focus on projects and substance rather than wearing myself out re-writing things.

DONE Turn off syntax highlighting

This one just bugs me; it turns out it is easy enough to keep my preferred plaintext code blocks:

(setq org-html-htmlize-output-type nil)

DONE Fix Atom feed links

While I was in here making sure I hadn't inadvertently broken a bunch of links I finally got around to removing the hack that was <base href='/'> inside the page template. Consequently, I've half-jokingly included Org's automatic table of contents at the top of the page because the hash-references should no longer be broken, here or in the feed!

Feed Reader Conformity Nightmares

While there are specifications and broad agreement on how feed formats should work, in practice most readers only implement the most common pieces. The piece giving me the most grief today has been xml:base support. In an effort to fix the kludge-y bit of HTML that makes so many of my absolute URLs work throughout the Atom feed, I set about doing things right. In my case, that meant replacing the <base href='/'> from the HTML template and fixing up countless dangling relative and quasi-relative URLs in old posts.

I found the XML Base specification and was delighted to see it was exactly the sort of thing I needed to make my relative URLs work in the feed without changes to my posts. It wasn't difficult to get the attribute added; in my case I already had a suitable attribute tracked on the post objects. No, the real problem lay in trying to test it out. It seems that vanishingly few feed readers correctly implement xml:base.

I thought my implementation was to blame for how poorly things were going with testing. At least, I did until I found the XmlBaseConformanceTests and information on testing from the Python feedparser library. I was able to ingest my feed and verify that my xml:base and relative URLs were all resolving correctly. I also downloaded Thunderbird and tried my new feed there, where it worked. I can't really account for why so many different readers seem to struggle with it, but if it seems like images are broken in the feed, please verify your reader isn't broken!
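For a sense of what "resolving correctly" means, here is a minimal Python sketch of xml:base resolution using only the standard library. It is not how feedparser or any particular reader does it; the feed snippet, the example.com URLs, and the resolved_links helper are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed: a base on the feed, and a relative base on the entry.
FEED = """\
<feed xmlns="http://www.w3.org/2005/Atom" xml:base="https://example.com/">
  <entry xml:base="posts/back-to-org/">
    <link href="index.html"/>
    <link rel="enclosure" href="/images/menu.svg"/>
  </entry>
</feed>
"""


def resolved_links(element, base=""):
    """Yield each Atom link href under element as an absolute URL."""
    # xml:base is inherited, and a relative xml:base is itself resolved
    # against the base in effect on the parent element.
    base = urljoin(base, element.get(XML_BASE, ""))
    if element.tag == ATOM + "link":
        yield urljoin(base, element.get("href", ""))
    for child in element:
        yield from resolved_links(child, base)


links = list(resolved_links(ET.fromstring(FEED)))
```

Here the first link resolves under the entry's directory while the second, which starts with a slash, resolves from the site root. A reader that ignores xml:base has no base to join against, which is consistent with relative image URLs appearing broken.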

Thoughts

I'm not certain I will keep this setup, but the best part has got to be that it costs nothing to switch. I'll be committing both the .org and generated .post (HTML) file for now, partly because I can't imagine ever doing a big wholesale migration across formats again. If I do decide that I don't like the way things are going, I can simply stop using Org and nothing changes. The site generator stays the same and everything keeps working. While it might be possible to integrate Emacs and Org into the site generation, I see that only as a pitfall to avoid. In case you've never bothered checking out Org mode, I'll go ahead and drop a link to this post's raw-text. It certainly is briefer than HTML can be, albeit similarly quirky.