
A Taste of TCL

2022-01-02

I have been playing around with TCL and trying it out in a few different projects. While I am brand new to the language, I've written up a few notes on the experience so far.

Background

TCL, it turns out, is an exceedingly small language. All of the rules of syntax are laid out in the Dodekalogue. My impression so far is that the syntax and semantics are neither surprising nor exciting (except for how novel that is in a language). I have found it sufficient to read the rules and browse a few examples, which are typically included in the manual. From there I pretty much "get" it; there is room to quibble over things like quoting, which has tripped me up a few times, but nothing of consequence.
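The quoting quibbles mostly come down to which kinds of substitution each grouping character allows. A small illustrative sketch:

```tcl
set name "world"

# Double quotes allow variable and command substitution.
puts "hello $name"             ;# prints: hello world

# Braces suppress substitution entirely.
puts {hello $name}             ;# prints: hello $name

# Brackets run a command and substitute its result.
puts [string length $name]     ;# prints: 5
```

Forgetting which of these applies is about the worst syntax mistake available, which says something about how small the language is.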

Impressions

My only prior exposure to TCL was through Python's Tkinter module. What really made me take a second look was reading about the origins of SQLite.

SQLite is a TCL extension that has escaped into the wild.

I also read Ousterhout's A Philosophy of Software Design and figured working with the author's creation might provide a good supplement to having some big ideas explained. This turned out to be very compelling: while I read the book I agreed with many points but didn't bother thinking very deeply about them. Using TCL has reinforced how far some ideas about simplicity might usefully be taken.

The TCL documentation is really good. The wiki seems off-putting at first because it comes off as a little bit-rotted, being twenty years old. In fact, all the content I found is still applicable and useful; the only real problem is broken links to external sites. Even ignoring the wiki, the man pages are excellent: succinct without being cryptic, full of examples, and well indexed.

Test Driving TCL

After rewriting a small program from Python to C I got to thinking about how any further performance improvements would need a system of caching results. I started wondering, though, about what information I would want to cache and how big a performance impact I might see. I was thinking specifically of using SQLite: firstly because I am a big fan of the database, and secondly because it means I don't have to work out my own file format for storing data across runs.

The SQLite C API is actually pretty good, but I don't think I am yet at a point where I can productively do my prototyping in C directly. Instead I found the SQLite TCL API and was blown away. I can't think of a lower-overhead means of interacting with such a high-quality project; this feature alone will probably tempt me into using TCL in the future.
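To give a sense of how low that overhead is, here is a minimal sketch using the sqlite3 package that ships with the TCL bindings (an in-memory database, to keep the example self-contained):

```tcl
package require sqlite3

# Opening a database creates a new command, here named "db",
# through which all further interaction happens.
sqlite3 db :memory:

db eval {create table t(x integer)}
db eval {insert into t(x) values (41), (1)}

# Query results come back directly as TCL values.
puts [db eval {select sum(x) from t}]   ;# prints: 42

db close
```

There is no cursor object, no connection boilerplate, and TCL variables can be used directly inside the SQL text, as the later examples show.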

The Plan

With ready access to SQLite I resolved to recreate the site generator, integrating a database to store data across runs and using the file modification time to determine whether the HTML needed to be updated. With a real database I also gain the ability to do sorting and filtering more easily. The site generation pieces are minimally interesting, and having written them a few times I'm more interested in figuring out the caching portion. Cranking out a bare-bones reimplementation ended up taking no time at all because of TCL's glob and regexp commands. Given the regular nature of my files, collecting all of the "posts" becomes a one-line operation:

set posts [glob -type f $dir/posts/*/*.post]

It is difficult to argue against how easy that is if the goal is a quick development time. Similarly easy is capturing the few pieces of data I am interested in from the body of the file - possible here with just a few regular expressions:

proc parse {post} {
    # Read the whole file, then pull out the pieces worth caching.
    set fp [open $post]
    set data [read $fp]
    close $fp
    regexp {<h1>(.*?)</h1>} $data -> title
    regexp {<span>(.*?)</span>} $data -> date
    regexp {<p>(.*?)</p>} $data -> leader
    return [list $title $date $leader $data]
}

With that out of the way it just becomes a matter of checking the file modification time and comparing it to a timestamp from the cache database before updating the database with the file contents parsed above.

I first tried the following schema, thinking I could get away with omitting the entire post body from the cache and instead deferring it to a temporary table for those cases where the post needed to be regenerated:

db eval {
    create table if not exists post(path text primary key,
                                    title text,
                                    date datetime,
                                    leader text);

    create temporary table bodies(path text,
                                  title text,
                                  body text);
}

I have since changed my mind about the temporary table, but pursuing these ideas is effortless when the entire thing ends up looking like this:

proc record {post} {
    lassign [parse $post] title date leader body
    db eval {
        insert into post(path, title, date, leader) values ($post, $title, date($date), $leader)
        on conflict(path) do update
        set path=$post, title=$title, date=date($date), leader=$leader
    }
    db eval {
        insert into bodies(path, title, body) values ($post, $title, $body)
    }
}

The main loop of the program looks like this:

foreach post $posts {
    if {[file mtime $post] > $last_ts} {
        record $post
        regenerate $post
    }
}

Where regenerate is just string munging of the file contents and more static HTML data, not unlike the C program. The value of last_ts is a timestamp that is written into the cache when the program concludes; it is kept in a table with a single row:

db eval {
    create table if not exists last_generation(id integer primary key,
                                               dt datetime);
}
db eval {
    insert into last_generation(id, dt)
    values (1, datetime('now'))
    on conflict(id) do update set dt=datetime('now')
}
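Reading the timestamp back at startup is the mirror image. A hypothetical sketch (not necessarily the program's exact code), converting SQLite's datetime to epoch seconds so it compares directly against file mtime, and defaulting to 0 on a fresh cache so every post gets generated the first time:

```tcl
# Default to 0 so an empty cache regenerates everything.
set last_ts 0
db eval {
    select strftime('%s', dt) as secs from last_generation where id = 1
} {
    set last_ts $secs
}
```

The script body after the query runs once per result row, which makes the single-row case pleasantly terse.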

The Result

There are a few more mundane details around building HTML strings, creating "slugs" for the output file names, and writing the files but none of it is surprising or onerous. The whole program took perhaps two hours to write and let me explore a few ideas around caching. The end result is approximately the same speed as Python for the case where each file is read and incurs a write. For the case where a full-site creation is not required it takes a fraction of the time.
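The slug step, for instance, amounts to a couple of string operations. A hypothetical helper (the real program may differ) could look like:

```tcl
# Lowercase the title, collapse runs of non-alphanumerics into
# hyphens, and trim any leading or trailing hyphens.
proc slug {title} {
    set s [string tolower $title]
    regsub -all {[^a-z0-9]+} $s - s
    return [string trim $s -]
}

puts [slug "A Taste of TCL"]   ;# prints: a-taste-of-tcl
```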

Having gone through the whole exercise I feel as though I understand the trade-offs involved. There is a snag in embedding the full post contents inside the atom feed that requires the cache maintain a copy of the post contents, which necessarily means the cache size is equal to the total size of all the posts. There is the potential to do something more clever, like purging all post bodies except the most recent 10 (the maximum number included in the atom feed), but it wouldn't really save having to write them out to the cache at least once.
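That pruning idea would amount to a single statement against the cache. A sketch, assuming the bodies table were made persistent rather than temporary:

```tcl
# Keep bodies only for the 10 most recent posts; older ones can be
# re-parsed from disk if ever needed again.
db eval {
    delete from bodies where path not in (
        select path from post order by date desc limit 10
    )
}
```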

I've got a schema that works well enough and the queries written. I will probably try adapting the C program to use this caching scheme if for no other reason than the practice.

Thoughts

TCL offers the kind of brevity you get with shell scripting, but with a better language. While TCL is not itself particularly fast, it seems effortless to integrate with fast programs. Python offers the ability to write C code that interoperates with the language, but it never seems particularly appealing. With TCL it is an obvious solution with an exceedingly gradual on-ramp.

Another unrelated project I've been picking at has me looking to use TCL and Tk for a quick graphical interface. Once again my impression of TCL is positive, and I found this quote:

… Tcl/Tk is wonderfully productive; in a few hours one can accomplish what might well take days or even weeks with C-based tools
— Brian Kernighan, in the conclusion of Experience with Tcl/Tk for Scientific and Engineering Visualization
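That productivity claim is easy to believe once you see how little code a working window takes. A minimal Tk script (requiring a display to run):

```tcl
package require Tk

# A single button that closes the window when clicked.
pack [button .quit -text "Hello, Tk" -command exit] -padx 20 -pady 20
```

Run it with wish, or with tclsh given the package require line, and that's the entire program.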

I am surprised at how much I have liked working with TCL. I can see using it in the future for the kinds of odds and ends programming that seem to inevitably arise. I am keen to try out starpacks as well, the idea of stand-alone executables is novel after years of things like Python.