[nolan@nprescott.com] $>  cat blog archive feed

Forth, Data Modeling, Databases

2016-08-25

I've been reading more about Forth and fighting the urge to coerce it into something fundamentally contrary to the design goals of the language. I've been stewing on similarities I see between the design and implementation of these Forth-isms and relational databases, which I've also been reading more about.

New Found Understanding

Relational database tables lean on the uniform nature of tables to optimize/standardize access to data from within queries. Every row in a database is has a known width, within each row the fields are similarly defined by a schema.

As an example, picture a table such as the following:

Attribute Type
id INTEGER
title VARCHAR(255)
product_id INTEGER
client_id INTEGER

Assuming 4 byte integers and eliding any "fancy" database optimization each row in the table would be 267 bytes wide. Iterating a table in a full table scan then is a simple matter of moving a pointer by these fixed offsets. I feel silly for it, but only just made the connection that indexes on fields within tables then map directly to offsets within byte-arrays when you account for the field widths.

id title product_id client_id ... ... ... ... ... ... ... ...
4 255 4 4 4 255 4 4 4 255 4 4

An index on product_id can be achieved by moving +259 bytes off the boundary of each row (267 bytes). This was for me a small revelation that served to remove a lot of the "magic" that I've always perceived databases to have. The real magic remains in the heavy optimization and more exotic features that have been built in (along with query planners I'm sure).

Forth, Files and Blocks

I think there is a pleasing kind of simplicity to the naive table demonstrated above and it occurred to me that the same kind of idea was present in early Forth implementations which eschewed modern file-system approaches in favor of blocks and problem-specific memory layouts. The 1k blocks, or screens, were rigidly defined regions of memory or disk used to contain source code; by fixing the size of each block Forth writers were able to squeeze things like directories and files out of the core operating system entirely. Instead of editing "sample.fth" the specific block was named by index such as "203 edit" because each block had a known size. There are some obvious down-sides to this approach, which has since fallen out of favor (or fashion).

To The Point

What I think is interesting and worth taking away from the resource-scarce days of yore is the focus Forth put on defining problem-specific data structures and even languages for individual applications. In the same way that SQL is optimized around relational database architectures (such as indexes in SELECT mapping to pointer arithmetic, as demonstrated above), Forth programmers strive (strove?) to create similar optimizations in their programs. By eschewing "standard" data structures, Forth can force programmers to instead invent only as much as is needed to solve the problem at hand, a kind of extreme YAGNI.

It is a bit alien to work in a language with only the roughest idea of what a string is, but it is liberating to realize how little abstraction there is between what is written and what is executed. I'm not sure yet of the practicality of Forth, but as a pedagogical tool it has been invaluable to me.

[nolan@nprescott.com] $> █