[nolan@nprescott.com] $  cat weblog archive feed

Strategic Communication to C

2021-04-09

I saw an esoteric programming language posted online, strategic-communication, and thought it would be a lark to create a second implementation. In the interest of diversifying runtimes and architectures I set to creating a parser and compiling to C.

The code is here.

strategic-communication

Quoting the README, strategic-communication is:

A best-of-breed language with a holistic approach to moving the needle.

In practice, it is an esoteric programming language about as expressive as Ook! or brainfuck. The original implementation is based on a kind of virtual machine for the approximately 15 instructions and 8 registers. The whole joke of the thing is in the obfuscation like corporate ipsum or the corporate b.s. generator. As an example, this program echoes user input:

moving forward, take a holistic approach
crowdsource best practices
restructure best practices to move the needle
deliver best practices
circle back to take a holistic approach
moving forward, move the needle

While I've read a tiny bit about implementing languages I don't really know the first thing about compiling or how to do it. In fact, I read about strategic-communication months ago and didn't give it much thought until I read a great series, Let's Make a Teeny Tiny Compiler. While the focus of the series is mostly on parsing, I thought it was interesting and provided the kind of ground-work necessary for me to get started.

I ended up following along with the series and adapting it into Common Lisp (for fun) and then once I had it working I started tweaking things to lex and then parse strategic-communication. Generally speaking a single line of strategic-communication translates to a line of C code so there are no great feats of transformation. More than anything though it has been a fun project to better understand parsing and how to do things without a wealth of polished tooling.

I attended Racket-Con a few years ago and while it was informative and fun I struggled to connect how something like brag related to the nitty-gritty details outside of Racket. I suppose there is a little too much magic. Not so with my parsing though — it is dreadful! I had to make a few changes that look like kludges to me in order to accommodate the multi-word tokens and I think I learned a lot.

Interestingly, the entire language is comprised of keywords (except for strings which become labels). I can't readily think of a language that deals in multi-word keywords (as compared to multiple words acting as qualifiers or modifiers). The result for me has been an ad-hoc parser that does a simple/dumb thing and continually looks backward at what should be the end of a word before deciding if the token is finished. Take for example these tokens:

       ("BEST PRACTICES"             (make-token :text str :kind :register :value "r3"))
       ("STAKEHOLDER ENGAGEMENT"     (make-token :text str :kind :register :value "r4"))
       ("KEY PERFORMANCE INDICATORS" (make-token :text str :kind :register :value "r5"))
       ("RETURN ON INVESTMENT"       (make-token :text str :kind :register :value "r6"))

While I could have constructed multiple intermediate tokens for each word to assert that "Token: key" was followed by "Token: performance" was followed by "Token: indicators" I instead opted something more like the following pseudocode:

while character?
      OR
      space? AND (substring start current) in (
          "EXECUTIVE"
          "PARADIGM"
          "MOVING"
          "GOING"
          "CIRCLE"
          "CIRCLE BACK"
          "CUSTOMER"
          "REVENUE"
          "CORE"
          "BEST"
          "STAKEHOLDER"
          "KEY"
          "KEY PERFORMANCE"
          "RETURN"
          "RETURN ON"
      )
    continue
else
    new token (substring start current)

It is a little redundant to capture a single token like "KEY PERFORMANCE INDICATORS" by checking at each space if the current substring is "key" or "key performance" but I think it is at least brief and obvious (relatively speaking).

In Practice

The README for strategic-communication says the following about performance:

The way I implemented the interpreter is super naive; it re-parses every line it executes from its string representation every time. There are likely other easy performance gains I've ignored as well.

While I was primarily interested in learning more about Common Lisp and C I thought it would be educational to see the kind of difference interpreted vs. compiled can make. My implementation constructs a local variable for each of the 8 registers and turns constant expressions into literals during the compilation process, so the actual instructions are mostly limited to arithmetic and jumps. The end result when measured against the example programs is huge: in the case of fizzbuzz (an exemplary program demonstrating real work) the interpreted Rust implementation takes over 30 seconds while the compiled C takes 0.017 seconds, this does not account for the actual parsing and compiling to C which takes 0.019 seconds (see below). Furthermore the Rust implementation relies on dozens of libraries for things like regular expressions, random, and printing to the terminal while the Common Lisp compiler version requires none.

Example Run

CL-USER> (compile-to-c "

moving forward, take a holistic approach
crowdsource best practices
restructure best practices to move the needle
deliver best practices
circle back to take a holistic approach
moving forward, move the needle

")
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(void) {
int result;
char input;
time_t t;
srand((unsigned) time(&t));
int r3;
 takeaholisticapproach:
result = scanf("%c", &input);
if(0 == result) { r3 = -1; }
r3 = (int)input;
if(r3 < 0) goto movetheneedle;
printf("%c", r3);
goto takeaholisticapproach;
movetheneedle:
return 0;
}

NIL

Addendum

Two things occured to me after writing the above and watching a conference talk about compiling Scheme to C. First, I should write the output directly to disk instead of having to copy-paste, now that I have things mostly working. Second, there's no reason I couldn't invoke the compiler directly from Common Lisp as well. This lead me to using tcc (for fun) and I realized things were faster still. In practice, parsing strategic-communication, compiling it to C, writing it to disk, invoking the C compiler and running the resulting code takes about 0.012 seconds.

Timing details for fizzbuzz.
CL-USER> (time (run "align customer experience with HR
align revenue streams with HR
align core competencies with HR
align assets with Engineering, Legal, and Legal
revisit holistic approach
moving forward, make it pop
align key performance indicators with HR
align best practices with Sales and HR
align stakeholder engagement with Engineering, HR, and Marketing
deliver best practices
produce stakeholder engagement
produce assets
deliver assets
revisit scalability
moving forward, engage in deep dive
align best practices with R&D and R&D
align stakeholder engagement with Engineering, Engineering, and Sales
deliver best practices
produce stakeholder engagement
produce assets
deliver assets
revisit mindshare
moving forward, strive for success
pivot key performance indicators to mindshare
align stakeholder engagement with Finance and Manufacturing
pivot customer experience to minimize impact
align best practices with customer experience 
synergize best practices and stakeholder engagement
produce best practices
moving forward, minimize impact
align best practices with revenue streams  
synergize best practices and stakeholder engagement
deliver best practices
circle back to mindshare
moving forward, reduce cost basis
align stakeholder engagement with PR
differentiate best practices and stakeholder engagement
pivot best practices to make it pop
restructure best practices to scalability
circle back to reduce cost basis
moving forward, scalability
align best practices with core competencies
align stakeholder engagement with Marketing
moving forward, reimagined scalability
differentiate best practices and stakeholder engagement
pivot best practices to engage in deep dive
restructure best practices to strive for success
circle back to reimagined scalability
moving forward, seamlessly integrate
innovate customer experience 
align revenue streams with HR
circle back to harden cloud infrastructure
moving forward, mindshare
align best practices with Engineering and HR
produce best practices
moving forward, holistic approach
align key performance indicators with Engineering
innovate core competencies
innovate revenue streams
align stakeholder engagement with Engineering and HR
align best practices with revenue streams
differentiate best practices and stakeholder engagement
pivot best practices to seamlessly integrate
moving forward, harden cloud infrastructure
align best practices with customer experience 
differentiate best practices and stakeholder engagement
pivot best practices to call it a day
align best practices with core competencies
revisit reduce cost basis
moving forward, call it a day"))
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz
Fizz
22
23
Fizz
Buzz
26
Fizz
28
29
FizzBuzz
31
32
Fizz
34
Buzz
Fizz
37
38
Fizz
Buzz
41
Fizz
43
44
FizzBuzz
46
47
Fizz
49
Buzz
Fizz
52
53
Fizz
Buzz
56
Fizz
58
59
FizzBuzz
61
62
Fizz
64
Buzz
Fizz
67
68
Fizz
Buzz
71
Fizz
73
74
FizzBuzz
76
77
Fizz
79
Buzz
Fizz
82
83
Fizz
Buzz
86
Fizz
88
89
FizzBuzz
91
92
Fizz
94
Buzz
Fizz
97
98
Fizz
Evaluation took:
  0.011 seconds of real time
  0.006477 seconds of total run time (0.002487 user, 0.003990 system)
  54.55% CPU
  45 forms interpreted
  27,116,398 processor cycles
  990,160 bytes consed
    

Miscellaneous

I found using Lisp symbols to be generally nicer than Python enumerations. Similarly, cond is more appealing than if: else: blocks to me personally. I don't think my adaptation to the token object is very good, I slowly tweaked it from the original Python class to add a potential value field for things like numeric constants and registers. While this (obviously) worked, the sorts of checking required to dispatch on either the kind of token or value is tedious and not well motivated. Were I to maintain this any further I think that'd be a likely place to start refactoring.

While currently all of the example programs provided in the original implementation seem to produce identical output I think there are a few lingering bugs. Based on my reading of the README my parsing it a little too lax in the following cases:

Additionally, the boilerplate necessary for RAND (time.h and srand) isn't conditional on the actual use of the "random integer" operation. There isn't any reason it couldn't be conditionally included other than I haven' taken the time to do so.

Thoughts

This whole process has been a fun one; it is nice to work on such a well-defined problem. Additionally, it has kindled an interest in me to learn more about compilers. I'm currently considering buying Write Your Own Compiler while I browse through Crenshaw's Let's Build a Compiler.