I have been using pytest for a few different projects recently and while I've found it very usable, I have also found it to be very complicated.
In attempting to understand how a specific interaction was working I ended up digging into the source and what I found was surprising:
Language | files | blank | comment | code |
---|---|---|---|---|
Python | 132 | 7082 | 12081 | 26760 |
Twenty-six thousand lines of code in order to — write tests? I'm not entirely naive, I understand many of the conveniences that pytest provides, but I have to wonder if we aren't causing ourselves the very problems we are trying to solve. Like, "I have to import 565 lines of code in order to format the output of my test run in the same manner as Java's JUnit" — just, why?
Inevitably it is because there is some second or third order tool that can consume the output without having to write a parser ourselves. But how complicated would that be, if the intent was to only solve our own problem? How much would you actually need if you had to rewrite these things yourself?
While it has been a while since I used Forth for much, one thing I remember fondly is the test harness. The actual test "framework" is about 50 lines of very sparse code. I don't want to give the wrong impression: the two projects are not close to equivalent. I have no doubt that pytest has far more in it, is capable of doing just about anything, and makes it look easy, at least on the surface. While it might be difficult to understand the Forth test harness without some familiarity with Forth, it can at least be informative to browse the tests that accompany it in the repository, which include tests for:
All of which is testable with a 50-line test framework. While this is surely one aspect of a language that maintains what is almost an allergy to "over-engineering", I think it is a hint that things don't have to be this hard. Is it that Forth programs are more testable and a reflection of the language? Or is it that by making a one-size-fits-all framework, the Python test harness has grown inordinately large? I'm not actually sure; I would guess it is a bit of both.
T{ 1 2 3 * + -> 7 }T
There are three words to understand here:
T{
can be read as "test begins"; it is pure syntactic sugar and does nothing.
->
can be read as "assert stack"; it takes "input" from the left hand side and allows for a comparison against the right hand side. It does this by recording the depth of the stack and saving off the contents.
}T
can be read as "test concludes"; it first compares the depth of the left hand side and right hand side of the stack, then, if the depths match, it compares each item. It also handles reporting failures and clearing the stack after a test.
In this case, the left hand side evaluates 2 3 * to 6 and then 1 6 + to 7, so its stack has depth one and a value of 7. It is compared to a right hand side depth of one and value 7.
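For anyone who doesn't read Forth, here is a rough Python sketch of what those three words do. It is a toy model and not the actual harness: the `run` function, the `OPERATORS` table, and the choice to support only `+` and `*` are all assumptions made for illustration.

```python
import operator

# Only the two operators used in the example; a real Forth has many more words.
OPERATORS = {"+": operator.add, "*": operator.mul}

def run(line):
    """Evaluate one test line; return None on success or a failure message."""
    stack = []    # the data stack
    actual = []   # left-hand results saved by "->"
    for token in line.split():
        if token == "T{":
            continue                      # "test begins": does no work
        elif token == "->":
            actual, stack = stack, []     # save the results, empty the stack
        elif token == "}T":
            expected, stack = stack, []   # right-hand side; clear for next test
            if len(actual) != len(expected):
                return "WRONG NUMBER OF RESULTS: " + line
            if actual != expected:
                return "INCORRECT RESULT: " + line
            return None
        elif token in OPERATORS:
            b, a = stack.pop(), stack.pop()
            stack.append(OPERATORS[token](a, b))
        else:
            stack.append(int(token))      # anything else is treated as a number
    return None

print(run("T{ 1 2 3 * + -> 7 }T"))  # None, the test passes
```

The depth check comes before the item-by-item comparison, mirroring the order described for `}T`.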
And as you might expect, error reporting is pretty sparse, but because of how focused the tests are they're workable:
T{ 1 2 3 * + -> 999 }T
INCORRECT RESULT: T{ 1 2 3 * + -> 999 }T ok
T{ 1 2 3 * + -> 1 2 3 }T
WRONG NUMBER OF RESULTS: T{ 1 2 3 * + -> 1 2 3 }T ok
The above might be a cute example of how a ridiculously simple "test" can be done in Forth, but as a demonstration of its utility I thought to apply it to another problem I've written about before: a linear congruential generator.
As a refresher, I was matching the following function signature:
unsigned char *LCG(unsigned char *data, int dataLength, unsigned char initialValue)
HEX
A5 CONSTANT MULTIPLICATIVE
C9 CONSTANT ADDITIVE
100 CONSTANT MODULUS
VARIABLE value
VARIABLE length
VARIABLE output
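: lcg ( n -- n )
  MULTIPLICATIVE *
  ADDITIVE +
  MODULUS MOD ;
: generator ( data length value -- addr length )
  value !
  length !
  length @ ALLOCATE throw output !   \ output holds the buffer address
  length @ 0 ?DO                     \ loop once per input byte
    value @ lcg
    DUP value !
    over I + C@ XOR                  \ fetch a single byte of data, not a cell
    output @ I + C!                  \ store a single byte in the buffer
  LOOP DROP
  output @ length @ ;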
From there, I can write several tests, first to verify the lcg word works for single values, and then based on the original prompt to assert that the generator works in both directions, and a final test to demonstrate the negative case:
data | dataLength | initialValue | result |
---|---|---|---|
apple | 5 | 55 | \xF3\x93\x68\x2D\xCB |
\xF3\x93\x68\x2D\xCB | 5 | 55 | apple |
TESTING lcg works for single values
T{ 02 lcg -> 13 }T ok
TESTING generator in both forward and reverse
T{ s" apple" 5 55
generator
s\" \xF3\x93\x68\x2D\xCB" str= -> true
}T ok
T{ s\" \xF3\x93\x68\x2D\xCB" 5 55
generator
s" apple" str= -> true
}T ok
TESTING a negative case, string does not match generated value
T{ s\" \xF3\x93\x68\x2D\xCB" 5 55
generator
s" foo bar" str= -> false
}T ok
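As a cross-check for readers more at home in Python, the same generator and the same four tests can be sketched as below. This is my own transliteration under a few assumptions: the name `generator` and the use of `bytes` in place of an address and length pair are illustrative choices, and the constants are written in hexadecimal to match the HEX mode of the Forth listing.

```python
MULTIPLICATIVE = 0xA5
ADDITIVE = 0xC9
MODULUS = 0x100

def lcg(n):
    """Advance the generator by one step."""
    return (n * MULTIPLICATIVE + ADDITIVE) % MODULUS

def generator(data, initial_value):
    """XOR each input byte with successive generator values."""
    value = initial_value
    out = bytearray()
    for byte in data:
        value = lcg(value)
        out.append(byte ^ value)
    return bytes(out)

# The same cases as the Forth tests: a single value, both directions,
# and a negative case.
assert lcg(0x02) == 0x13
assert generator(b"apple", 0x55) == b"\xF3\x93\x68\x2D\xCB"
assert generator(b"\xF3\x93\x68\x2D\xCB", 0x55) == b"apple"
assert generator(b"\xF3\x93\x68\x2D\xCB", 0x55) != b"foo bar"
```

Because XOR is its own inverse, running the output back through with the same initial value recovers the input, which is why one function covers both directions.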
In Python, all sorts of design flaws can be masked under the guise of testability provided by a framework that allows you to mock out the dependencies of your dependencies, or monkey-patch a library call at run-time.
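A small, contrived sketch of what I mean, using `unittest.mock.patch` from the standard library. The two functions, `seconds_since` and `seconds_since_injected`, are hypothetical names invented for this example.

```python
import time
from unittest import mock

def seconds_since(start):
    # Tightly coupled: reaches out to the system clock directly.
    return time.time() - start

def seconds_since_injected(start, now):
    # Decoupled: the caller supplies the current time as plain data.
    return now - start

# The coupled version can only be pinned down by patching the library call.
with mock.patch("time.time", return_value=110.0):
    assert seconds_since(100.0) == 10.0

# The decoupled version needs nothing beyond values in and a value out.
assert seconds_since_injected(100.0, 110.0) == 10.0
```

Both tests pass, but only the second one says anything good about the interface; the first passes because the framework papers over the hidden dependency.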
While Python purports to support the idea that "There should be one-- and preferably only one --obvious way to do it", I have found Forth much more hard-line about what is and isn't supported by libraries or the core language. Rather than accommodating tight coupling and masking it through a framework, the tests are intentionally simple because the interfaces are simple. Forth's stack-based programming forces a very particular approach to problem-solving that tends toward doing the obvious thing.