I've spent some time interviewing recently, the result of which is an interesting assortment of questions and problems. Here's one I had not seen before, a Linear Congruence Generator.
I did a bit of research to find that linear congruence generators are often
used as pseudo-random number
generators. This page
explains in greater depth, but it seems LCGs are common enough partly because
of how easy they are to implement. The example given below is not great in
terms of randomness because it is only 8-bit. The linked page gives some
context for challenges in scaling to 32-bit lengths and handling integer
overflows, the same challenges apply in Forth, and I will be ignoring them to
simplify the syntax (no use of CELLS
or 2VARIABLE
etc.).
A common technique for obfuscating data is to use exclusive‐or (xor) and some key; it is inexpensive and symmetric. A problem occurs when this is used on file formats like portable executable where there are lots of nulls since xor'ing nulls with your key ends up writing your key out.
A slightly more complex algorithm is to implement a linear congruence generator (LCG) to generate a key stream which will be xor'ed with the data. The generic form of an LCG is:
Value = (M * Value + A) % n
where M is the multiplicative constant, A is the additive constant, and n is the modulus. These three values are the parameters of the LCG.
For this problem I will be using an 8‐bit LCG (i.e. n = 256), with M = 0xa5
and
A = 0xc9
. The LCG is initialized with a value and stepped to produce the key
stream. The next byte of key is read from the LCG each step.
For example, if the initial value of the LCG is 0x02
, then the next value,
which would be the first key byte, would be:
(0xa5 * 0x02 + 0xc9) % 256 = 0x13.
The function should adhere to the following signature:
unsigned char *LCG(unsigned char *data, int dataLength, unsigned char initialValue)
Example Tests:
data | dataLength | initialValue | result |
---|---|---|---|
apple | 5 | 0x55 | \xF3\x93\x68\x2D\xCB |
\xF3\x93\x68\x2D\xCB |
5 | 0x55 | apple |
The first thing I notice, having read Starting Forth and reading Thinking Forth is that it makes no sense to mix two number bases unnecessarily. There is no value in keeping a single base-10 number (256) simply because it is a common expression for 8-bit computing. So I'll go ahead and change that to hex, where it is the equally friendly 100.
With that out of the way, the whole operation is simply:
HEX ( change base ) ok
A5 02 * ok
C9 + ok
100 MOD ( the result is now on the stack, I'll just pop it to verify ) ok
. 13 ok
Huh, well that was pretty simple. I've broken it down in to a single arithmetic step per line, with the final line popping the stack to check the result (which matches the example given). The next step should probably be to create a logical word for this, along with a stack comment.
HEX
: lcg ( n -- n )
A5 *
C9 +
100 MOD ;
This doesn't yet match the requested function signature (which I am obviously taking some liberties with, using Forth):
unsigned char *LCG(unsigned char *data, int dataLength, unsigned char initialValue)
I will be using a Forth-ism and place the starting address and length of the
result on the stack, as a means of returning a "string". Forth doesn't support
strings as first-class objects and treats them more like C (arrays). The
easiest way to then check the result is with the words type
or dump
, type
is best with text strings, dump is more useful for the hex-encoded reverse
operation.
VARIABLE value
VARIABLE length
VARIABLE output
: stepper ( data length value -- addr length )
value !
length !
length @ ALLOCATE throw output !
0 ?DO
value @ lcg
DUP value !
over I + @ XOR
output I + !
LOOP DROP
output length @ ;
The above code correctly handles allocating memory for the output string, but I
haven't bothered to deal with free
ing on completion. I think the idea is
pretty straight-forward, simply check for an existing output and free
before
re-allocating or alternatively, allocate a dummy value, possibly a zero-length
and re-size within stepper
.
s" apple" 5 55 ok
stepper ok
.s <2> 1093834C8 5 ok
dump
1093834C8: F3 93 68 2D CB - ..h-.
ok
s\" \xF3\x93\x68\x2D\xCB" 5 55 ok
stepper ok
.s <2> 1093834C8 5 ok
dump
1093834C8: 61 70 70 6C 65 - apple
ok
The one bit that tripped me up at first was Gforth's syntax for escaping hex
characters within a string syntax. Rather than the string word s"
I had to
find s\"
which allows for the hex escape character \x
. I am using dump
rather than type
in the example above because type
doesn't play nicely with
trying to print the hex F3 93 68 2D CB
.
This was a fun way to dig into Forth a bit after reading about it in books. The syntax is jarring at first but is a low barrier to entry with the language. The bigger barrier is getting into a frame of mind to think about memory addresses directly and iterating over them without creating unnecessary temporary variables. I am not sure I've got the hang of that yet, my above code does a bit of stack juggling, which isn't terrible but I think I could improve it by factoring it out further.
As far as pseudo-random number generators are concerned, I don't think I deeply understand the math involved. I have seen Mersenne Twisters used as a popular means of PRNG but the math involved is even more opaque than the LCG. I think I might need to bone-up on the necessary maths background before gaining further insight on this end.