[nolan@nprescott.com] $  cat weblog archive feed

The Forest and the Trees

2020-04-19

I woke up this morning with a vague but niggling idea that I had missed a more generalized problem in developing my ray tracer. It was going to eat at me all day unless I bothered finding out.

To take advantage of the array-oriented programming style of J I needed a single array of all the x, y coordinates on which to "draw" the ray lighting effects. I gave an example of one such array with "boxing" applied to demonstrate how the result was a self-indexed array:

   coordinates=: [: (] ,"0 |:) 2&# $ ] # i. NB. y is square array width

   <"1 coordinates 3
┌───┬───┬───┐
│0 0│0 1│0 2│
├───┼───┼───┤
│1 0│1 1│1 2│
├───┼───┼───┤
│2 0│2 1│2 2│
└───┴───┴───┘

I was already familiar with the catalogue verb ({) which can accomplish something very similar, but with the boxing applied directly:

   { (i.3) ; (i.3)
┌───┬───┬───┐
│0 0│0 1│0 2│
├───┼───┼───┤
│1 0│1 1│1 2│
├───┼───┼───┤
│2 0│2 1│2 2│
└───┴───┴───┘

I had avoided using catalogue because of how slow it ends up being for large arrays. I think a big part of the slow-down is a result of boxing (and maybe unboxing) which, as I understand it, represent indirection from what are otherwise contiguous memory access.

This can be demonstrated with the basic profiling verb timeAndSpace, which returns time taken in seconds and space used in bytes:

   timeAndSpace=: 6!:2 , 7!:2@]

   timeAndSpace '{ ;~ (i. 1200)'
0.102982 2.01115e8

   timeAndSpace '>{ ;~ (i. 1200)'
0.151136 2.34669e8

   timeAndSpace 'coordinates 1200'
0.022646 6.71102e7

In the case of just calling catalogue the operation takes more than 4 times as long as my own coordinates, and in order to match the array format requires unboxing (>) which takes longer still.

This was enough initially for me to be satisfied with my own implementation and I didn't bother thinking further until this morning. Stepping back a bit made me realize I'm poorly re-implementing a more limited subset of a Cartesian product (which is what catalogue is for). I accomplished this by created the 2 dimensional array:

   (2&# $ ] # i.) 3
0 0 0
1 1 1
2 2 2

and then appending (rank 0) the transposition of the same array:

   (] ,"0 |:) (2&# $ ] # i.) 3
0 0
0 1
0 2

1 0
1 1
1 2

2 0
2 1
2 2

Really this is a few more steps than necessary if you consider the (unacknowledged) goal of generating the cartesian product of a simple array. Instead it is possible to append over each atom of the same array, the nearly literal translation to J is:

   over=: /
   append=: ,
   eachAtom=: "0
   someArray=: i.3
   itself=: ~

   append eachAtom over itself someArray
0 0
0 1
0 2

1 0
1 1
1 2

2 0
2 1
2 2

A single verb to capture this can be collapsed down to just:

   indexArray=: ,"0/~
   
   timeAndSpace 'indexArray i. 1200'
0.015167 3.36382e7

Which is faster, uses less memory than either prior solution and is simpler to read. The performance improvement is even more apparent with larger arrays:

   open=: <
   catalogue=: {
   timeAndSpace 'open catalogue ;~ i.10000'
39.0662 1.38739e10

   timeAndSpace 'coordinates 10000'
4.0254 4.29497e9

   timeAndSpace 'indexArray i. 10000'
1.59542 2.14814e9

RTFM

I was generally feeling happy with this realization right up until the point where I opened the documentation for catalogue which notes:

Note: Table (/) has a similar action. It is the more usual primitive to use for this task.

0 1 (<@,"0)/ 7 8 9
+---+---+---+
|0 7|0 8|0 9|
+---+---+---+
|1 7|1 8|1 9|
+---+---+---+

Which is exactly what I've managed to rediscover here. Once again I find myself a little humbled by my own inability to, first, recognize the more general form of a problem and second, guess that it has already been solved. Oh well, at least I know a little bit more about the language and how to profile it.