[nolan@nprescott.com] $>  cat blog archive feed

Advent of Code, Day One

2018-12-01

Another year, another attempt at tackling the annual Advent of Code. I decided last year that solving the puzzles wasn't necessarily the fun part for me and instead I like it as a way to practice programming, so this year I think I'll try doing things in Common Lisp.

Part One

The first puzzle is definitely meant to ease you into the challenge. The task:

A value like +6 means the current frequency increases by 6; a value like -3 means the current frequency decreases by 3.

Starting with a frequency of zero, what is the resulting frequency after all of the changes in frequency have been applied?

It so happens that the Lisp reader is perfectly happy to consider +3 and -6 valid integers, so no special handling is necessary:

(with-open-file (stream "1a.txt")
  (loop for line = (read stream nil)
     while line
     sum line))

Common Lisp's loop macro can do some aggregation on its own; in this case it takes the sum of all the values. Without requiring even a little pre-processing, this one felt like cheating.
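
Besides sum, loop has a few other accumulation clauses that work the same way. Some REPL-sized examples (inputs are my own, not from the puzzle):

(loop for x in '(1 -2 3) sum x)            ; => 2
(loop for x in '(1 -2 3) collect (* x 10)) ; => (10 -20 30)
(loop for x in '(1 -2 3) count (plusp x))  ; => 2
(loop for x in '(1 -2 3) maximize x)       ; => 3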

Part Two

The challenge is basically:

You notice that the [list] repeats over and over... you need to find the first [sum] it reaches twice
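
To make the goal concrete, a tiny example of my own: with the changes +1, -2, +3, +1 repeating, the first frequency reached twice is 2:

;; changes:    +1  -2  +3  +1 | +1  -2  ...
;; frequency:   1  -1   2   3 |  4   2  <- 2 was already seen, so 2 is the answer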

The Approach

I think there is probably some algorithmic trick to this that I haven't realized, but with my reading of the problem above I settled on the following approach:

1. make the input list circular so that it repeats forever
2. walk it, tracking the running total at each step
3. after each step, check the totals seen so far for a duplicate
4. return the first total that appears twice

In retrospect this is a very naive approach, but in the interest of getting something working before the analysis paralysis sets in, it's the one I took.

The first step, creating a circular list from the input data, is pleasantly simple in Common Lisp. I basically take the first solution and replace sum with collect (to build a list instead of reducing it) and then set the cdr of the last cons back to the start of the list:

(defun dumb-read (filename)
  "read lines as some lisp-y thing into a list"
  (with-open-file (stream filename)
    (loop for line = (read stream nil)
       while line
       collect line)))

(defun circular (some-list)
  (setf (cdr (last some-list)) some-list))

So, for example if the input list was 1, 2, 3, 4:

(circular (list 1 2 3 4))

would turn into:

#1=(1 2 3 4 . #1#)

There is one trick to this: Lisp needs to be told that "circles" should be treated specially, so that it doesn't try to print the infinite list and exhaust the heap (ask me how I know!):

(setq *print-circle* t)

With that accomplished, I wrote a quick and dirty helper function to check for duplicates in a list (assuming the list is sorted):

(defun find-dupe (some-list)
  (cond ((equal some-list nil) nil)
        ((equal (car some-list) (cadr some-list)) (car some-list))
        (t (find-dupe (cdr some-list)))))
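
A quick sanity check at the REPL (inputs are my own; note the sorted-input requirement):

(find-dupe (sort (list 3 1 2 1) #'<)) ; => 1
(find-dupe (list 1 2 3))              ; => NIL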

I know this is not great, and I should probably read the Hyperspec to see if this function exists, but I'm keeping forward momentum here! The next step is to loop over the data and track a list of "seen values" and their sum per loop:

(defun find-duplicate-sums (some-list)
  (let ((the-list (circular some-list)))
    (loop while the-list
       for so-far = '() then (push (pop the-list) so-far)
       for running-totals = '() then (push (apply #'+ so-far) running-totals)
       do
         (let ((maybe-dupe (find-dupe (sort running-totals #'<))))
           (if maybe-dupe (return maybe-dupe))))))

An eagle-eyed reader might see that I'm sorting the list of running totals inside of a loop, or building what is basically the list of seen values in reverse, or summing the whole list of seen values each iteration. But what you might not see is that this actually works!

It's a little wild to me that my first approach in a new language didn't blow up in my face (well, after I fixed the "don't print circular lists" issue). The only real problem is that it is dog slow. For my input data the above function takes about 500 seconds.

Optimization

Measurement

The real key to optimization is always, always measurement. Especially in my case, writing in a language I don't really know well, I wouldn't want to hazard a guess at either performance or bottlenecks. That means profiling. SBCL ships with a statistical profiler that looked like plenty for my purposes; it's just a matter of turning it on.

* (require :sb-sprof)
("SB-SPROF")
* (sb-sprof:start-profiling)
* (find-duplicate-sums (dumb-read "1a.txt"))
Profiler sample vector full (22,684 traces / approximately 499,999 samples), doubling the size
Profiler sample vector full (44,761 traces / approximately 999,998 samples), doubling the size
* (sb-sprof:report)

The output is pretty verbose, but the gist of it is:

           Self        Total        Cumul
  Nr  Count     %  Count     %  Count     %    Calls  Function
------------------------------------------------------------------------
   1   8633  18.5   8633  18.5   8633  18.5        -  <
   2   8608  18.4  16960  36.3  17241  36.9        -  SB-IMPL::MERGE-LISTS*
   3   8364  17.9  37024  79.2  25605  54.8        -  (LABELS SB-IMPL::RECUR :IN SB-IMPL::STABLE-SORT-LIST)
   4   4456   9.5   5501  11.8  30061  64.3        -  FIND-DUPE
   5   3776   8.1   3776   8.1  33837  72.4        -  LENGTH
   6   3085   6.6   3085   6.6  36922  79.0        -  SUM
   7   2943   6.3   3035   6.5  39865  85.3        -  SB-THREAD::%%WAIT-FOR
   8   2361   5.1   2361   5.1  42226  90.4        -  IDENTITY
   9   1544   3.3   1544   3.3  43770  93.7        -  +
  10   1519   3.3   1519   3.3  45289  96.9        -  SB-VM::ALLOC-TRAMP
  11    836   1.8    836   1.8  46125  98.7        -  (LABELS SB-IMPL::EQUAL-AUX :IN EQUAL)
  12    434   0.9    434   0.9  46559  99.6        -  EQUAL
  13     69   0.1     69   0.1  46628  99.8        -  "foreign function __syscall"
  14     46   0.1  32482  69.5  46674  99.9        -  SB-IMPL::STABLE-SORT-LIST
  15     31   0.1  43697  93.5  46705  99.9        -  FIND-DUPLICATE-SUMS

A Couple Fixes

Thanks to how easy it is to jump around to even internal definitions, I can convince myself that the real performance hit is from the call to sort. Since sort is only necessary because my lousy "check for duplicates" helper function requires sorted input, that seems like a good place to start. I've got the phrase "Efficiency with Algorithms, Performance with Data Structures" stuck in my head after watching this talk, so why not reconsider the data structure?

If, instead of rolling my own "check duplicate" function, I just treat the running-totals as a kind of set, the inner part of the loop can become:

(defun find-duplicate-sums (some-list)
  (let ((the-list (circular some-list)))
    (loop while the-list
       for so-far = '() then (push (pop the-list) so-far)
       for running-totals = '() then (push (apply #'+ so-far) running-totals)
       do
         (if (member (car running-totals) (cdr running-totals))
             (return (car running-totals))))))

This alone gets us from 500 seconds down to 80 seconds. Still not very good, but better.

It occurs to me that "pushing" the sum of all the values "so far" onto the list of "running totals" is more work than necessary, because the previously computed value is already the most recent sum! If instead I push only the sum of the newest change and the most recent total, I have to concede that the initial state of the loop variables becomes a single-element list containing zero. In this way it acts more like an accumulator:

(defun find-duplicate-sums (some-list)
  (let ((the-list (circular some-list)))
    (loop while the-list
       for so-far = '(0) then (push (pop the-list) so-far)
       for running-totals = '(0) then (push (+ (car so-far) (car running-totals)) running-totals)
       do
         (if (member (car running-totals) (cdr running-totals))
             (return (car running-totals))))))

The result is now down to 25 seconds — better but not great. The profiler at this point reports that most of the time is spent in the member function:

           Self        Total        Cumul
  Nr  Count     %  Count     %  Count     %    Calls  Function
------------------------------------------------------------------------
   1   1677  64.5   1677  64.5   1677  64.5        -  SB-KERNEL:%MEMBER
   2    591  22.7    591  22.7   2268  87.3        -  SB-VM::ALLOC-TRAMP
   3    301  11.6    307  11.8   2569  98.8        -  SB-THREAD::%%WAIT-FOR
   4     24   0.9   2292  88.2   2593  99.8        -  FIND-DUPLICATE-SUMS

I thought I might claw back some performance by tracking only the "next" value instead of the entire list "so far", but it turns out not to make much difference (although it does read better):

(defun find-duplicate-sums (some-list)
  (let ((the-list (circular some-list)))
    (loop while the-list
       for next = 0 then (pop the-list)
       for running-totals = (list 0) then (push (+ next (first running-totals)) running-totals)
       do
         (if (member (first running-totals) (rest running-totals))
                 (return (first running-totals))))))

Thoughts

I think being reduced to optimizing membership tests indicates I've reached the logical conclusion of this particular puzzle. I might mull over a bit more what sorts of algorithmic approaches I could take instead, but for a first solution, a 20x speed-up in a new language doesn't feel too bad. I'll just try not to think about the absolute number (25 seconds).

Addendum

Running down the clock before the next problem was released, I gave a "real" hashing datatype a shot. While the call to member is correct, it is slow because it ends up chasing pointers down the list on every iteration of the loop. Switching things out to use a hash table gives constant-time lookups and finally brings the time down to 0.03 seconds. I guess performance with data structures is no joke.

(defun find-duplicate-sums (some-list)
  (let ((the-list (circular some-list))
        (totals-map (make-hash-table)))
    (loop while the-list
       for next = 0 then (pop the-list)
       for previous = 0 then (+ previous next)
       do
         (if (gethash previous totals-map)
             (return previous)
             (setf (gethash previous totals-map) previous)))))
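
Timings like the ones quoted above can be taken by wrapping the call in the standard time macro, same input file as before:

(time (find-duplicate-sums (dumb-read "1a.txt")))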
[nolan@nprescott.com] $> █