Another year, another attempt at tackling the annual Advent of Code. I realized last year that solving the puzzles isn't necessarily the fun part for me; I like it more as a way to practice programming. So this year I think I'll try doing things in Common Lisp.
The first puzzle is clearly meant to ease you into the challenge. The task:
A value like +6 means the current frequency increases by 6; a value like -3 means the current frequency decreases by 3.
Starting with a frequency of zero, what is the resulting frequency after all of the changes in frequency have been applied?
It so happens that the Lisp reader is perfectly happy to consider +3 and -6 valid integers, so no special handling is necessary:
(with-open-file (stream "1a.txt")
  (loop for line = (read stream nil)
        while line
        sum line))
Common Lisp's loop macro can actually do some aggregations on its own, in this case taking the sum of all the values. Without requiring even a little pre-processing, this one felt like cheating.
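Just for illustration, a few of the other aggregations loop can do out of the box (toy examples of my own, not from the puzzle):

(loop for x in '(3 -1 4 -1 5) maximize x)      ; => 5
(loop for x in '(3 -1 4 -1 5) count (plusp x)) ; => 3
(loop for x in '(3 -1 4 -1 5) collect (* x x)) ; => (9 1 16 1 25)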
The second part of the challenge is basically:
You notice that the [list] repeats over and over... you need to find the first [sum] it reaches twice
I think there is probably some algorithmic trick to this that I haven't realized, but with my reading of the problem I settled on the following approach: make the input list circular, walk it while keeping a running total at each step, and stop at the first total that shows up twice.
In retrospect this is a very naive approach, but in the interest of getting something working before analysis paralysis set in, it's the one I took.
The first step, creating a circular list from the input data, is pleasantly simple in Common Lisp. I basically take the first solution and replace sum with collect (to build a list instead of reducing it), and then set the tail of the last cons in the list to point back at the start:
(defun dumb-read (filename)
  "Read lines as some lisp-y thing into a list."
  (with-open-file (stream filename)
    (loop for line = (read stream nil)
          while line
          collect line)))
(defun circular (some-list)
  (setf (cdr (last some-list)) some-list))
So, for example, if the input list was 1, 2, 3, 4:
(circular (list 1 2 3 4))
would turn into:
#1=(1 2 3 4 . #1#)
There is one trick to this: Lisp needs to be told that "circles" should be treated specially, so that it doesn't try to print the whole list and exhaust the heap (ask me how I know!):
(setq *print-circle* t)
With that accomplished, I wrote a quick and dirty helper function to check for duplicates in a list (assuming the list is sorted):
(defun find-dupe (some-list)
  (cond ((equal some-list nil) nil)
        ((equal (car some-list) (cadr some-list)) (car some-list))
        (t (find-dupe (cdr some-list)))))
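As an aside, the same adjacent-duplicate check could also be sketched with loop's list destructuring. This find-dupe-loop is just a hypothetical alternative for comparison, not what I ended up using:

;; Each tail of the list is destructured into its first two elements,
;; so a and b are always adjacent; b is nil on the final one-element tail.
(defun find-dupe-loop (sorted-list)
  (loop for (a b) on sorted-list
        when (and b (equal a b)) return a))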
I know find-dupe is not great, and I should probably read the HyperSpec to see if something like it already exists, but I'm keeping forward momentum here! The next step is to loop over the data, tracking a list of "seen values" and their sum on each iteration:
(defun find-duplicate-sums (some-list)
  (let ((the-list (circular some-list)))
    (loop while the-list
          for so-far = '() then (push (pop the-list) so-far)
          for running-totals = '() then (push (apply #'+ so-far) running-totals)
          do
          (let ((maybe-dupe (find-dupe (sort running-totals #'<))))
            (if maybe-dupe (return maybe-dupe))))))
An eagle-eyed reader might see that I'm sorting the list of running totals inside of a loop, or building what is basically the list of seen values in reverse, or summing the whole list of seen values each iteration. But what you might not see is that this actually works!
It's a little wild to me that my first approach in a new language didn't blow up in my face (well, after I fixed the "don't print circular lists" issue). The only real problem is that it is dog slow. For my input data the above function takes about 500 seconds.
The real key to optimization is always, always measurement. Especially in my case, writing in a language I don't know well, I wouldn't want to hazard a guess at either the performance or the bottlenecks. So that means profiling. SBCL ships with a statistical profiler that looked like plenty for my purposes; it's just a matter of turning it on.
* (require :sb-sprof)
("SB-SPROF")
* (sb-sprof:start-profiling)
* (find-duplicate-sums (dumb-read "1a.txt"))
Profiler sample vector full (22,684 traces / approximately 499,999 samples), doubling the size
Profiler sample vector full (44,761 traces / approximately 999,998 samples), doubling the size
* (sb-sprof:report)
The output is pretty verbose, but the gist of it is:
Self Total Cumul
Nr Count % Count % Count % Calls Function
------------------------------------------------------------------------
1 8633 18.5 8633 18.5 8633 18.5 - <
2 8608 18.4 16960 36.3 17241 36.9 - SB-IMPL::MERGE-LISTS*
3 8364 17.9 37024 79.2 25605 54.8 - (LABELS SB-IMPL::RECUR :IN SB-IMPL::STABLE-SORT-LIST)
4 4456 9.5 5501 11.8 30061 64.3 - FIND-DUPE
5 3776 8.1 3776 8.1 33837 72.4 - LENGTH
6 3085 6.6 3085 6.6 36922 79.0 - SUM
7 2943 6.3 3035 6.5 39865 85.3 - SB-THREAD::%%WAIT-FOR
8 2361 5.1 2361 5.1 42226 90.4 - IDENTITY
9 1544 3.3 1544 3.3 43770 93.7 - +
10 1519 3.3 1519 3.3 45289 96.9 - SB-VM::ALLOC-TRAMP
11 836 1.8 836 1.8 46125 98.7 - (LABELS SB-IMPL::EQUAL-AUX :IN EQUAL)
12 434 0.9 434 0.9 46559 99.6 - EQUAL
13 69 0.1 69 0.1 46628 99.8 - "foreign function __syscall"
14 46 0.1 32482 69.5 46674 99.9 - SB-IMPL::STABLE-SORT-LIST
15 31 0.1 43697 93.5 46705 99.9 - FIND-DUPLICATE-SUMS
Thanks to how easy it is to jump around to even internal definitions, I can convince myself that the real performance hit is from the call to sort. Since sort is only necessary because my lousy "check for duplicates" helper function requires sorted input, that seems like a good place to start. I've got the phrase "Efficiency with Algorithms, Performance with Data Structures" stuck in my head after watching this talk, so why not reconsider the data structure?
If, instead of rolling my own "check duplicate" function, I just treat the running-totals as a kind of set, the inner part of the loop can become:
(defun find-duplicate-sums (some-list)
  (let ((the-list (circular some-list)))
    (loop while the-list
          for so-far = '() then (push (pop the-list) so-far)
          for running-totals = '() then (push (apply #'+ so-far) running-totals)
          do
          (if (member (car running-totals) (cdr running-totals))
              (return (car running-totals))))))
This alone gets us from 500 seconds down to 80 seconds. Still not very good, but better.
It occurs to me that "pushing" the sum of all the values "so far" onto the list of "running totals" is more work than necessary, because the previously computed total is already the most recent sum! If instead I push only the sum of the newest value and the previous total, I have to concede that the initial state of the loop variables is a list of one element, zero. In this way it is more like an accumulator:
(defun find-duplicate-sums (some-list)
  (let ((the-list (circular some-list)))
    (loop while the-list
          for so-far = '(0) then (push (pop the-list) so-far)
          for running-totals = '(0) then (push (+ (car so-far) (car running-totals)) running-totals)
          do
          (if (member (car running-totals) (cdr running-totals))
              (return (car running-totals))))))
The result is now down to 25 seconds, better but not great. The profiler at this point reports that most of the time is spent in the member function:
Self Total Cumul
Nr Count % Count % Count % Calls Function
------------------------------------------------------------------------
1 1677 64.5 1677 64.5 1677 64.5 - SB-KERNEL:%MEMBER
2 591 22.7 591 22.7 2268 87.3 - SB-VM::ALLOC-TRAMP
3 301 11.6 307 11.8 2569 98.8 - SB-THREAD::%%WAIT-FOR
4 24 0.9 2292 88.2 2593 99.8 - FIND-DUPLICATE-SUMS
I thought I might claw back some performance by tracking only the "next" value instead of the entire list "so far", but it turns out not to make much difference (although it does read better):
(defun find-duplicate-sums (some-list)
  (let ((the-list (circular some-list)))
    (loop while the-list
          for next = 0 then (pop the-list)
          for running-totals = (list 0) then (push (+ next (first running-totals)) running-totals)
          do
          (if (member (first running-totals) (rest running-totals))
              (return (first running-totals))))))
I think being reduced to optimizing membership tests indicates I've reached the logical conclusion of this particular puzzle. I might mull over what sorts of algorithmic approaches I could take instead, but for a first solution, getting a 20x speed-up in a new language doesn't feel too bad. I'll just try not to think about the absolute number (25 seconds).
Running down the clock on this one before the next problem was released, I gave a "real" hashing datatype a shot. While the call to member is correct, it is slow because it ends up chasing a chain of pointers down the list inside the loop. Switching things out to use a hash table gets constant-time lookups and finally brings the time down to 0.03 seconds. I guess performance with data structures is no joke.
(defun find-duplicate-sums (some-list)
  (let ((the-list (circular some-list))
        (totals-map (make-hash-table)))
    (loop while the-list
          for next = 0 then (pop the-list)
          for previous = 0 then (+ previous next)
          do
          (if (gethash previous totals-map)
              (return previous)
              (setf (gethash previous totals-map) previous)))))
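For anyone following along at home, the standard time macro is enough to measure each version at the REPL, along the lines of:

(time (find-duplicate-sums (dumb-read "1a.txt")))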