Topological Sorting

2020-11-28

We're very nearly to December which means Advent of Code will be starting up again. I, of course, haven't finished up 2018's Advent of Code. I'm only on day 7 at this point, which means my projection of completing in 2021 isn't looking so good. Now seems as good a time as any to muddle through another puzzle!

The Puzzle

The challenge is to take a series of "instructions" and output the necessary order in which they must be completed. For example:

Step C must be finished before step A can begin.
Step C must be finished before step F can begin.
Step A must be finished before step B can begin.
Step A must be finished before step D can begin.
Step B must be finished before step E can begin.
Step D must be finished before step E can begin.
Step F must be finished before step E can begin.

Is intended to become: C A B D F E. This is basically a topological sort, with a small exception that equivalently "ready" steps should be done alphabetically.

Kahn's Algorithm

My only experience in topological sorting to date has been the coverage in The AWK Programming Language and, after today, what I read in CLRS. To be honest, the AWK solution was fine but seemed to lose some of the specifics of the algorithm, and I didn't relish the thought of writing depth first search. Instead I found Kahn's algorithm refreshingly simple:

L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edge

while S is not empty do
    remove a node n from S
    add n to L
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S

if graph has edges then
    return error   (graph has at least one cycle)
else 
    return L   (a topologically sorted order)

It has been a while since I've written about one of these puzzles so I'll re-cover the boilerplate I seem to tweak for each puzzle, here are two functions to transform the textual input into a list of edges like (C A):

(defun parse-line (line)
  (list (read-from-string (subseq line 5 6))
        (read-from-string (subseq line 36 37))))

(defun read-lines (filename)
  "read lines as some lispy thing"
  (with-open-file (stream filename)
    (loop for line = (read-line stream nil)
          while line
          collect (parse-line line))))

With that out of the way I also need a way to deduplicate a list, which I'm sure I've written before for one of these puzzles but it can't hurt to write again:

(defun unique (l)
  "return list of unique elements in list"
  (let ((uniques (list)))
    (dolist (item l uniques)
      (unless (member item uniques :test 'equal)
        (push item uniques)))))

The intent is for it to work analogously to something like list(set(duplicates_list)) in Python.

It is possible it will come back to bite me, but I didn't bother to create a real graph structure, opting instead to munge the list of edges directly. There is some duplicated work but it seemed faster to just bang it out like this. I ended up sacrificing the cycle-detection since I knew there weren't any, and wrote the following to collect the vertexes without predecessors:

(defun unencumbered (edges)
  "vertexes with in-degree of zero"
  (loop for (predecessor successor) in edges
        collect predecessor into predecessors
        collect successor into successors
        finally (return (set-difference (unique predecessors) 
                                        (unique successors) 
                                        :test 'string=))))

set-difference returns a list of values from the first list argument that are not in the second, in this case predecessors that are not successors.

Sorting

With those pieces out of the way all that remains is a nearly direct translation of the description from Wikipedia. I think I've found a case where not having a real graph data structure starts to hurt too. As a vertex is added to the sorted list I sweep through the list of edges and remove any that have that node as a predecessor. The intent is to indicate that those nodes are "ready", which almost works, except for the case of the last node(s) in the graph. In that case, there are no more edges and the final nodes are omitted. I worked around this by checking for "orphans" once the remaining edges list is exhausted.

(defun topological-sort (edges)
  (do ((L (list))
       (S (sort (unencumbered edges) #'string<) (sort S #'string<))
       (remaining-edges edges))
      ((null S) (reverse L))
    (push (pop S) L)
    (setf remaining-edges (remove (first L) remaining-edges :test #'string= :key #'first))
    (if (null remaining-edges)
        (dolist (orphan (set-difference (unique (mapcar #'second edges)) L :test 'string=))
          (push orphan S)))
    (dolist (edge (unique (unencumbered remaining-edges)))
      (unless (member edge S :test #'string=)
        (push edge S)))))

More Fun

With the list of edges already at hand, it occurred to me that I could create an image of the graph easily with graphviz, so I wrote the following:

(defun print-dot (edges)
  (format t "digraph graphname {~%")
  (loop for (p s) in edges
        do (format t "~s -> ~s;~%" p s))
  (format t "}~%"))

The results are pretty gnarly, but I think they're fun: directed graph of the puzzle instructions