
Concurrent state evolution

You can call this my origin story as a developer, so let me tell it in order. And no — even though I am a Doctor (a PhD), the origin story did not turn me into a villain.

The bug that started it

My first assignment at a new job was a concurrency problem. There was a replay engine (it consumed a stream of recorded coordinates and walked a vehicle marker along a route on a map, pausing, resuming, interpolating between points) and it misbehaved in a way that was hard to pin down. It was driven by a go-loop parked on a channel of coordinates: for each one it had to calculate the move, move the marker, and then resolve which coordinate came next. Those three steps were never cleanly sequenced, so they raced.

A vehicle is a state machine

By coincidence I was reading, at the same time, one of those classic papers — Guy Steele's Lambda: The Ultimate GOTO — on how a loop is really just tail recursion wearing a costume, and how a bundle of mutually tail-recursive functions is just a state machine.

That reframing was already sitting in my head when I went back to the broken loop, and the two things clicked together: a vehicle replaying a route is a state machine. :stopped, :running, :paused, :resumed, and so on. Mutual recursion is the elegant way to write those transitions — each state decides the next.

[Figure: state diagram. States: :stopped, :running, :paused, [:running :frames], :resumed. Transition labels: play, advance, segment done, reach :end, resume, continue.]
The replay engine as a state machine. Every box is a :status; the nested [:running :frames] is a sub-state living inside :running.
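
To make the "mutually tail-recursive functions are a state machine" idea concrete, here is a minimal sketch in plain Clojure. It is not the replay engine's code: the three state functions, the event names and the `replay` helper are hypothetical, chosen only to mirror the diagram. Each state returns a thunk that names its successor, and `clojure.core/trampoline` does the bouncing so the stack never grows.

```clojure
(declare stopped running paused)

;; Each state looks at the next event and returns a thunk naming its successor.
;; When the events run out, it returns the trail of states visited so far.
(defn stopped [events trail]
  (if-let [[event & more] (seq events)]
    (case event
      :play #(running more (conj trail :running))
      #(stopped more trail))                        ; ignore anything else
    trail))

(defn running [events trail]
  (if-let [[event & more] (seq events)]
    (case event
      :pause #(paused more (conj trail :paused))
      :end   #(stopped more (conj trail :stopped))
      #(running more trail))
    trail))

(defn paused [events trail]
  (if-let [[event & more] (seq events)]
    (case event
      :resume #(running more (conj trail :running))
      #(paused more trail))
    trail))

(defn replay [events]
  ;; trampoline keeps calling the returned thunks until a non-fn value appears
  (trampoline stopped events [:stopped]))

(replay [:play :pause :resume :end])
;; => [:stopped :running :paused :running :stopped]
```

No state ever calls another one directly; it only says which one comes next, which is the property the rest of this post leans on.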

So I decided to hold the state in an atom. It wasn't a considered design at first — an atom just seemed like the obvious place for something the outside world might need to poke at, like a user hitting "pause" halfway down the route. The atom held the state, and a multimethod advanced it. The :status is the dispatch value.

Clojure — the decider

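;; Assumed ns requires (not shown in the original):
;;   [clojure.set :refer [rename-keys]] and [re-frame.core :as rf]
(defmulti resolve-state (fn [state-atom] (:status @state-atom)))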

;; running: which recorded point do we head for next?
(defmethod resolve-state :running [state-atom]
  (let [next-position (next-position-index state-atom)]
    (if (= next-position :end)
      (swap! state-atom assoc :command :pause)
      (swap! state-atom merge {:command :advance :position-index next-position}))
    (execute-async #(execute-action state-atom))))

;; running between two points: take one step of the slide toward the target
(defmethod resolve-state [:running :frames] [state-atom]
  (let [{:keys [distance-elapsed start end]} @state-atom
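        ;; assumes speeds is a map with a :1x entry; fall back to that step size,
        ;; since a bare :1x default would make step a keyword and break the + below
        step (get speeds @(rf/subscribe [:replay/playback-speed]) (:1x speeds))]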
    (if (zero? distance-elapsed)
      (let [line (line-string [start end])]                ; the segment to slide along
        (swap! state-atom assoc :line line :distance (length line) :distance-elapsed step))
      (swap! state-atom update :distance-elapsed + step))
    (swap! state-atom assoc :command [:advance :frames])
    (execute-async #(execute-action state-atom))))

(defmethod resolve-state :stopped [state-atom]
  (swap! state-atom assoc :command :stop)
  (execute-async #(execute-action state-atom)))

(defmethod resolve-state :paused [state-atom]              ; stash what we were doing
  (swap! state-atom rename-keys {:command :prev-command})
  (swap! state-atom assoc :command :pause)
  (execute-async #(execute-action state-atom)))

(defmethod resolve-state :resumed [state-atom]             ; and pick it back up
  (swap! state-atom rename-keys {:prev-command :command})
  (execute-async #(execute-action state-atom)))

A vector lets the state be more granular

TL;DR: a keyword is one flat label; a vector nests a sub-state inside it.

The status doesn't have to be a single keyword. It can be a short vector, and that is more useful than it sounds. Most of the time the vehicle is simply :running. But "running" quietly hides a finer activity: between any two recorded GPS points the marker doesn't jump, it slides across the gap one animation frame at a time. That sliding is a phase inside running, not a separate mode the way :paused is. I could have minted a new top-level status, :running-between-frames, but then every part of the code that only wants to know "are we playing or not?" would suddenly have to learn about it too.

So the status becomes a vector instead: [:running :frames]. The first element is the big-picture mode (we're running) and the second narrows it to the exact phase, the frame-by-frame slide. Think of it like an address: :running is the city, :frames is the street. Because the multimethod dispatches on the whole value, I can write one handler aimed precisely at [:running :frames], while any code that only cares about the city keeps reading just the first element and never notices the street exists. That is how you get one state nested inside another without the rest of the system having to account for it.
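
A small sketch of the city/street idea, in plain Clojure. The names `mode`, `playing?` and `describe` are made up for illustration and are not part of the replay engine; the point is only that a multimethod can dispatch on the whole status while other code reads just the first element.

```clojure
;; Hypothetical helper: the "city" of a status, whether it is flat or nested.
(defn mode [status]
  (if (vector? status) (first status) status))

(defn playing? [state]
  (= :running (mode (:status state))))

;; Dispatch on the whole :status value, so the nested status gets its own handler.
(defmulti describe :status)
(defmethod describe :running           [_] "choosing the next recorded point")
(defmethod describe [:running :frames] [_] "sliding between two points")
(defmethod describe :paused            [_] "waiting for resume")

(describe {:status [:running :frames]}) ; => "sliding between two points"
(playing? {:status [:running :frames]}) ; => true
(playing? {:status :paused})            ; => false
```

The `mode` helper is there because a bare keyword such as :paused has no first element; callers that only care about the big-picture mode go through it and stay unaware of the nesting.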

Decide, then do

resolve-state never touches the map or the data; it only decides. It performs no external effect: its one write is recording that decision, a :command, in the atom, before it schedules the next bounce. A second multimethod, execute-action, then dispatches on that command, performs the actual effect, and hands control back to resolve-state:

Clojure — the effects

(defmulti execute-action (fn [state-atom & _] (:command @state-atom)))

;; advance: grab the next pair of route coordinates, drop into frame mode
(defmethod execute-action :advance [state-atom]
  (let [[start end] @(rf/subscribe [:replay/route-coordinates])]
    (rf/dispatch-sync [:replay/vehicle-position start])
    (swap! state-atom assoc
           :start start
           :end end
           :distance-elapsed 0
           :position start
           :status [:running :frames]))
  (resolve-state state-atom))                ; bounce back to deciding

;; advance one frame: slide the marker a step along the current segment
(defmethod execute-action [:advance :frames] [state-atom]
  (let [{:keys [line distance distance-elapsed]} @state-atom]
    (if (< distance-elapsed distance)
      (let [position (along line distance-elapsed)]
        (rf/dispatch-sync [:replay/vehicle-position position])
        (swap! state-atom assoc :position position :status [:running :frames]))
      (swap! state-atom assoc :status :running)))   ; segment finished, back to running
  (resolve-state state-atom))

(defmethod execute-action :pause [state-atom]
  (swap! state-atom assoc :status :paused))

The two roles, in plain pseudocode

# Two roles bouncing off each other — no language required.

decide(state):                 # decides only: record the next command, no external effect
    state.command = transition_for(state.status)
    trampoline(do, state)      # hop — schedule the effect, don't call it directly

do(state):                     # effectful: perform the command, then loop back
    perform(state.command)     # move the marker, persist, call an API…
    trampoline(decide, state)  # name the next, let the trampoline bounce

So the two multimethods trampoline off each other: decide, effect, decide, effect. A state never passes its successor (the next state) as an argument. It names it by writing :status / :command into the atom, and the bounce runs the next one. The atom is doing less than it looks here: it is just a convenient place to leave that name. With a little design the state could ride along as an argument instead, which is exactly what the channel version is about to do. In this first version the bounce was a setTimeout (an execute-async that also yields to pending map zooms), not a channel. That setTimeout is the trampoline. Hold that thought: it is the one piece the Redux-style channel is about to replace.
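
Here is a rough sketch of what "the state could ride along as an argument" might look like, with no atom at all. Everything in it is an assumption made for illustration: `decide`, `execute`, `run`, the :halt command and the :log key are invented names, and the "effect" is just appending to a log so the example stays self-contained.

```clojure
(defn decide
  "Pure: returns the state with the next :command attached."
  [{:keys [status remaining] :as state}]
  (case status
    :running (if (seq remaining)
               (assoc state :command :advance)
               (assoc state :command :pause))   ; reached the end of the route
    :paused  (assoc state :command :halt)))

(defn execute
  "Would be effectful in a real engine; here the effect is a log entry."
  [{:keys [command remaining] :as state}]
  (case command
    :advance (-> state
                 (update :log conj [:move-to (first remaining)])
                 (update :remaining rest))
    :pause   (assoc state :status :paused)
    :halt    (assoc state :status :done)))

(defn run
  "The bounce: decide, execute, repeat, passing the state along each time."
  [state]
  (loop [state state]
    (if (= :done (:status state))
      state
      (recur (execute (decide state))))))

(:log (run {:status :running :remaining [[0 0] [1 1]] :log []}))
;; => [[:move-to [0 0]] [:move-to [1 1]]]
```

With the state passed explicitly, `decide` can be tested as an ordinary function, and the only thing left for a trampoline to do is schedule the next call.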

The channel replaces the setTimeout

Then I remembered where I had seen this skeleton before: Redux in ClojureScript with Rum. Its whole argument is one sentence — one place for state, one place to change it — and it gets there with the exact machinery I was already reaching for: a go-loop draining a channel, and a multimethod standing in for the reducer.

(go-loop []
  (when-let [[type data] (<! actions)]
    (swap! state transform data type)
    (recur)))

That was the click. My replay engine already named its successor instead of calling it — that was the whole point of writing :command into the atom and bouncing. What it lacked was a good trampoline; the setTimeout worked but left the transitions scattered across two multimethods and the event loop. Redux showed the better bounce: a channel and one go-loop draining it. Swap the setTimeout for the channel and you get what I have been calling, privately, concurrent state evolution — and unlike the loop I was hired to fix, it just works.

The payload rides with the name

The channel is the trampoline. A state still never calls its successor; it puts a pair on the channel — the name of the next state and a payload — and a single loop bounces. The payload is the part the first version smuggled through the shared atom: now it travels in the open, as the data the next step needs to do its work. Every transition becomes a self-contained [action data] value, which is what makes the whole walk inspectable — you can read off, from the channel alone, what is happening and what it is happening to. In the latest version of this pattern — an AI pipeline that extracts structured data from emails — the states are no longer vehicle modes but steps in an effectful process, yet the skeleton is identical:

Clojure — the email pipeline

(defmulti evolve-flow (fn [action _data _config _ch] action))

;; persist the raw email, then hand the saved id on to the next step
(defmethod evolve-flow ::persist-email
  [_ email config ch]
  (let [saved (p/create-email! (:db config) email)]
    (dispatch-action ch ::extract-order-data {:email-id (:id saved)})))   ; next state + payload

;; pull structured order fields out of the email body with the assistant
(defmethod evolve-flow ::extract-order-data
  [_ {:keys [email-id]} config ch]
  (let [order (.processMessage assistant (load-body email-id))]
    (dispatch-action ch ::persist-order-details {:email-id email-id :order order})))

(go-loop []
  (when-let [[action data] (<! ch)]
    (try
      (evolve-flow action data config ch)
      (catch Throwable t
        (dispatch-error! ch action t data)))
    (recur)))
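[Figure: the channel holds pairs such as [:persist · {…}], [:extract · {…}] and [:error · {…}]. The go-loop takes the next pair with (<! ch); evolve-flow dispatches on :action and runs the handler; the handler pushes (next · payload) back onto the channel.]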
One loop is the trampoline: the go-loop takes an [action · payload] pair off the channel, runs the matching handler, and the handler pushes the next pair back on. Every transition passes through one place.
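
For readers who want to run the skeleton without core.async, here is a minimal sketch that swaps the channel for a `java.util.concurrent.LinkedBlockingQueue`. That substitution is mine, not the pipeline's, and the `step` multimethod, the :parse / :store / :done actions and `run-flow` are hypothetical names. The shape is the same: one loop takes a pair, runs the handler, and the handler pushes the next pair.

```clojure
(import '(java.util.concurrent LinkedBlockingQueue))

(defmulti step (fn [action _payload _queue _log] action))

(defmethod step :parse [_ {:keys [raw]} queue log]
  (swap! log conj [:parse raw])
  (.put queue [:store {:value (Long/parseLong raw)}]))   ; name the successor

(defmethod step :store [_ {:keys [value]} queue log]
  (swap! log conj [:store value])
  (.put queue [:done {}]))

(defmethod step :error [_ {:keys [failed]} queue log]
  (swap! log conj [:error failed])
  (.put queue [:done {}]))

(defn run-flow
  "Runs one flow to completion and returns the log of transitions."
  [first-action payload]
  (let [queue (LinkedBlockingQueue.)
        log   (atom [])]
    (.put queue [first-action payload])
    (loop []
      (let [[action data] (.take queue)]        ; blocks until a pair arrives
        (when-not (= :done action)
          (try
            (step action data queue log)
            (catch Exception e                  ; failure becomes an action
              (.put queue [:error {:failed action
                                   :payload data
                                   :message (.getMessage e)}])))
          (recur))))
    @log))

(run-flow :parse {:raw "42"})   ; => [[:parse "42"] [:store 42]]
(run-flow :parse {:raw "oops"}) ; => [[:parse "oops"] [:error :parse]]
```

The :done action exists only so the example terminates; a long-running loop would simply keep taking from the queue.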

The same loop, beyond Clojure

# Any language with a queue/channel can run this loop.

loop forever:
    action, payload = take(channel)          # blocks until a message arrives
    try:
        handlers[action](payload, channel)   # may push the next (action, payload)
    except err:
        push(channel, (ERROR, { action, payload, err }))

# A handler names its successor and ships the data that successor needs:
handler EXTRACT_ORDER (payload, channel):
    order = ai.extract(payload.email)
    push(channel, (PERSIST_ORDER, { order }))   # next state + payload

What this buys you

Because every transition is a value on one loop, the things that were hard in the original tangled go-loop become almost free:

  • One vantage point. Every transition passes through one place, so one log line per transition describes the whole walk, and no two transitions race because the loop runs them one at a time. This is the most powerful concurrency pattern I know, and it works just about everywhere.
  • Failure is just another action. The loop's try/catch turns any synchronous throw into an error action carrying the step that failed and the payload that broke it. The handlers stay free of error-handling code, and error handling lives in one place instead of being scattered through every transition.
  • Compensation, almost by accident. Errors are ordinary actions, so you can route them. A derive hierarchy lets a class of steps share a rollback: anything that has already written to the database derives from ::rollback-persistence, so a failure three steps later can undo the earlier write. That is the saga idea, with no extra machinery.
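
The derive-based routing in the last bullet could look roughly like this. It is one possible reading, not the pipeline's actual code: the `compensate` multimethod and the returned maps are hypothetical, and I am assuming that every step which runs once a row exists is derived from ::rollback-persistence, so that a failure in any of them selects the shared rollback.

```clojure
;; Hypothetical wiring: steps that run after the email row exists share a parent.
(derive ::extract-order-data    ::rollback-persistence)
(derive ::persist-order-details ::rollback-persistence)

(defmulti compensate
  "Dispatches on the step that failed."
  (fn [failed-step _payload] failed-step))

;; One handler covers every step derived from ::rollback-persistence.
(defmethod compensate ::rollback-persistence [failed-step {:keys [email-id]}]
  {:undo :delete-email :email-id email-id :because failed-step})

;; Steps outside the hierarchy have nothing to undo.
(defmethod compensate :default [failed-step _payload]
  {:undo :nothing :because failed-step})

(compensate ::persist-order-details {:email-id 7})
;; => {:undo :delete-email, :email-id 7, :because ::persist-order-details}
(compensate ::persist-email {:email-id nil})
;; => {:undo :nothing, :because ::persist-email}
```

Multimethod dispatch uses `isa?`, so adding a new step to the rollback class is one `derive` call rather than a new handler.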

One thing the loop can't catch

The try/catch only wraps synchronous throws. If a step spawns its own threads, an exception on one of them sails straight past the loop, so that step has to catch and re-dispatch its own errors. The trampoline sequences the transitions; it doesn't supervise the threads underneath them.
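
A minimal sketch of a step that catches and re-dispatches its own errors, again using a `LinkedBlockingQueue` as a stand-in for the channel. `spawn-step!`, the :enrich action and the :done / :error pair shapes are assumptions for illustration only.

```clojure
(import '(java.util.concurrent LinkedBlockingQueue TimeUnit))

(defn spawn-step!
  "Runs `work` on its own thread. The loop's try/catch cannot see a throw in
  there, so the thread catches it and pushes an error action itself."
  [queue action payload work]
  (doto (Thread.
         (fn []
           (try
             (.put queue [:done {:result (work payload)}])
             (catch Throwable t
               (.put queue [:error {:failed  action
                                    :payload payload
                                    :message (.getMessage t)}])))))
    (.start)))

(def queue (LinkedBlockingQueue.))

;; The worker divides by zero; without the inner catch the loop would never hear of it.
(spawn-step! queue :enrich {:n 0} (fn [{:keys [n]}] (quot 10 n)))

;; Wait up to five seconds for whatever the worker pushed.
(def outcome (.poll queue 5 TimeUnit/SECONDS))

(first outcome)            ; => :error
(:failed (second outcome)) ; => :enrich
```

The error comes back as an ordinary pair on the queue, so the single loop can route it like any other action.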

None of it is mine

What I like about this is that none of the three pieces is mine. The loop/recursion equivalence is decades old, state machines are older, and the Redux loop I borrowed wholesale. The only contribution was noticing that they were the same shape, and that lining them up turns a fragile concurrent loop into a state evolution you can actually reason about. Name the next state instead of calling it, and let a channel do the recursion.