On a Cron→Calc→Counter chain (the standard node-connectivity test), the Counter
DataPoint counted, then intermittently stuck: an SSE arrived (node flashed) but
the value didn’t change for several flashes, then “self-healed” and jumped
forward. Separately, the DataPoint settings showed no sources/inputs and
“invoked by source” unchecked — wiring that had been configured. Reproducible
with manually-configured nodes on multiple servers; survived server and app
restart. The server REST state (GET /node/{id}) stayed fully correct
throughout — only the client’s view was wrong.
#428 (d22ebaefd) wrapped ClientNodeManager.update()’s body in scope.launch {}
to bundle the map mutation atomically. That also made every update() an
independent, unordered coroutine, so the flow.value = node writes stopped
being ordered across calls. EventClient processes SSE events sequentially and
does a read-modify-write per event (read current node → change one field →
update), which is only correct if each update() applies before the next
event is read. With the deferred launch, a burst of events (each tick:
STATE_CHANGE→SNAPSHOT_UPDATE→STATE_CHANGE, plus the 1500 ms activity pulse) all
branched from the same stale base; their launched writes applied in scheduler
order, so a later write carrying older data (or pre-wiring meta) landed last and
clobbered a newer one.
In ClientNodeManager.update(), apply the value write synchronously when the
node already exists (nodes[id]?.let { it.value = node; … ; return }) — the
common path and where all the clobbering happened. Only first-sighting of an id
(structural insert) still goes through scope.launch { mutateNodes { … } }, so
#428’s ConcurrentModificationException fix for the map is preserved. A
single-key get + MutableStateFlow.value set are safe outside nodesMutex
(the class doc already sanctions single-key gets).
scope.launch each call.Dispatchers.Unconfined.
Unconfined runs launch eagerly in submission order and masks exactly this
class of reorder bug — it is why the pre-existing ClientNodeManagerPulseSnapshotTest
kept passing while production broke. The new regression test
(ClientNodeManagerUpdateOrderingTest) uses StandardTestDispatcher so the
asynchrony is visible and the lost-update is deterministic.