2026-06-26-client-store-concurrent-writer-stale-overwrite

Symptom

On a Cron→Calc→Counter swarm, the Counter DataPoint’s displayed value intermittently stuck: the node flashed as executing for 2–3 ticks while the value held at its old number, then it self-healed and jumped forward. The server REST state (GET /node/{id}) was correct throughout; only the client view was wrong. This is the same class of bug as 2026-06-13-client-node-store-async-update-stale-overwrite, which #455 only half-fixed.

Root cause

ClientNodeManager replaced the whole node on every write. There were two independent writers: the SSE loop delivering SNAPSHOT_UPDATE (value) and the observer-driven pulse delivering STATE_CHANGE (state). Each one captured the node, changed one field, and wrote the whole node back via update(node). Whenever a state write branched from a base captured before the latest value landed, it reverted the snapshot to that stale value and the display stuck. Once a later value write won the race, the value “jumped” forward. #455 removed the reordering from deferred launches but explicitly deferred the robustness fix for genuinely concurrent writers (“a monotonic guard or single-actor queue… Tracked for later”).
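The following is a minimal sketch that replays the losing interleaving deterministically. Node, Lifecycle, WholeNodeStore, and clobberDemo are hypothetical stand-ins, not the real ClientNodeManager API. Every individual write here is atomic. The bug is the read-modify-write that spans a capture and a later whole-node write.

```kotlin
import java.util.concurrent.atomic.AtomicReference

// Hypothetical node shape: each writer owns exactly one field.
enum class Lifecycle { IDLE, EXECUTING }
data class Node(val id: String, val state: Lifecycle, val value: Int)

// Hypothetical stand-in for the pre-fix store: update() replaces the whole node.
class WholeNodeStore(initial: Node) {
    private val ref = AtomicReference(initial)
    fun current(): Node = ref.get()
    fun update(node: Node) = ref.set(node)
}

// Replays the losing interleaving step by step instead of racing real threads.
fun clobberDemo(): Node {
    val store = WholeNodeStore(Node("counter", Lifecycle.IDLE, value = 1))

    val pulseBase = store.current()                           // STATE_CHANGE writer captures its base
    store.update(store.current().copy(value = 2))             // SNAPSHOT_UPDATE lands in between
    store.update(pulseBase.copy(state = Lifecycle.EXECUTING)) // stale base written back: value reverts

    return store.current()
}
```

clobberDemo() returns a node in state EXECUTING with value 1. The SNAPSHOT_UPDATE that set the value to 2 is silently lost, which produces the "stuck" display described under Symptom.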

Fix

Field-scoped atomic mutation. A new private mutateStored(id) { transform } runs transform against the LATEST stored node via MutableStateFlow.update {} (a compare-and-set retry loop), so each writer changes only its own field. A new applyState(id, state) transitions only the lifecycle state. EventClient’s STATE_CHANGE arm now calls it instead of doing a whole-node update(currentNode.copy(state)) from a captured, possibly stale, base. updateSnapshot, reset, startPairing, and alarm were rewritten to mutate only their own field through mutateStored, with an insert-on-absent fallback that preserves prior behaviour. State-only and value-only writers now commute: applying them in either order yields the same node. Regression coverage: ClientNodeManagerConcurrentWriteTest (red→green), including a test that characterises the old whole-node clobber and proves applyState doesn’t reproduce it.
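The following is a minimal sketch of the field-scoped mutators. It assumes the store is a single map of nodes keyed by id; the note doesn't say whether ClientNodeManager holds one map or one flow per node. To stay runnable without kotlinx.coroutines, it uses java.util.concurrent.atomic.AtomicReference.updateAndGet, which runs the same kind of compare-and-set retry loop as MutableStateFlow.update {}. Node, Lifecycle, FieldScopedStore, and the ifAbsent parameter are hypothetical. Only the names mutateStored, applyState, updateSnapshot, and update come from this note, and their signatures here are assumed.

```kotlin
import java.util.concurrent.atomic.AtomicReference

enum class Lifecycle { IDLE, EXECUTING }
data class Node(val id: String, val state: Lifecycle, val value: Int)

class FieldScopedStore {
    // Stand-in for ClientNodeManager's MutableStateFlow<Map<String, Node>> (assumed shape).
    private val store = AtomicReference<Map<String, Node>>(emptyMap())

    fun get(id: String): Node? = store.get()[id]

    // Applies transform to the LATEST stored node inside a CAS retry loop.
    // Hypothetical ifAbsent: builds a node when none is stored; returning null leaves the map unchanged.
    private fun mutateStored(id: String, ifAbsent: () -> Node? = { null }, transform: (Node) -> Node) {
        store.updateAndGet { nodes ->
            val current = nodes[id]
            val next = if (current != null) transform(current) else ifAbsent()
            if (next == null) nodes else nodes + (id to next)
        }
    }

    // Lifecycle-only writer: never touches value.
    fun applyState(id: String, state: Lifecycle) = mutateStored(id) { it.copy(state = state) }

    // Value-only writer: never touches state; inserts an IDLE node if absent (assumed default).
    fun updateSnapshot(id: String, value: Int) =
        mutateStored(id, ifAbsent = { Node(id, Lifecycle.IDLE, value) }) { it.copy(value = value) }

    // Whole-node replace, reserved for authoritative full-node payloads.
    fun update(node: Node) {
        store.updateAndGet { it + (node.id to node) }
    }
}
```

Each transform is re-run against whatever is stored at commit time. A state write that lands between two snapshot writes therefore can no longer restore an old value. The same contract as MutableStateFlow.update {} applies: the lambda may execute more than once under contention, so it must be side-effect free.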

Prevention

When two independent writers each own a different field of the same shared StateFlow<T>, never read-modify-write the whole T from a captured base. Mutate the owned field atomically against the latest value (MutableStateFlow.update {}) so the writers commute. Capturing a node, calling copy(), and writing the whole node back is a lost update waiting to happen the moment a second concurrent writer touches a different field. update(node) (whole-node replace) is now reserved for authoritative full-node SSE payloads (CREATED, full _nodeUpdates). Per-field SSE events (STATE_CHANGE, SNAPSHOT_UPDATE, PIN_CHANGED) go through the field-scoped mutators.
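One way to make this routing rule mechanical is to use a single dispatch point in the SSE loop. The following is a sketch under assumptions. The NodeWrites interface, the SseEvent model, and dispatch are hypothetical: the method names mirror this note, but their signatures are guessed. PIN_CHANGED is omitted because its payload isn't described here.

```kotlin
// Hypothetical minimal types, redeclared so the sketch stands alone.
enum class Lifecycle { IDLE, EXECUTING }
data class Node(val id: String, val state: Lifecycle, val value: Int)

// Write surface the SSE loop may use (names follow the note; signatures assumed).
interface NodeWrites {
    fun update(node: Node)                       // whole-node replace: authoritative payloads only
    fun applyState(id: String, state: Lifecycle) // lifecycle field only
    fun updateSnapshot(id: String, value: Int)   // value field only
}

// Hypothetical event model; PIN_CHANGED omitted.
sealed interface SseEvent {
    data class Created(val node: Node) : SseEvent
    data class StateChange(val id: String, val state: Lifecycle) : SseEvent
    data class SnapshotUpdate(val id: String, val value: Int) : SseEvent
}

// Full payloads may replace the node; per-field events must use the field-scoped mutators.
fun dispatch(event: SseEvent, store: NodeWrites) {
    when (event) {
        is SseEvent.Created -> store.update(event.node)
        is SseEvent.StateChange -> store.applyState(event.id, event.state)
        is SseEvent.SnapshotUpdate -> store.updateSnapshot(event.id, event.value)
    }
}
```

Only the Created branch can reach update(node). This makes it easy to spot in review when a new per-field event reaches for the whole-node path.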