Symptom

Kraken’s nightly architectural scan flagged that ClientNodeManager.alarm() wrote back the caller’s captured, and potentially stale, Node snapshot, whereas reset() and startPairing() already read the current stored node before writing. In production, EventClient and ClientServerConnector both call alarm(node) with a node captured at connection-start time. SSE events can update the stored node’s metadata during exponential backoff (up to 60 s per attempt). When alarm() eventually writes, it restores the captured metadata, silently reverting any wiring changes or snapshot updates that arrived during the retry window.
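The lost update can be reproduced in a few lines. This is a hypothetical, single-threaded model. Node, NodeState, NodeStore and staleWriteDemo are stand-ins invented for illustration, not the real ClientNodeManager types. The backoff delay is collapsed into a direct store write that lands between capture and alarm.

```kotlin
// Hypothetical, simplified model; the real Node/NodeState/store types are not shown in this report.
enum class NodeState { OK, WARN, DELETING }

data class Node(val id: String, val state: NodeState, val metadata: Map<String, String>)

// Hypothetical in-memory stand-in for the stored node table.
class NodeStore {
    private val nodes = mutableMapOf<String, Node>()
    fun get(id: String): Node? = nodes[id]
    fun put(node: Node) { nodes[node.id] = node }
}

// Replays the race: capture a snapshot, let an SSE-style update land,
// then write a copy of the *captured* snapshot as the alarm did.
fun staleWriteDemo(): Node {
    val store = NodeStore()
    store.put(Node("n1", NodeState.OK, mapOf("wiring" to "v1")))

    val captured = store.get("n1")!!                              // captured at connection start
    store.put(captured.copy(metadata = mapOf("wiring" to "v2")))  // update arrives during backoff

    store.put(captured.copy(state = NodeState.WARN))              // stale write-back
    return store.get("n1")!!
}
```

Calling staleWriteDemo() returns a WARN node whose wiring is back at v1. The v2 update is lost without any error.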

Root cause

The original alarm() used readNodeStateOrNull(node.id).value?.let { original -> if (original.state == NodeState.DELETING) return } as a guard. It then wrote update(node.copy(state = NodeState.WARN, ...)) using the passed-in node. This had two problems:

  1. The write used the stale caller snapshot instead of the current stored node, clobbering any metadata updates that arrived between capture and alarm.
  2. The DELETING guard was dead code. readNodeStateOrNull filters out DELETING nodes (via nodeAvailable), so for a DELETING node the ?.let body never ran and the early return never fired. A DELETING node could therefore be resurrected as WARN.
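The dead guard follows directly from the read filter. The sketch below models it under stated assumptions. readAvailableOrNull is a hypothetical stand-in for readNodeStateOrNull that hides DELETING nodes in the same way, and originalAlarm mirrors the shape of the original alarm() without reproducing its code.

```kotlin
// Hypothetical, simplified types; not the real ClientNodeManager API.
enum class NodeState { OK, WARN, DELETING }

data class Node(val id: String, val state: NodeState, val metadata: Map<String, String> = emptyMap())

class NodeStore {
    private val nodes = mutableMapOf<String, Node>()
    fun put(node: Node) { nodes[node.id] = node }
    fun raw(id: String): Node? = nodes[id]

    // Hypothetical stand-in for readNodeStateOrNull: DELETING nodes come back as null,
    // mimicking the nodeAvailable filter described above.
    fun readAvailableOrNull(id: String): Node? =
        nodes[id]?.takeIf { it.state != NodeState.DELETING }
}

// Shape of the original alarm(): guard on the filtered read, then write the caller's snapshot.
fun originalAlarm(store: NodeStore, node: Node) {
    val original = store.readAvailableOrNull(node.id)
    if (original?.state == NodeState.DELETING) return  // never true: DELETING was filtered out
    store.put(node.copy(state = NodeState.WARN))        // writes the stale snapshot
}
```

For a stored DELETING node, readAvailableOrNull returns null, so the guard is skipped and the snapshot is written as WARN. This is the resurrection described in point 2.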

Both reset() and startPairing() had already been fixed to read the current stored node before writing; alarm() was missed.

Fix
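The sketch below shows one way to correct alarm(). It assumes the approach the report attributes to reset() and startPairing(), which is to read the current stored node first. It also moves the DELETING check onto an unfiltered read. NodeStore and readRawOrNull are hypothetical names, and the actual fix may be structured differently.

```kotlin
// Hypothetical, simplified types; not the real ClientNodeManager API.
enum class NodeState { OK, WARN, DELETING }

data class Node(val id: String, val state: NodeState, val metadata: Map<String, String> = emptyMap())

class NodeStore {
    private val nodes = mutableMapOf<String, Node>()
    fun put(node: Node) { nodes[node.id] = node }

    // Hypothetical unfiltered read: unlike readNodeStateOrNull, DELETING nodes are returned.
    fun readRawOrNull(id: String): Node? = nodes[id]
}

// Sketch of a corrected alarm(): the caller's node contributes only its id.
fun alarm(store: NodeStore, node: Node) {
    val current = store.readRawOrNull(node.id) ?: return  // node no longer stored: nothing to do
    if (current.state == NodeState.DELETING) return       // never resurrect a DELETING node
    store.put(current.copy(state = NodeState.WARN))       // keeps metadata written since capture
}
```

This sketch still performs a separate read and write. If the real store offers an atomic read-modify-write, doing the check and copy inside it would also close the small window between the two.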

Prevention