Symptom

A nightly architectural scan flagged ClientNodeManager.nodes as a non-thread-safe LinkedHashMap accessed from multiple coroutines on Dispatchers.Default. The existing try/catch around nodes(), which returns emptyList() on exception, silently swallowed the failure and indicates the race had been observed before. collectNodeIdsToRemove could throw ConcurrentModificationException partway through and leave zombie nodes in the swarm.

Root cause

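nodes: MutableMap<String, MutableStateFlow<Node>> = mutableMapOf() is a plain LinkedHashMap on the JVM, which is not safe for concurrent use. The shared CoroutineScope uses Dispatchers.Default, which schedules coroutines on a shared thread pool, so they can run in parallel. Concurrent calls to update() (SSE handler) and remove() (DELETING-state processor) could therefore interleave structural mutations of the map with the iterations in collectNodeIdsToRemove, nodes(), and children(). LinkedHashMap iterators are fail-fast: they throw ConcurrentModificationException when they detect such a mutation, but only on a best-effort basis, so the failure is intermittent.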
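The failure mode can be reproduced deterministically on a single thread by mutating the map in the middle of an iteration, which is the interleaving two coroutines could produce. The sketch below is illustrative only. Node here is a simplified stand-in for the real type, and iterateWhileRemoving and seed are hypothetical helpers, not part of ClientNodeManager. ConcurrentHashMap is included only as a contrast case, because its iterators are weakly consistent and never throw ConcurrentModificationException.

```kotlin
import java.util.ConcurrentModificationException
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for the real Node type (the real map holds MutableStateFlow<Node>).
data class Node(val id: String, val state: String)

// Hypothetical helper: fills a map with three nodes, "a", "b", and "c".
fun seed(target: MutableMap<String, Node>): MutableMap<String, Node> {
    listOf("a", "b", "c").forEach { target[it] = Node(it, "RUNNING") }
    return target
}

// Hypothetical helper: simulates a remove() landing while another caller
// is still iterating over the same map.
fun iterateWhileRemoving(nodes: MutableMap<String, Node>): Result<List<String>> = runCatching {
    val seen = mutableListOf<String>()
    for ((id, _) in nodes) {
        seen += id
        if (id == "a") nodes.remove("b") // structural mutation mid-iteration
    }
    seen
}

fun main() {
    // mutableMapOf() is a LinkedHashMap on the JVM: the iterator detects the mutation.
    val plain = iterateWhileRemoving(seed(mutableMapOf<String, Node>()))
    println(plain.exceptionOrNull() is ConcurrentModificationException) // true

    // ConcurrentHashMap: iteration completes, though the ids it visited
    // may or may not reflect the concurrent removal.
    val concurrent = iterateWhileRemoving(seed(ConcurrentHashMap<String, Node>()))
    println(concurrent.isSuccess) // true
}
```

Swapping the map type only removes the exception. Compound operations such as check-then-update would still need their own atomicity, for example through compute() or a Mutex.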

Fix

Prevention