Symptom

The Kraken nightly scan hypothesised that propagationContextTL (a ThreadLocal) could carry a stale or absent PropagationContext inside processor coroutines, weakening cycle-dedupe (D7), cross-host hop counting (D11), and RESET suppression (D6). The confirmed bug: ServerLLMProcessor.clearSnapshot() called nodeManager.update() from a RESET invocation and published to observers, although that publish should have been suppressed.

Root cause

PropagationContextElement (a ThreadContextElement) correctly installs / restores propagationContextTL on the thread for any coroutine whose context contains the element. The element IS propagated to direct structured children launched with unqualified launch {} inside a coroutine body.

However, every server processor called scope.launchProcessing(nodeManager, node) {} or bare scope.launch {} using the injected scope as the receiver. The injected scope is a long-lived application scope whose coroutine context never contains PropagationContextElement. Calls like scope.launchProcessing(...) therefore create siblings of the invoke() coroutine (children of the injected scope, not of invoke()), so the PropagationContextElement installed by invoke() is never inherited.
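The difference between the two launch styles can be reproduced in isolation. The sketch below is a minimal stand-in, not the project's code: the PropagationContext fields (epoch, suppressPublish), the body of PropagationContextElement, and the helper observeContexts() are assumptions made for illustration, and a plain CoroutineScope on Dispatchers.Default stands in for the injected application scope.

```kotlin
import kotlinx.coroutines.*
import kotlin.coroutines.CoroutineContext

// Hypothetical shape; the real PropagationContext may carry other fields.
data class PropagationContext(val epoch: Long, val suppressPublish: Boolean)

val propagationContextTL = ThreadLocal<PropagationContext?>()

// Assumed implementation: installs ctx on the thread while the coroutine runs,
// and restores the previous value when it suspends or completes.
class PropagationContextElement(private val ctx: PropagationContext) :
    ThreadContextElement<PropagationContext?> {
    companion object Key : CoroutineContext.Key<PropagationContextElement>

    override val key: CoroutineContext.Key<*> get() = Key

    override fun updateThreadContext(context: CoroutineContext): PropagationContext? {
        val previous = propagationContextTL.get()
        propagationContextTL.set(ctx)
        return previous
    }

    override fun restoreThreadContext(context: CoroutineContext, oldState: PropagationContext?) {
        propagationContextTL.set(oldState)
    }
}

// Returns (value seen by a structured child, value seen via the injected scope).
fun observeContexts(): Pair<PropagationContext?, PropagationContext?> = runBlocking {
    // Stand-in for the long-lived injected scope: its context has no element.
    val injectedScope = CoroutineScope(SupervisorJob() + Dispatchers.Default)
    val ctx = PropagationContext(epoch = 7, suppressPublish = true)

    val result = withContext(PropagationContextElement(ctx)) {
        // Unqualified async: a structured child that inherits the element.
        val structured = async { propagationContextTL.get() }
        // Injected scope as receiver: the element is not part of its context.
        val viaInjected = injectedScope.async { propagationContextTL.get() }
        structured.await() to viaInjected.await()
    }
    injectedScope.cancel()
    result
}
```

Under these assumptions, observeContexts() should return the installed context for the structured child and null for the coroutine started on the injected scope, which is the gap described above.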

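Inside those siblings propagationContextTL.get() returns null, so the checks that depend on it (cycle-dedupe D7, cross-host hop counting D11, and RESET suppression D6) have no context to consult.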

Fix

ProcessingScope.launchProcessing now captures propagationContextTL.get() synchronously at call time (while the caller’s PropagationContextElement is still pinned on the thread), then passes a new PropagationContextElement wrapping the captured context into the launched coroutine:

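fun CoroutineScope.launchProcessing(...): Job {
    // Read the ThreadLocal before launch, while the caller's element is still installed on this thread.
    val capturedCtx = propagationContextTL.get()
    // With no caller context, add nothing to the child's coroutine context.
    val extra = capturedCtx?.let { PropagationContextElement(it) } ?: EmptyCoroutineContext
    return launch(extra) { try { block() } catch ... }
}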
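A self-contained version of the capture-at-call-time pattern is sketched below so it can be run outside the codebase. It is simplified and partly hypothetical: the nodeManager and node parameters and the error handling are omitted, the PropagationContext fields and the PropagationContextElement body are assumed, and contextSeenThroughLaunchProcessing() is a helper invented for the demonstration.

```kotlin
import kotlinx.coroutines.*
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext

// Hypothetical shape; the real PropagationContext may carry other fields.
data class PropagationContext(val epoch: Long, val suppressPublish: Boolean)

val propagationContextTL = ThreadLocal<PropagationContext?>()

// Assumed implementation of the element described in the text.
class PropagationContextElement(private val ctx: PropagationContext) :
    ThreadContextElement<PropagationContext?> {
    companion object Key : CoroutineContext.Key<PropagationContextElement>

    override val key: CoroutineContext.Key<*> get() = Key

    override fun updateThreadContext(context: CoroutineContext): PropagationContext? {
        val previous = propagationContextTL.get()
        propagationContextTL.set(ctx)
        return previous
    }

    override fun restoreThreadContext(context: CoroutineContext, oldState: PropagationContext?) {
        propagationContextTL.set(oldState)
    }
}

// Simplified signature: the real function also takes nodeManager and node.
fun CoroutineScope.launchProcessing(block: suspend CoroutineScope.() -> Unit): Job {
    // Captured synchronously on the caller's thread, before the new coroutine exists.
    val capturedCtx = propagationContextTL.get()
    val extra: CoroutineContext =
        capturedCtx?.let { PropagationContextElement(it) } ?: EmptyCoroutineContext
    return launch(extra) { block() }
}

// Launches on a scope without the element and reports what the coroutine sees.
fun contextSeenThroughLaunchProcessing(): PropagationContext? = runBlocking {
    val injectedScope = CoroutineScope(SupervisorJob() + Dispatchers.Default)
    val seen = CompletableDeferred<PropagationContext?>()

    withContext(PropagationContextElement(PropagationContext(epoch = 7, suppressPublish = true))) {
        injectedScope.launchProcessing { seen.complete(propagationContextTL.get()) }
    }
    val result = seen.await()
    injectedScope.cancel()
    result
}
```

Because the ThreadLocal is read before launch is called, the launched coroutine should see the caller's context even though the receiver scope knows nothing about it. Reading it inside the launched block instead would be too late, since by then the code runs on whichever thread the injected scope's dispatcher picks.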

Five processors that used bare scope.launch {} for propagation-chain work were migrated to scope.launchProcessing(nodeManager, node) {}: ServerComputeProcessor, ServerCalculationProcessor, ServerLLMProcessor, ServerSMTPProcessor, ServerTaskListProcessor. Three processors already using launchProcessing (ServerDataPointProcessor, WebHookOutboundProcessor, ServerTriggerProcessor) are auto-fixed by the launchProcessing change.

Two regression tests were added to PropagationContextPropagationTest: one verifying epoch inheritance, one verifying suppressPublish propagation through launchProcessing.

Prevention