Post

Icon HostProcessor Thread Safety & Connection Logic Review

HostProcessor Thread Safety & Connection Logic Review

HostProcessor Thread Safety & Connection Logic Review

HostProcessor Thread Safety & Connection Logic Review

Summary of Changes

1. HostProcessor.kt - Major improvements for thread safety and connection handling

Thread Safety Fixes:

  • ✅ Changed all LazyThreadSafetyMode.NONE to SYNCHRONIZED for thread-safe dependency injection across all processors (HostProcessor, NodeEmitProcessor, NodeEmitAppProcessor)
  • ✅ Added wireProcessingMutex to prevent race conditions when processing incoming wire messages
  • ✅ BeaconService already handles beacon listener deduplication via its jobs map (no need for instance-level flags in HostProcessor)
  • ✅ Wrapped all critical sections with proper mutex locking

Connection Logic Improvements:

  • ✅ Separated wire processing into clear functions: processWire(), processHostWire(), handleNewPeer(), handleKnownPeer()
  • ✅ Added proper error handling with try-catch blocks around wire processing
  • ✅ Fixed logic for handling known peers with new sessions (server/client restarts)
  • ✅ Added explicit logging for all connection state changes
  • ✅ Added shutdown() method for cleanup when host processor is stopped

Missing Conditions Fixed:

Scenario 1: Server Restart

  • ✅ When a known server restarts (new sessionId), the client now:
    1. Updates the session in peerSessionManager
    2. Updates the node with latest wire data
    3. Re-establishes trust via serverHandshakeProcess.trustServer()
    4. If I’m also a server, advertises back

Scenario 2: Client Restart

  • ✅ When a known client restarts (new sessionId), the server now:
    1. Updates the session in peerSessionManager
    2. Updates the node with latest wire data
    3. Sends a signal to advertise itself to the reconnected client

Scenario 3: Duplicate Beacons

  • ✅ When receiving a beacon from a known peer with the same session:
    1. Logs as duplicate/heartbeat (debug level)
    2. Updates node metadata in case anything changed
    3. Does NOT re-establish connections (prevents loops)

Scenario 4: App Launch

  • ✅ When a new client app launches:
    1. Creates node stub from wire data
    2. Sends signal back to advertise the host
    3. Does NOT start unnecessary handshake processes (clients don’t need full trust)

Scenario 5: Multiple emit() Calls

  • ✅ Protected by emitMutex and beaconStarted flag
  • ✅ Only starts beacon service once per host node
  • ✅ Subsequent calls are safely skipped with debug logging

Scenario 6: Error Recovery

  • ✅ If beacon service fails to start, resets beaconStarted flag to allow retry
  • ✅ Wire processing errors are caught and logged without crashing the service

2. PeerSessionManager.kt - Added thread safety

Changes:

  • ✅ Added Mutex for thread-safe access to knownSessions map
  • ✅ Made all methods suspend to support mutex locking
  • ✅ Added getSession() method for future use (retrieving session by node ID)

3. BeaconService.kt - Improved wire handling

Changes:

  • ✅ Changed discovered callback to suspend function
  • ✅ Removed session checking logic from BeaconService (moved to HostProcessor)
  • ✅ Always calls discovered() callback - let HostProcessor decide what to do
  • ✅ Added wireProcessingMutex for thread-safe wire processing
  • ✅ Added stop(nodeId) and stopAll() methods for cleanup
  • ✅ Added comprehensive error handling with try-catch blocks
  • ✅ Added logging for service lifecycle events

Potential Edge Cases Covered

1. Race Condition: Multiple Beacons Arriving Simultaneously

  • ✅ Protected by wireProcessingMutex in both BeaconService and HostProcessor
  • Each wire is processed sequentially, preventing session state corruption

2. Race Condition: Session Manager Access

  • ✅ All PeerSessionManager methods use mutex locking
  • Safe concurrent access from multiple coroutines

3. Error Loop: Server Keeps Restarting

  • ✅ Each restart creates a new session ID
  • ✅ HostProcessor handles new session properly without creating duplicate listeners
  • ✅ Old session is replaced, not duplicated

4. Error Loop: Duplicate Beacon Listeners

  • ✅ BeaconService tracks active beacon jobs in its jobs map (keyed by node.id)
  • ✅ BeaconService.start() checks if job already exists before creating a new one
  • ✅ Thread-safe with beaconJobMutex in BeaconService

5. Memory Leak: Beacon Services Not Cleaned Up

  • ✅ Added shutdown() method to HostProcessor
  • ✅ Added stop() and stopAll() methods to BeaconService
  • Applications should call these on cleanup

6. Dependency Injection Race Condition

  • ✅ All by inject() now use LazyThreadSafetyMode.SYNCHRONIZED
  • Safe initialization even when accessed from multiple threads

Testing Recommendations

Unit Tests Needed:

  1. Test multiple concurrent emit() calls only start one beacon service
  2. Test server restart scenario (known peer, new session)
  3. Test client restart scenario (known peer, new session)
  4. Test new peer discovery
  5. Test duplicate beacon handling (same session)
  6. Test error recovery when beacon service fails to start
  7. Test concurrent wire processing from multiple peers

Integration Tests Needed:

  1. Network partition and reconnection
  2. Server crash and restart with different data
  3. Multiple clients connecting simultaneously
  4. Client app closed and relaunched
  5. Server under load with many beacon messages

Manual Test Scenarios:

  1. Scenario A: Start server, start client, verify connection
  2. Scenario B: Start server, start client, restart server, verify reconnection
  3. Scenario C: Start server, start client, restart client, verify reconnection
  4. Scenario D: Start server, start multiple clients simultaneously, verify all connect
  5. Scenario E: Network blip causing duplicate beacons, verify no error loops
  6. Scenario F: Kill server process, restart with same data, verify clients reconnect
  7. Scenario G: Kill server process, restart with different data, verify clients re-download

Remaining Considerations

1. Session Cleanup

  • When a peer disconnects (timeout), should we remove its session?
  • Currently sessions accumulate forever in PeerSessionManager
  • Recommendation: Add a cleanup mechanism or TTL for sessions

2. Beacon Service Lifecycle

  • shutdown() method exists but needs to be called from application lifecycle hooks
  • Recommendation: Integrate with application/server shutdown hooks

3. WebSocket Connection Management

  • HostProcessor handles discovery, but WebSocket lifecycle is separate
  • Recommendation: Ensure WebSocket cleanup is coordinated with session management

4. Node State Synchronization

  • When server restarts, clients re-download nodes
  • Recommendation: Verify that concurrent node downloads don’t corrupt local state

5. Multi-Server Scenarios

  • If there are multiple servers, verify they don’t create beacon loops
  • Recommendation: Test server-to-server beacon handling

Code Quality Improvements

What Changed:

  • Better separation of concerns (processWire, handleNewPeer, handleKnownPeer)
  • More descriptive logging at appropriate levels (i, d, w, e)
  • Comprehensive error handling
  • Clear comments explaining each condition
  • Consistent mutex usage patterns

LazyThreadSafetyMode Justification:

  • SYNCHRONIZED: Required because:
    1. Multiple threads may call emit() simultaneously
    2. Coroutines run on different threads in multiplatform context
    3. Koin injection without synchronization can cause race conditions
    4. Prevents double initialization of expensive resources (BeaconService, etc.)

Checklist for Deployment

  • Update all call sites that use PeerSessionManager to handle suspend functions
  • Add application shutdown hook to call HostProcessor.shutdown()
  • Add integration tests for reconnection scenarios
  • Monitor logs for any “Beacon service already started” messages (indicates protection working)
  • Monitor for any session state corruption (should not occur with mutex)
  • Performance test: verify mutex locking doesn’t create bottlenecks
  • Add session cleanup mechanism (TTL or explicit disconnect handling)

Metrics to Monitor

  1. Beacon listener count - Should always be 1 per host node
  2. Session map size - Should grow with peers, but not unbounded
  3. Wire processing time - Should be fast (< 100ms typically)
  4. Connection re-establishment time - When peer restarts
  5. Error rate - In wire processing and beacon callbacks
This post is licensed under CC BY 4.0 by the author.