Krill Peer Mesh Network Architecture
Deep dive into Krill's peer-to-peer mesh networking including beacon discovery, cluster PIN trust, SSE real-time updates, and server settings bootstrap
Krill Peer Mesh Network Architecture
Overview
This document provides a comprehensive analysis of how Krill’s peer-to-peer mesh networking operates, including beacon discovery with rolling TOTP cluster tokens, PIN-based cluster trust, SSE (Server-Sent Events) real-time update lifecycle, and the /trust endpoint for certificate download.
Core Components
Key Classes and Their Responsibilities
| Class | Location | Purpose |
|---|---|---|
BeaconSupervisor | shared | Manages beacon listening and broadcasting lifecycle |
BeaconProcessor | shared | Processes incoming beacons, validates cluster token, detects new/reconnected peers |
BeaconSender | shared | Sends beacon signals with rate limiting; includes rolling TOTP cluster token |
PeerSessionManager | shared | Tracks known peers by installId/sessionId for deduplication |
ServerHandshakeProcess | shared | Handles trust establishment (cert download, health check) and node synchronization |
CertificateCache | shared | Caches validated connections to avoid redundant cert downloads |
SSEBoss | shared | Manages SSE connections to servers for real-time updates |
ServerNodeManager | shared | Server-side node management with nodeUpdates SharedFlow for SSE broadcasting |
ServerServerProcessor | shared | Processes Server node state changes, triggers connection flow |
NodeWire - Beacon Data Structure
1
2
3
4
5
6
7
8
data class NodeWire(
val timestamp: Long, // When beacon was sent
val installId: String, // Stable peer identity (UUID)
val host: String, // Hostname or IP
val port: Int, // Server port (0 for apps)
val sessionId: String, // Current session ID (changes on restart)
val clusterToken: String // Rolling TOTP token derived from cluster PIN
)
Key Identity Rules:
installIdis the primary peer identity (stable across restarts, IP changes)sessionIdchanges on each restart (used to detect peer restarts)clusterTokenis a time-based rolling token derived from the cluster PIN; used to validate cluster membership before any trust is establishedhost:portis used for network connectivity but NOT for identity keying
Beacon Discovery Flow
Beacon Classification
- Server beacons:
port > 0(servers listen on a specific port) - App beacons:
port == 0(apps don’t run a server)
Startup Beacon Flow
sequenceDiagram
participant App as App/Server
participant BS as BeaconSupervisor
participant MC as Multicast
participant BP as BeaconProcessor
participant PSM as PeerSessionManager
participant SHP as ServerHandshakeProcess
App->>BS: startBeaconProcess()
BS->>MC: receiveBeacons(callback)
BS->>BS: sentStartupBeacon = true
BS->>MC: sendBeacon(wire + clusterToken)
Note over MC: Beacon broadcast on startup
MC-->>BS: Incoming beacon received
BS->>BS: handleIncomeWire(wire)
alt wire.port > 0 (Server beacon)
BS->>BP: processWire(wire)
BP->>BP: validateClusterToken(wire.clusterToken)?
alt Invalid cluster token
Note over BP: Skip - not a cluster member
else Valid cluster token
BP->>PSM: isKnownSession(wire)?
alt Duplicate beacon (same session)
PSM-->>BP: true
Note over BP: Skip - already known
else Known host, new session (restart)
PSM-->>BP: false (but hasKnownHost = true)
BP->>PSM: add(wire)
BP->>SHP: trustServer(wire)
else New host
PSM-->>BP: false
BP->>PSM: add(wire)
BP->>SHP: trustServer(wire)
end
end
else wire.port == 0 (App beacon)
alt SystemInfo.isServer()
BS->>MC: sendBeacon(ourWire)
Note over BS: Server responds to app beacon
end
end
Connection Pipeline Flow
Connection Result States
1
2
3
4
5
6
enum class ConnectionResult {
SUCCESS, // Connected and synced
CERTIFICATE_ERROR, // SSL/TLS issue - need cert download
NETWORK_ERROR, // Network unreachable
AUTH_ERROR // Bearer token rejected (PIN mismatch)
}
Beacon Validated to SSE Connection
sequenceDiagram
participant BP as BeaconProcessor
participant SHP as ServerHandshakeProcess
participant CC as CertificateCache
participant HTTP as HTTP Client
participant SSE as SSEBoss
BP->>SHP: trustServer(wire)
Note over BP,SHP: Beacon cluster token already validated
Note over SHP: Mutex lock with installId-based job key
SHP->>SHP: Check existing job for installId
alt Job already running
Note over SHP: Skip - idempotent
else New job
SHP->>CC: hasValidConnection(installId, host, port, sessionId)?
alt No cached connection
SHP->>HTTP: GET /trust (download certificate)
HTTP-->>SHP: Self-signed certificate
SHP->>SHP: Store certificate in trust store
SHP->>HTTP: GET /health (Bearer: PIN-derived token)
alt Success (200 OK)
SHP->>HTTP: GET /nodes (Bearer: PIN-derived token)
HTTP-->>SHP: Node data
SHP->>SHP: Sync all nodes
SHP->>CC: markValid(installId, ...)
SHP->>SSE: connect(peer)
Note over SSE: SSE stream with Bearer: PIN-derived token
SHP->>SHP: return SUCCESS
else SSL Error
SHP->>CC: invalidate(installId)
SHP->>SHP: Re-download cert, retry
else Unauthorized (401)
SHP->>SHP: return AUTH_ERROR
Note over SHP: PIN mismatch - not same cluster
end
else Cached connection valid
SHP->>SSE: connect(peer)
SHP->>SHP: return SUCCESS
end
end
/trust Endpoint - Certificate Download
The /trust endpoint serves the server’s self-signed TLS certificate. Certificate download happens only after the beacon’s rolling TOTP cluster token has been validated, ensuring that only cluster members download each other’s certificates.
/trust Flow
sequenceDiagram
participant ServerA as Server A
participant ServerB as Server B
Note over ServerA: Received beacon from B<br/>with valid cluster token
ServerA->>ServerB: GET /trust
ServerB-->>ServerA: Self-signed certificate (krill.crt)
ServerA->>ServerA: Store cert in trust store
ServerA->>ServerB: GET /health (Bearer: PIN-derived token)
ServerB-->>ServerA: 200 OK
ServerA->>ServerB: SSE /sse (Bearer: PIN-derived token)
Note over ServerA,ServerB: Real-time sync established
SSE Real-Time Updates
Architecture Overview
Real-time node updates flow from server to clients using Server-Sent Events (SSE). This replaces the previous WebSocket-based approach with a simpler, more reliable unidirectional stream.
sequenceDiagram
participant Client as App Client
participant SSEBoss as SSEBoss
participant Server as Ktor Server
participant NM as ServerNodeManager
participant Actor as Actor Channel
Client->>SSEBoss: connect(serverNode)
SSEBoss->>Server: GET /sse (HTTPS, Bearer: PIN-derived token)
Server-->>SSEBoss: Initial server state
Note over NM,Actor: Node state change occurs
NM->>Actor: update(node)
Actor->>Actor: updateInternal(node)
Actor->>NM: emit to _nodeUpdates SharedFlow
NM-->>Server: nodeUpdates.collect { node }
Server-->>SSEBoss: SSE event: node JSON
SSEBoss->>Client: nodeManager.update(node)
SSE Connection Flow
stateDiagram-v2
[*] --> Disconnected
Disconnected --> Connecting: SSEBoss.connect(node)
Connecting --> Connected: SSE stream established
Connecting --> Disconnected: Connection error
Connected --> Receiving: Receive node update
Receiving --> Connected: Update local NodeManager
Connected --> Disconnected: Connection closed
Connected --> Disconnected: Error
Disconnected --> [*]: Server removed
note right of Disconnected: Reconnection attempted on next beacon
note right of Connected: Real-time updates flow via SSE
Key Components
Server-Side (Routes.kt):
1
2
3
4
5
6
7
8
9
10
11
sse("/sse", serialize = { _, node ->
fastJson.encodeToString<Node>(node as Node)
}) {
// Send current server state on connect
send(nodeManager.readNodeState(installId()).value)
// Collect from the nodeUpdates SharedFlow for real-time updates
nodeManager.nodeUpdates.collect { node ->
send(node)
}
}
Client-Side (SSEBoss.kt):
1
2
3
4
5
6
7
client.sse(urlString = sseUrl.toString()) {
incoming.collect { event ->
deserialize<Node>(event.data)?.let { node ->
nodeManager.update(node)
}
}
}
NodeManager SharedFlow Integration
The ServerNodeManager uses the actor pattern for thread-safe updates and emits to a SharedFlow for SSE broadcasting:
1
2
3
4
5
6
7
8
9
10
11
// In ServerNodeManager.updateInternal()
private fun updateInternal(node: Node) {
// Update the StateFlow (triggers NodeObserver → Processor)
val f = nodes.getOrPut(node.id) { MutableStateFlow(node).also { observe(it) } }
f.value = node
// Broadcast to SSE clients (only for owned nodes)
if (node.isMine() && node.state != NodeState.DELETING) {
scope.launch { _nodeUpdates.emit(node) }
}
}
Disconnect Handling
When an SSE connection is lost:
SSEBosscatches the exception and logs it- The job is removed from the active connections map
- On next beacon from peer, a new SSE connection is attempted
Peer Server Node State Transitions
stateDiagram-v2
[*] --> NONE: Beacon validated + connection success
NONE --> USER_EDIT: User updates settings
USER_EDIT --> NONE: Connection success
USER_EDIT --> ERROR: Connection failure
NONE --> ERROR: SSE disconnect
ERROR --> NONE: Beacon + successful resync
[*] --> ERROR: Beacon validated + connection failure (network/cert error)
note right of ERROR: ERROR state does NOT trigger network work
note right of USER_EDIT: Triggers connection pipeline
Invariants and Guarantees
Startup Invariants
- Client Node EXECUTE once: On app start, client node is created/loaded and executed exactly once
- Server Node EXECUTE once: On server start, server node is created/loaded and executed exactly once
- Beacon listening starts: Both apps and servers start beacon listening on startup
Beacon Handling Invariants
- Server beacons identified by port > 0
- Cluster token validated first: Beacons with invalid TOTP cluster tokens are discarded before any further processing
- Duplicate beacons ignored: Same installId + sessionId = already known
- Peer restarts detected: Same installId + different sessionId = reconnect flow
- Server responds to app beacons: Enables discovery
Connection Pipeline Invariants
- Idempotent operations: Job keys use
installId-sessionIdto prevent duplicates - Beacon validates cluster membership: Invalid cluster token = discard beacon, no connection attempted
- ERROR state = STOP: ERROR nodes don’t trigger new network work
- Servers don’t broadcast deleting nodes: DELETING states are not sent over SSE
Identity Invariants
- installId is primary identity: Used for all peer tracking
- sessionId detects restarts: New session = peer restarted
- host:port for connectivity only: NOT used for identity keying
Troubleshooting Guide
Symptom: Server appears in ERROR state
Likely Causes:
- PIN mismatch (server not in same cluster)
- Network unreachable
- Certificate trust issue
Log Lines to Look For:
1
2
3
4
Received beacon from ${wire.host()}:${wire.port}
Invalid cluster token from beacon - ignoring peer
Bearer token rejected (401) - cluster PIN may not match
SSL/Certificate error for peer ${installId}
Symptom: Duplicate connections
Likely Causes:
- Using host:port as key instead of installId (fixed in this update)
- Beacon storm without proper deduplication
Log Lines to Look For:
1
2
Connection job already in progress for ${jobKey}
Connection already active for peer ${installId}
Symptom: Node updates not received
Likely Causes:
- SSE connection disconnected
- ERROR state on peer node
- Server not emitting to nodeUpdates SharedFlow
Log Lines to Look For:
1
2
3
SSEBoss connecting failed
SSE sending update: ${type} ${state}
node updated ${details}
Diagram Source Map
These diagrams were derived from the actual code in the repository:
| Diagram | Source Classes/Functions |
|---|---|
| Beacon Discovery Flow | BeaconSupervisor.startBeaconListener(), BeaconSupervisor.handleIncomeWire(), BeaconProcessor.processWire(), PeerSessionManager.isKnownSession() |
| Connection Pipeline Flow | ServerHandshakeProcess.trustServer(), ServerHandshakeProcess.attemptConnection(), ServerHandshakeProcess.downloadAndSyncServerData(), CertificateCache.hasValidConnection() |
| /trust Flow | Routes.kt GET /trust, ServerHandshakeProcess.trustServer() |
| SSE Connection Flow | SSEBoss.connect(), Routes.kt /sse endpoint, ServerNodeManager.updateInternal(), NodeManager.nodeUpdates |
| Peer Node State Transitions | ServerHandshakeProcess connection results, NodeManager.setErrorState(), NodeManager.complete() |
Related Documents
Last verified: 2026-04-03