A krill-mcp client connected to a single seed (pi-krill-05.local) could not
discover the swarm’s other peers (pi-krill.local). GET /health returned the
seed’s own Server node serialized via ServerMetaData, with no peer field;
the seed already had every peer in memory but did not surface any of them on
that route. krill-mcp had no path to bootstrap transitive discovery from one
seed without parsing the heavier /nodes payload.
/health has been framed as “this server’s own meta + non-Server child node ids”
since commit 1d02ef4d5 simplified peer connections (2026-02-08), and it explicitly
filters KrillApp.Server-typed rows out of the nodes array
(Routes.kt:851). The exclusion is correct for /health’s shape contract
(stable single-server payload) but leaves no route at all for the swarm view.
The deeper invariant the route had to honour was undocumented in code and
lived only in the Mesh Network agent prompt: every server stores other
servers it has discovered as full KrillApp.Server rows alongside its own,
and those rows must NOT be serialized as-is. The receiving app or peer server
already knows that install as its own authoritative Server node, and a
same-id Server payload would clobber it on ingest. The shared
toPeer(node) helper (shared/.../node/SharedNodeFunctions.kt) is the
canonical conversion: it flips type to KrillApp.Server.Peer, rewrites
id to the composite {thisServerId}:{peerServerId}, and reparents under
the responding server. Without that conversion, exposing the peer list would
silently corrupt downstream state.
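The clobbering risk and the three rewrites can be sketched as follows. This is a minimal illustration, not the shared helper itself: the `Node` shape, the `NodeType` enum, the `name` field, the convention that a server row is its own parent, and the id-keyed `ingest` function are all hypothetical stand-ins. Only the three rewrites (type, id, parent) and the two-argument `toPeer(node, selfId)` form come from the text.

```kotlin
// Hypothetical minimal model; the real Node and KrillApp types carry more fields.
enum class NodeType { SERVER, SERVER_PEER }

data class Node(
    val id: String,
    val parent: String,
    val type: NodeType,
    val name: String,
)

// Sketch of the three rewrites described above: type, id, and parent.
fun toPeer(node: Node, selfId: String): Node =
    node.copy(
        type = NodeType.SERVER_PEER,
        id = "$selfId:${node.id}", // composite {thisServerId}:{peerServerId}
        parent = selfId,           // reparented under the responding server
    )

// Assumed naive ingest keyed by id: a same-id payload replaces the local row.
fun ingest(store: MutableMap<String, Node>, incoming: Node) {
    store[incoming.id] = incoming
}

fun main() {
    // Receiver "B" holds its own authoritative Server row.
    val own = Node(id = "B", parent = "B", type = NodeType.SERVER, name = "authoritative")
    // Server "A" holds its own copy of B's row, which may be out of date.
    val copyHeldByA = own.copy(name = "copy held by A")

    // Serialized as-is, the copy shares B's id and overwrites the local row.
    val clobbered = mutableMapOf(own.id to own)
    ingest(clobbered, copyHeldByA)
    println(clobbered["B"]?.name)

    // Converted first, the copy arrives under a composite id and B's row survives.
    val safe = mutableMapOf(own.id to own)
    ingest(safe, toPeer(copyHeldByA, selfId = "A"))
    println(safe["B"]?.name)
    println(safe.keys)
}
```

The composite id is what keeps the two rows apart: the converted row can never collide with the receiver's own Server id.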
Added a sibling GET /peers route under authenticate("auth-api-key") in
server/.../Routes.kt, leaving /health shape-stable per the issue’s
recommended option (2). The route delegates to a new
server/.../io/PeerProjection.kt helper:
fun peerProjection(nodes: List<Node>): List<Node> =
    nodes
        .filter { it.type == KrillApp.Server && !it.isMine() }
        .map(::toPeer)
Using == rather than is keeps the projection scoped to the top-level
Server data object — child types under the Server sealed parent
(Pin, LLM, SerialDevice, Backup, Peer) are out of scope for swarm
enumeration. The !isMine() clause drops the local server. toPeer is the
shared conversion documented above. Helper-level KDoc enumerates all three
field rewrites (type, id, parent) so future readers don’t have to
chase them across files.
Regression coverage in
server/src/jvmTest/.../io/PeerProjectionTest.kt exercises three cases:
mixed local/remote/child types (drop-rules), composite-id formation, and
the degenerate empty-swarm case. Tests guard against accidental relaxation of
the type filter or id format — both of which would silently break consumer
ingest rather than surface as a route error.
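The three cases can be sketched in a self-contained form as below. This is an approximation under stated assumptions, not the contents of PeerProjectionTest.kt: the `Kind` enum and the simplified `Node` are hypothetical, `it.id != selfId` stands in for `!isMine()`, and plain `check` calls replace whatever test framework the project uses.

```kotlin
// Hypothetical stand-ins for the KrillApp types and the real Node.
enum class Kind { SERVER, PEER, PIN }

data class Node(val id: String, val parent: String, val kind: Kind)

fun toPeer(node: Node, selfId: String): Node =
    node.copy(kind = Kind.PEER, id = "$selfId:${node.id}", parent = selfId)

// Assumed equivalent of the projection: keep remote top-level servers only.
fun peerProjection(nodes: List<Node>, selfId: String): List<Node> =
    nodes
        .filter { it.kind == Kind.SERVER && it.id != selfId }
        .map { toPeer(it, selfId) }

fun main() {
    val self = "A"
    val nodes = listOf(
        Node("A", "A", Kind.SERVER),  // local server: dropped
        Node("B", "B", Kind.SERVER),  // remote server: kept and converted
        Node("pin-1", "A", Kind.PIN), // child type: dropped
    )

    // Case 1: drop rules leave exactly one entry.
    val peers = peerProjection(nodes, self)
    check(peers.size == 1)

    // Case 2: composite id, with type and parent rewritten.
    check(peers.single() == Node("A:B", "A", Kind.PEER))

    // Case 3: an empty swarm (no nodes, or only the local server) yields an empty list.
    check(peerProjection(emptyList(), self).isEmpty())
    check(peerProjection(listOf(Node("A", "A", Kind.SERVER)), self).isEmpty())

    println("all checks passed")
}
```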
The invariant has moved out of the Mesh Network agent prompt into the peerProjection
helper’s KDoc and into this lesson, so future readers don’t have to find
the prompt to understand why direct serialization of KrillApp.Server
rows is forbidden. If another swarm-shaped endpoint gets added later, it
should call peerProjection (or toPeer directly) rather than roll its own
filter, for the same reason.
== vs is on a sealed data object matters. KrillApp.Server is a
data object whose nested types (Pin, Peer, etc.) are subclasses; an
is-check pulls all of them in. The existing /nodes route happens to
do is and tolerates the broader set; new code modelling “swarm view”
must use == to avoid converting Pin/SerialDevice nodes into bogus
Peer entries.
peerProjection called toPeer(node) and Node.isMine(), both of
which read installId() and write ~/.krill/install_id on first
call. Tests had to compensate with a @BeforeEach mkdirs() — the
exact “tests that need hand-holding” pattern Ben called out on PR
#190. Fixed by parameterising both: peerProjection(nodes, selfId)
and a new pure toPeer(node, selfId) overload in shared/. The
route resolves installId() once at the boundary; the helpers stay
side-effect-free and the test passes under HOME=/nonexistent.
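The seam can be illustrated with a small sketch. Everything here is hypothetical scaffolding: `FakePlatform.installId()` stands in for the real platform call (a call counter takes the place of the file write), the simplified `Node` carries only what the filter needs, and `handlePeersRequest` plays the role of the route handler.

```kotlin
data class Node(val id: String, val isServer: Boolean)

// Hypothetical stand-in for a platform call with a side effect.
object FakePlatform {
    var calls = 0
    fun installId(): String {
        calls++ // the real call touches the filesystem; here we only count
        return "A"
    }
}

// Before: the helper reaches for platform state on its own.
fun remoteServersImpure(nodes: List<Node>): List<Node> =
    nodes.filter { it.isServer && it.id != FakePlatform.installId() }

// After: the helper is pure; the caller supplies selfId.
fun remoteServers(nodes: List<Node>, selfId: String): List<Node> =
    nodes.filter { it.isServer && it.id != selfId }

// Boundary: the handler resolves the id once and passes it down.
fun handlePeersRequest(nodes: List<Node>): List<Node> =
    remoteServers(nodes, selfId = FakePlatform.installId())

fun main() {
    val nodes = listOf(Node("A", true), Node("B", true), Node("pin-1", false))

    // The pure helper is testable without touching the platform at all.
    check(remoteServers(nodes, "A").map { it.id } == listOf("B"))
    check(FakePlatform.calls == 0)

    // The boundary resolves the id exactly once per request.
    check(handlePeersRequest(nodes).map { it.id } == listOf("B"))
    check(FakePlatform.calls == 1)
}
```

With the id passed in, a unit test needs no home directory, no setup hook, and no knowledge of where the id is stored.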
Default rule: any helper that gates on installId(), hostName, or
similar expect-val platform calls should accept those values as
arguments and let the route handler resolve them. Tests that
re-create the resolution inside the unit under test are signalling
a missing seam, not a missing setup step.
An automated check for KrillApp.Server-typed nodes leaving the
server unconverted would catch missed call sites, but the search space
is wide and the false-positive rate would be high. For now, the safeguard is
code review plus the peerProjection helper as the canonical entry point.