Deploy Local Runner.yml was a hybrid of two unrelated jobs glued together:
a deb-build pipeline (server / desktop / krill-pi4j / krill-mcp) and a
post-build “tester” handoff to krill-ghost-bot. It targeted the
generic ['self-hosted', 'Linux', 'ARM64'] label set, which any ARM64
self-hosted runner across the org could match, so QA-verify runs were
landing on whichever ARM64 box happened to be free instead of the dedicated
ghost runner where the QA agent’s tooling (skill symlink, bearer cache, MCP
host config) actually lives. It also re-cloned the three repos via
actions/checkout even though the runner already has long-lived working
copies under /home/ben/Code/, and it didn’t stamp the deb Version: per
run, so a second verify on the same PR was a dpkg no-op against an
already-installed identical-version package.
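
The routing problem can be reproduced on paper. GitHub sends a job to any
runner that carries every label listed in runs-on, so the sketch below
models that as a subset check. The runner names, the label assignments, and
the helpers runner_accepts and eligible_runners are hypothetical, and the
comparison is a plain string match, which is a simplification.

```shell
#!/usr/bin/env bash
# Sketch of label routing: a runner is eligible only if it carries every
# label listed in runs-on. Runner names and label sets are hypothetical.

# runner_accepts "<space-separated runner labels>" <wanted-label>...
runner_accepts() {
  local have=" $1 " want
  shift
  for want in "$@"; do
    # Pad with spaces so "ARM64" cannot match inside a longer label.
    [[ "$have" == *" $want "* ]] || return 1
  done
  return 0
}

# eligible_runners <wanted-label>...  prints the names of matching runners
eligible_runners() {
  local entry
  for entry in "${RUNNERS[@]}"; do
    if runner_accepts "${entry#*=}" "$@"; then
      printf '%s\n' "${entry%%=*}"
    fi
  done
}

# name=labels pairs; "spare-arm" stands in for any other ARM64 box.
RUNNERS=(
  "ghost=self-hosted Linux ARM64 ghost"
  "spare-arm=self-hosted Linux ARM64"
)

echo "generic label set:"
eligible_runners self-hosted Linux ARM64
echo "with the ghost label:"
eligible_runners self-hosted Linux ghost
```

With the generic set both runners are eligible. Adding a label that only
one machine carries leaves a single candidate.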

Two drift points compounded:

Dev Agent Blue.yml (the analogous dev-side workflow) targets
['self-hosted', 'Linux', 'X64', 'blue']; the trailing blue is what pins
the job to the dev-agent workstation. The verify workflow was written
before that pattern was established and never picked up the equivalent
ghost label.

The */package/DEBIAN/control files carry the release version (e.g. 1.0.894).
Release Version.yml bumps them in lockstep with version.txt on a
release. Inside a PR-verify run the working tree never sees that bump,
so the rebuilt deb has the same Version: as the previously-installed
one and apt-get install short-circuits.

Renamed Deploy Local Runner.yml → Verify Agent Ghost.yml. The new
name follows the Dev/Verify Agent naming pattern.

runs-on is now ['self-hosted', 'Linux', 'ghost'] so only the ghost
workstation accepts the job.

Removed the actions/checkout steps. The workflow now syncs the three
long-lived clones at /home/ben/Code/krill, /home/ben/Code/krill-oss,
/home/ben/Code/krill-agents via update_repo (sourced from
krill-agents/scripts/lib.sh, same helper Dev Agent Blue uses). The
krill clone gets the PR head fetched + checked out detached so we never
leak a verify-only branch into the local clone.

Each run stamps the Version: line with +qaPR<n>.r<m> (PR number + GitHub
run number). The suffix sorts strictly above the base release version
and above any earlier verify run of the same PR, so dpkg treats the
rebuilt package as an upgrade and reinstalls cleanly. Four control files
are updated, one under each of server/, composeApp/,
pi4j-ktx-service/krill-pi4j-service/, and krill-mcp/krill-mcp-service/.

An if: always() restore step uses git to revert the version-stamp edits
and switch back to main, so
the next run (or the next agent loop on the same machine) starts from a
clean tree. update_repo skips dirty trees, so leaving stamped control
files in place would silently freeze the local copies on stale main.

The concurrency group is verify-agent-ghost (no PR scoping) with
cancel-in-progress: false — only one verify at a time on the runner,
queued rather than cancelled, because each run wipes
/srv/krill/data/*.db and restarts krill.service.

On runs-on: when a workflow is scoped to one
workstation, the label set must be specific enough that no other
self-hosted runner could match — the bug here was matching on
['self-hosted', 'Linux', 'ARM64'] alone, which any ARM64 box would
satisfy. Mirror the Dev Agent <Color> pattern: include the color/role
label and don’t rely on coincidence of OS/arch.

actions/checkout is the wrong shape on a long-lived workstation.
If the runner already has the working copy and other workflows use
update_repo-style fetch+ff-pull, do the same — actions/checkout on
a self-hosted runner clones into _work/<repo>/<repo>/, which is
separate from /home/ben/Code/<repo> and won’t share the user’s
Gradle/Kotlin caches, the krill-agents skill symlink, or the bearer
token at ~/.krill/pin_token. The QA tester step needs those.

A workflow that installs the same .deb repeatedly must mutate the Version:
field per run. +qaPR<n>.r<m> is a debian-version-comparator-friendly
suffix that doesn’t pretend to be a release. If you ever need to query
“is this install from a verify run?”, grep the output of
dpkg-query -W -f='${Version}\n' krill for +qaPR.

The if: always() restore step is load-bearing: without it
the next run’s update_repo "$KRILL_REPO_DIR" would skip the pull
because the working tree still carries the prior run’s
DEBIAN/control edits, and the agent loop on the same box would keep
reading stale version.txt-derived state.
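
The version stamp and the "is this a verify install?" check can be
sketched in a few lines of bash. This is an illustration under stated
assumptions, not the workflow's actual step: the helpers stamp_control and
is_verify_version, the PR and run numbers, and the throwaway control file
are hypothetical, GNU sed is assumed for -i, and the base version is
assumed to contain no "+". Only the +qaPR<n>.r<m> suffix comes from the
notes above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# stamp_control <control-file> <pr-number> <run-number>
# Rewrites "Version: X" to "Version: X+qaPR<pr>.r<run>". An existing +qaPR
# suffix is replaced rather than stacked, so re-stamping stays clean.
stamp_control() {
  local control="$1" pr="$2" run="$3"
  sed -i -E "s/^(Version: [^+]*)(\+qaPR.*)?\$/\1+qaPR${pr}.r${run}/" "$control"
}

# is_verify_version <version>: succeeds when the verify suffix is present.
is_verify_version() {
  [[ "$1" == *"+qaPR"* ]]
}

# Demo on a throwaway control file (hypothetical package metadata).
control="$(mktemp)"
printf 'Package: krill\nVersion: 1.0.894\nArchitecture: arm64\n' > "$control"

stamp_control "$control" 123 7
stamped="$(sed -n 's/^Version: //p' "$control")"
echo "$stamped"

# Where dpkg is available, check the ordering directly.
if command -v dpkg >/dev/null 2>&1; then
  dpkg --compare-versions "$stamped" gt 1.0.894 \
    && echo "stamped build sorts above the release"
  dpkg --compare-versions 1.0.894+qaPR123.r8 gt "$stamped" \
    && echo "later run sorts above the earlier run"
fi

rm -f "$control"
```

On an installed machine the same check would take its input from
dpkg-query, for example
is_verify_version "$(dpkg-query -W -f='${Version}\n' krill)".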