Assay-Runner Delegated Runner Runbook¶
Intent¶
Operational guide for the internal Runner Spike Delegated workflow:
- workflow:
.github/workflows/runner-spike-delegated.yml - runner labels:
[self-hosted, linux, assay-bpf-runner] - trigger:
workflow_dispatchonly - proof scope: Phase 1 Linux/eBPF runner-spike gates
This runbook keeps the delegated lane explicit. It is not a general-purpose self-hosted runner recipe and does not make macOS, Windows, live LLM, hosted service, or production-load claims.
What This Lane Proves¶
The delegated workflow runs the Linux/eBPF gates that hosted CI cannot prove:
| Gate input | Script | Proof shape |
|---|---|---|
kernel-only | scripts/ci/runner-spike-kernel-only-three-run-determinism.sh | kernel observation, cgroup correlation, bundle verification, three-run determinism |
kernel-policy | scripts/ci/runner-spike-kernel-policy-three-run-determinism.sh | kernel plus policy correlation, bundle verification, three-run determinism |
openai-agents-kernel-policy | scripts/ci/runner-spike-openai-agents-kernel-policy-three-run-determinism.sh | kernel plus policy plus real @openai/agents SDK runtime correlation, bundle verification, three-run determinism |
all | all of the above, plus scripts/ci/runner-spike-openai-agents-kernel-policy-hidden-write-three-run-determinism.sh | full delegated proof, including the post-closure hidden-write semantic-gap expansion gate |
Each three-run script executes its single-run acceptance three times and then compares the deterministic artifacts. Kernel raw event ordering is not used as a causal-ordering claim; the bundle claim remains set/window based.
Host Requirements¶
The assay-bpf-runner host must be dedicated to this lane. The workflow may wipe Docker caches, GitHub Actions _actions caches, workspace contents, and runner-spike temp directories.
Required baseline:
- Linux host with cgroup v2 mounted at
/sys/fs/cgroup - self-hosted GitHub Actions runner registered with labels:
self-hostedlinuxassay-bpf-runnersudoavailable for the runner user- Rust toolchain with
cargoandrustc - Node.js 22 or newer plus
npm python3,git, andtar- eBPF-capable kernel and enough permissions for BPF program load/attach
- Docker optional; when present, the workflow builds the eBPF artifact through
cargo xtask build-imageandcargo xtask build-ebpf --docker
The workflow preflight checks only the minimum executable and cgroup conditions. Kernel capability, BPF verifier behavior, cgroup layout, and runner cleanliness are proven by the gates themselves.
Security Model¶
This lane is intentionally narrower than ordinary CI:
workflow_dispatchonly; nopull_request,push, orscheduletrigger- job permissions are
contents: read - checkout uses
persist-credentials: false - the runner must be dedicated to this workflow class because cleanup is destructive
- no OpenAI API key or live LLM secret is required; the OpenAI Agents fixture uses a deterministic local model provider
- the gated command runs under
sudo; the workflow explicitly setsPATH,ASSAY_BIN, andASSAY_EBPF_PATHfor that command, but the environment is not scrubbed down to only those variables
Do not add broad secret-bearing environment variables to this workflow. If a future gate needs credentials, treat that as a new security design review, not as a small runbook or workflow edit.
Dispatch Procedure¶
- Open GitHub Actions.
- Select
Runner Spike Delegated. - Click
Run workflow. - Choose:
gates=allfor Phase 1 acceptance or final regression checks- one narrower gate for focused diagnosis
build_ebpf=truefor the documented workflow path- use
build_ebpf=falseonly if the workflow has been explicitly updated to restore or copy a prebuilttarget/assay-ebpf.ointo place after the job cleanup and checkout steps - Wait for the
Phase 1 delegated gates (...)job. - Record:
- workflow run URL
- commit SHA
- selected
gates - pass/fail result
- uploaded proof-pack artifact name, when present
- relevant PASS lines or failure diagnostics
Recommended progression during diagnosis:
kernel-onlykernel-policyopenai-agents-kernel-policyall
Use all once the narrower failing gate has been fixed. This avoids mixing multiple failure signals in the same diagnostic run.
For merge-time gate selection, use the Runner CI lane contract. It classifies which runner surfaces require delegated proof and which minimum gate applies.
For acceptance or regression evidence, use build_ebpf=true. The prepare and workspace cleanup steps remove target/, so a previously built target/assay-ebpf.o will not survive the documented workflow path. build_ebpf=false is only meaningful after adding an explicit, deterministic artifact restore or copy step.
On a fresh host, the Docker-based eBPF image/artifact build can take a few minutes. That is expected; do not cancel the run just because the build step is quiet for the first minute or two.
Delegated Proof Pack¶
Future delegated runs upload a first-class proof-pack artifact named assay-runner-delegated-proof-pack-<run_id>. The upload happens before delegated-runner cleanup and uses if: always() semantics, so failed runs retain whatever archives, selected JSON artifacts, PASS lines, and gate status metadata were produced before failure.
The proof pack is operational evidence, not normalized runner evidence. It remains separate from the runner archive schema unless a future artifact contract explicitly says otherwise. Its manifest distinguishes historical verification from current-state verification:
- historical verification: the proof pack is sufficient to review what happened for the recorded run, commit, inputs, and retained artifacts
- current-state verification: a fresh delegated dispatch is still required for a new commit, host state, dependency set, or workflow version
Artifact retention is configured to 365 days. The manifest records the configured retention, total pack size, archive digests, selected JSON digests, and any gates that were missing, failed, skipped, or incomplete.
Skip Semantics¶
The underlying scripts still use exit 40 for environmental skips in cross-platform CI contexts. In this delegated lane, a skip is a failure.
Reason: this host is the environment where Linux, cgroup v2, Node 22, and the eBPF artifact are expected to exist. If a delegated script skips, the runner or workflow has drifted from its contract.
Workflow Lifecycle Notes¶
The workflow uses two jobs:
prepare-delegated-runnerphase1-delegated-gates
The split is intentional. GitHub downloads uses: actions during "Set up job", before shell steps run. Deleting _actions inside the same job that uses actions/checkout can remove action repositories after GitHub has prepared them. The prepare job is shell-only, so it may safely wipe stale Actions caches before the gates job starts.
Do not merge the prepare and gates jobs unless the checkout path is redesigned.
Cleanup Model¶
The workflow performs destructive cleanup on a dedicated runner:
- kills leftover
assayprocesses - clears immutable bits on known test paths
- removes workspace contents before checkout
- removes
/tmp/assay-runner-*,/dev/shm/assay-runner-*, and/tmp/assay-test - prunes Docker images and volumes when Docker is available
- removes and recreates stale GitHub Actions
_actionscaches in the prepare job - restores workspace ownership to the runner user
The workspace emptiness check is a hard precondition. If it fails, inspect the listed files before rerunning; the failure usually means a permission, immutable-bit, mount, or stale root-owned state problem.
Failure Triage¶
Job waits for runner¶
Symptom:
Check:
- runner service is online
- runner has all three labels:
self-hosted,linux,assay-bpf-runner - no other delegated run is occupying the runner
- workflow concurrency is not waiting behind an in-flight run on the same ref
Checkout or action setup fails¶
Likely causes:
- root-owned
_actionscache survived cleanup - prepare job failed or did not run on the expected work directory
- runner filesystem has immutable files or stale mounts
Actions:
- inspect
prepare-delegated-runnerlogs first - ensure
_actionswas removed before the gates job - keep the two-job lifecycle split
Preflight fails¶
Use the exact error:
| Error | Likely fix |
|---|---|
| required command missing | install the named command on the runner |
| Node.js version below 22 | upgrade the runner Node install |
missing /sys/fs/cgroup/cgroup.controllers | boot or configure the host with cgroup v2 |
eBPF build fails¶
Likely causes:
- Docker unavailable or misconfigured
cargo xtask build-imagefails after Docker prune- missing LLVM/BPF toolchain in non-Docker path
Actions:
- rerun with
build_ebpf=true - inspect
Build eBPF artifact - check whether Docker path or native path was used
- confirm
target/assay-ebpf.oexists before the gate step
BPF verifier rejects program load¶
Symptom:
Actions:
- capture the full verifier output from the job log
- check recent changes under
crates/assay-ebpf/ - compare runner kernel version from the job summary
- rebuild the eBPF artifact on the delegated host
- prefer a small verifier-output-driven fix over changing acceptance criteria
Cgroup placement fails with Operation not supported¶
Likely cause:
- runner service cgroup is
domain threaded; child cgroups below it can bedomain invalid
Expected protection:
- runner sessions resolve from the nearest cgroup v2 domain root before creating the measured cgroup
Actions:
- inspect
/sys/fs/cgroup/.../cgroup.type - confirm the domain-root resolution path is still covered by tests
Cgroup correlation is not clean¶
Symptom:
Likely causes:
- runner child placement happened outside the measured cgroup
- the delegated cgroup subtree changed under systemd
- the host is missing expected cgroup v2 controllers or delegation
- a kernel/cgroup behavior difference changed process placement semantics
Actions:
- check the runner kernel version from the job summary
- inspect
/sys/fs/cgroup/cgroup.controllers - inspect the delegated subtree that the runner user actually owns
- confirm the runner session is still created below a valid cgroup v2 domain root
- prefer rerunning the narrow failing gate after cleanup before changing attribution logic
Ring-buffer drops¶
Symptom:
or:
Meaning:
- kernel observation was incomplete for that run
- Phase 1 delegated gates must fail
Actions:
- inspect observation-health diagnostics and top filtered values
- check whether a recent normalizer/eBPF change increased event volume
- verify the monitor listener still keeps persistent ring-buffer consumers
- rerun the narrow gate once after a clean runner preparation
Do not change the gate to accept drops for Phase 1.
Determinism drift¶
Symptom:
FAIL: observation-health.json changed
FAIL: capability-surface.json changed
FAIL: correlation-report.json changed
Actions:
- read the printed unified diff
- compare drift fields before changing filters
- distinguish telemetry drift from attribution evidence drift
- keep the acceptance discipline strict; make failures more diagnostic rather than weaker
Common prior root causes:
- loader/runtime paths entering kernel evidence
- cold/warm fixture setup differences
- ring-buffer consumer replay
- SDK or policy logs written inside the measured work directory
OpenAI Agents fixture fails¶
Likely causes:
- Node below 22
npm cidid not install fixture dependencies- installed
@openai/agentsmetadata does not match the expected version - SDK hook behavior changed after dependency upgrade
Actions:
- inspect
Install OpenAI Agents fixture deps - run
node --check runner-fixtures/openai-agents/fixture-agent.js - validate the installed package metadata
- for dependency bumps, update the expected version only after the delegated
openai-agents-kernel-policygate passes
For @openai/agents version drift, check both sources:
runner-fixtures/openai-agents/package.json- the acceptance wrapper's expected SDK version environment, when set
The fixture emits SDK metadata from the installed package. The acceptance gate should fail if the installed package version and expected version diverge; do not paper over that failure without rerunning the delegated openai-agents-kernel-policy gate.
Expected PASS Lines¶
A successful gates=all run should include:
PASS: runner-spike kernel-only acceptance
PASS: runner-spike kernel-only three-run determinism
PASS: runner-spike kernel+policy acceptance
PASS: runner-spike kernel+policy three-run determinism
PASS: runner-spike OpenAI Agents kernel+policy acceptance
PASS: runner-spike OpenAI Agents kernel+policy three-run determinism
Each acceptance line appears once per run inside its three-run wrapper. The determinism line appears once per gate.
Result Recording¶
For Phase 1 or runner-impacting regression evidence, record:
- workflow run URL
- job id if relevant
- commit SHA
- selected gate input
- pass/fail status
- PASS lines or the first failing diagnostic block
- whether
build_ebpfwas enabled
If the run is used as an acceptance anchor, also record the bounded claims and non-claims in a note under docs/notes/.
Boundaries¶
This delegated lane does not prove:
- macOS or Windows attribution
- live LLM calls
- arbitrary SDK compatibility
- production load or sustained runner-fleet behavior
- causal ordering between individual SDK, policy, and syscall events
It proves the bounded deterministic Linux/eBPF fixture path described by the Assay-Runner Phase 1 spike and acceptance notes.
References¶
- Phase 1 acceptance:
docs/notes/ASSAY-RUNNER-PHASE1-ACCEPTANCE-2026-05-21.md - Phase 1 spike plan:
docs/notes/ASSAY-RUNNER-PHASE1-SPIKE-PLAN-2026-05-20.md - Delegated workflow:
.github/workflows/runner-spike-delegated.yml