Runner Artifact v0 Contracts¶
Internal Phase 2A reference. These contracts describe the normalized runner-spike artifacts proven by the delegated Linux/eBPF Phase 1 acceptance run. They are not a public Assay-Runner release contract.
The Phase 1 delegated proof writes a deterministic archive containing three load-bearing JSON artifacts:
observation-health.jsoncapability-surface.jsoncorrelation-report.json
The determinism claim is over these normalized artifacts and the normalized layer streams in the archive. It is not a claim that raw kernel telemetry, raw ring-buffer delivery, dynamic loader behavior, or the eBPF object are byte-identical across runs.
Machine-readable golden-shape examples are listed in golden/index.md. They are not substitutes for delegated acceptance runs; they freeze the v0 field set and serialization shape that docs and tests should preserve. Example values are illustrative unless this contract explicitly defines a field's allowed value vocabulary.
Contract Principles¶
- Monitor observes broadly. Kernel, policy, and SDK monitors may observe more than the runner-spike bundle is willing to claim as evidence.
- Normalizer claims narrowly. The runner-spike normalizer keeps only events that support the measured-run attribution claim.
- Passing health stays strict. A passing Linux/eBPF delegated run requires complete kernel capture, zero ring-buffer drops, and clean cgroup correlation.
- v0 artifacts are deterministic. Set-like fields serialize in stable order; three-run delegated determinism compares the normalized JSON artifacts byte-for-byte.
- Unsupported expansion is a contract change. New artifact fields, schema strings, layer status values, or evidence categories require an explicit Phase 2 contract review.
Phase 1 Proof-Pack Manifest¶
The assay.runner.phase1_proof_pack.v0 manifest is a hand-curated retention record for the Phase 1 delegated acceptance run. It is not emitted by the runner and is not part of the runner archive schema. Its purpose is to retain workflow metadata, logs, excerpts, and digest links for the historical delegated proof after GitHub Actions log retention changes.
Telemetry Versus Evidence¶
The monitor layer may observe telemetry that is useful for debugging but not for the runner-spike capability claim. The normalizer excludes this telemetry from capability-surface.json and layers/kernel.ndjson.
| Filter level | Examples | Reason |
|---|---|---|
| Event type | EVENT_INODE_RESOLVED | Kernel-internal resolution telemetry; it does not add an agent capability claim |
| Dynamic loader paths | /etc/ld.so.cache, /lib/*, /lib32/*, /lib64/*, /usr/lib/* | Runtime loader behavior rather than agent-selected behavior |
| Locale/runtime paths | /usr/share/locale/*, /etc/localtime, Python bootstrap probes | libc/interpreter initialization noise |
| Toolchain/RPATH paths | .rustup/toolchains/*/*.so*, target/*/{build,debug,release}/*.so* | Cargo/Rust loader behavior |
| Runtime dependency trees | node_modules/* | SDK/runtime implementation detail, not a runner capability claim |
| Kernel and device introspection | /proc/*, /sys/*, /dev/* | Runtime plumbing and host introspection |
Policy-denied paths remain evidence. For example, a file_blocked policy event for /lib/.../libc.so.6 is preserved because the policy decision is the claim, even though ordinary loader openat telemetry for that path is filtered.
layers/kernel.ndjson¶
Schema string:
Machine-readable line schema:
schema/kernel-event-v0.schema.json
layers/kernel.ndjson is an NDJSON stream: each non-empty line validates independently against the line schema. The schema covers the enriched open metadata shape now emitted by Runner while preserving compatibility with older v0 archives where those open metadata fields are absent.
Each line is one normalized kernel event. Common fields:
| Field | Type | Required | Semantics |
|---|---|---|---|
schema | string | yes | Must equal assay.runner.kernel_event.v0 |
run_id | string | yes | Run identifier shared by all archive artifacts |
seq | integer | yes | Runner-assigned sequence in normalized kernel layer order |
pid | integer | yes | Kernel-observed thread-group id for the event |
event_type | integer | yes | Internal monitor event id (1 for openat, 2 for connect, etc.) |
kind | string | yes | Normalized event kind: openat, connect, exec, file_blocked, connect_blocked, or reserved event_<id> for unrecognized monitor ids |
value | string or null | yes | Normalized path or endpoint when available |
Known v0 event_type / kind pairs:
event_type | kind |
|---|---|
1 | openat |
2 | connect |
4 | exec |
10 | file_blocked |
20 | connect_blocked |
Other internal monitor ids remain valid only as kind=event_<id>. value=null means the monitor event was preserved but the normalizer could not decode a path or endpoint from it.
Open events may additionally carry syscall metadata:
| Field | Type | Required | Semantics |
|---|---|---|---|
flags | integer | no | Linux open flags captured at syscall entry |
mode | integer | no | Linux create mode argument, when provided |
resolve | integer | no | openat2 resolve flags, when non-zero |
return_value | integer | no | Syscall return value from sys_exit_openat* |
access_mode | enum | no | read, write, read_write, or unknown, derived from flags & O_ACCMODE |
operation_flags | array[string] | no | Derived operation hints such as create, truncate, append, exclusive |
status | enum | no | success when return value is non-negative, otherwise error |
These fields are optional so older v0 archives remain readable. Consumers that need read/write/create/remove distinctions must require the optional open metadata and treat older archives as inconclusive for that dimension.
Open metadata example:
{
"schema": "assay.runner.kernel_event.v0",
"run_id": "run_001",
"seq": 0,
"pid": 1234,
"event_type": 1,
"kind": "openat",
"value": "/tmp/work/fixture-output.txt",
"flags": 577,
"mode": 420,
"return_value": 4,
"access_mode": "write",
"operation_flags": ["create", "truncate"],
"status": "success"
}
Undecoded event example:
{
"schema": "assay.runner.kernel_event.v0",
"run_id": "run_001",
"seq": 1,
"pid": 1234,
"event_type": 999,
"kind": "event_999",
"value": null
}
observation-health.json¶
Schema string:
Fields:
| Field | Type | Required | Semantics |
|---|---|---|---|
schema | string | yes | Must equal assay.runner.observation_health.v0 |
run_id | string | yes | Non-empty run identifier shared by all archive artifacts |
platform | string | yes | Platform name used by the runner-spike path; Phase 1 accepted linux |
kernel_layer | enum | yes | complete, partial_ringbuf_drops, or absent |
ringbuf_drops | integer | yes | Total ring-buffer drops for the measured kernel capture window |
policy_layer | enum | yes | present or absent |
sdk_layer | enum | yes | present, self_reported, or absent |
cgroup_correlation | enum | yes | clean, partial, or failed |
network_protocol_coverage | enum | yes | Network protocol coverage scope for this run's kernel evidence |
network_endpoint_claim_scope | enum | yes | Claim boundary for interpreting network_endpoints |
notes | array[string] | yes | Stable code-prefixed capture notes used by delegated determinism |
sdk_layer=self_reported is the accepted v0 value for the OpenAI Agents SDK fixture because SDK events are emitted by the fixture/runtime path itself. sdk_layer=present remains in the v0 enum vocabulary for a future directly corroborated SDK layer, but the accepted S5 fixture does not use it.
notes are strings in v0, but they are not arbitrary prose. The token before the first colon is the machine-readable note code, for example s2_kernel_capture, s4_policy_capture, or s5_sdk_capture. Text after the colon is a stable deterministic message for reviewers and diffs. A future v1 may split these into {code, message} objects, but v0 consumers should parse the code prefix when they need machine-readable note identity.
Validation rules:
run_idmust not be empty.ringbuf_drops > 0requireskernel_layer=partial_ringbuf_drops.- non-
linuxplatforms requirekernel_layer=absent. cgroup_correlation=failedis not valid for a passing Phase 1 run.
Interpretation rules:
kernel_layer=completemeans capture health was clean for the attached hooks; it does not, by itself, prove protocol-complete network coverage.network_protocol_coverage=absentmeans Runner observed no network protocol events in the capture window. It is not a positive network coverage claim.network_protocol_coverage=unknownmeans network protocol coverage cannot be interpreted, for example because relevant network hook events may have been dropped.network_protocol_coverage=connect_onlymeans Runner observedconnect()-level network evidence only.network_protocol_coverage=datagram_peer_observedmeans Runner observed datagram peer evidence fromsendtoorsendmsg, without a matchingconnect()event in the same capture window.network_protocol_coverage=connect_and_datagram_peer_observedmeans Runner observed bothconnect()and datagram peer evidence in the same capture window.network_endpoint_claim_scope=diagnostic_onlymeansnetwork_endpointsare useful for coarse/diagnostic review, not for exact peer-set claims on datagram protocols such as QUIC.network_endpoint_claim_scope=not_applicablemeans no network endpoint claim is available for this run.network_endpoint_claim_scope=unknownmeans no bounded endpoint claim should be inferred because network coverage is not interpretable.- Datagram peer evidence is stronger than
connect_onlyfor QUIC-style capture, but v0 still keepsnetwork_endpoint_claim_scope=diagnostic_onlyunless a future layer can make a bounded exact peer-set claim.
Passing Linux/eBPF example:
{
"schema": "assay.runner.observation_health.v0",
"run_id": "run_openai_agents_kernel_policy_determinism",
"platform": "linux",
"kernel_layer": "complete",
"ringbuf_drops": 0,
"policy_layer": "present",
"sdk_layer": "self_reported",
"cgroup_correlation": "clean",
"network_protocol_coverage": "connect_only",
"network_endpoint_claim_scope": "diagnostic_only",
"notes": [
"s2_kernel_capture: monitor_events=4 ringbuf_drops=0 network_protocol_coverage=connect_only network_endpoint_claim_scope=diagnostic_only",
"s4_policy_capture: policy_events=1",
"s5_sdk_capture: sdk_events=3 sdk_tool_calls=1"
]
}
capability-surface.json¶
Schema string:
Fields:
| Field | Type | Required | Semantics |
|---|---|---|---|
schema | string | yes | Must equal assay.runner.capability_surface.v0 |
run_id | string | yes | Non-empty run identifier shared by all archive artifacts |
filesystem_paths | array[string] | yes | Stable sorted set of observed filesystem evidence values |
network_endpoints | array[string] | yes | Stable sorted set of observed network endpoint values under the coverage and claim-scope limits declared in observation_health |
process_execs | array[string] | yes | Stable sorted set of observed process execution values |
mcp_tools | array[string] | yes | Stable sorted set of observed MCP/tool names |
policy_decisions | array[string] | yes | Stable sorted set of policy decision summaries |
filesystem_paths stores full observed v0 path values, not directory-prefix projections. During Phase 2A consolidation this field was renamed from the earlier internal filesystem_prefixes label because the old name implied a projection the artifact never provided. Prefix projection remains a later capability-diff transformation, not part of this artifact contract.
For current Linux kernel capture, network_endpoints is populated from EVENT_CONNECT / EVENT_CONNECT_BLOCKED sockaddr values. For datagram protocols, especially QUIC, this is a connect-attempt surface, not a proven actual peer-set surface.
Example:
{
"schema": "assay.runner.capability_surface.v0",
"run_id": "run_kernel_policy_determinism",
"filesystem_paths": [
"/tmp/assay-runner-kernel-policy/work/input.txt",
"/tmp/assay-runner-kernel-policy/work/output.txt"
],
"network_endpoints": [],
"process_execs": [
"/usr/bin/cat"
],
"mcp_tools": [
"read_file"
],
"policy_decisions": [
"allow:read_file"
]
}
correlation-report.json¶
Schema string:
Fields:
| Field | Type | Required | Semantics |
|---|---|---|---|
schema | string | yes | Must equal assay.runner.correlation_report.v0 |
run_id | string | yes | Non-empty run identifier shared by all archive artifacts |
status | enum | yes | clean, partial, or failed |
bindings | array[object] | yes | Stable correlation bindings from SDK/policy windows to kernel evidence |
ambiguities | array[string] | yes | Stable list of unresolved correlation issues |
Binding fields:
| Field | Type | Required | Semantics |
|---|---|---|---|
tool_call_id | string | yes | Tool-call id used to bind SDK and policy layers |
policy_decision | string or null | yes | Matched coarse policy outcome, when present (allow or deny in v0 accepted fixtures) |
kernel_event_count | integer | yes | Count of normalized kernel events in the binding window |
window.start | string | yes | Inclusive window start marker |
window.end | string | yes | Inclusive window end marker |
Correlation windows use runner-defined phase markers from one canonical runner clock. SDK-provided timestamps are informational only and MUST NOT be used as primary join keys for v0 correlation. They also MUST NOT be used as an ordering fallback to disambiguate call-id-less tool bindings.
Clean v0 correlation requires a stable tool_call_id for every tool-call binding. The first Phase 2B capability-diff contract inherits this rule: it may diff clean bindings by tool_call_id, but it must not introduce deterministic order-fallback for call-id-less agent runtimes. If a runtime cannot provide a stable tool-call id, the runner must report a partial or failed correlation state until a separate call-id-less fixture and contract explicitly define the fallback semantics.
Example:
{
"schema": "assay.runner.correlation_report.v0",
"run_id": "run_openai_agents_kernel_policy_determinism",
"status": "clean",
"bindings": [
{
"tool_call_id": "tc_runner_policy_001",
"policy_decision": "allow",
"kernel_event_count": 2,
"window": {
"start": "run_started",
"end": "run_finished"
}
}
],
"ambiguities": []
}
Archive Relationship¶
The archive manifest is deterministic and uses:
The manifest records each archive path, byte length, and sha256: digest. The three v0 JSON artifacts must all share the archive run_id.
Dependency Upgrade Flow¶
The OpenAI Agents acceptance fixture pins and asserts the SDK version used by the delegated gate. For dependency bumps:
- Update the fixture dependency.
- Run the SDK fixture locally enough to verify package metadata loading.
- Update the expected SDK version in the acceptance wrapper only in the same change that updates the dependency.
- Dispatch
Runner Spike Delegatedwithgates=openai-agents-kernel-policyandbuild_ebpf=true. - Merge only after the delegated gate proves SDK, policy, and kernel correlation remains byte-stable over three runs.
Version bumps must not silently relax schema validation or determinism assertions.