Runner Artifact v0 Contracts¶

Internal Phase 2A reference. These contracts describe the normalized runner-spike artifacts proven by the delegated Linux/eBPF Phase 1 acceptance run. They are not a public Assay-Runner release contract.

The Phase 1 delegated proof writes a deterministic archive containing three load-bearing JSON artifacts:

observation-health.json
capability-surface.json
correlation-report.json

The determinism claim is over these normalized artifacts and the normalized layer streams in the archive. It is not a claim that raw kernel telemetry, raw ring-buffer delivery, dynamic loader behavior, or the eBPF object are byte-identical across runs.

Machine-readable golden-shape examples are listed in golden/index.md. They are not substitutes for delegated acceptance runs; they freeze the v0 field set and serialization shape that docs and tests should preserve. Example values are illustrative unless this contract explicitly defines a field's allowed value vocabulary.

Contract Principles¶

Monitor observes broadly. Kernel, policy, and SDK monitors may observe more than the runner-spike bundle is willing to claim as evidence.
Normalizer claims narrowly. The runner-spike normalizer keeps only events that support the measured-run attribution claim.
Passing health stays strict. A passing Linux/eBPF delegated run requires complete kernel capture, zero ring-buffer drops, and clean cgroup correlation.
v0 artifacts are deterministic. Set-like fields serialize in stable order; three-run delegated determinism compares the normalized JSON artifacts byte-for-byte.
Unsupported expansion is a contract change. New artifact fields, schema strings, layer status values, or evidence categories require an explicit Phase 2 contract review.

Phase 1 Proof-Pack Manifest¶

The assay.runner.phase1_proof_pack.v0 manifest is a hand-curated retention record for the Phase 1 delegated acceptance run. It is not emitted by the runner and is not part of the runner archive schema. Its purpose is to retain workflow metadata, logs, excerpts, and digest links for the historical delegated proof after GitHub Actions log retention changes.

Telemetry Versus Evidence¶

The monitor layer may observe telemetry that is useful for debugging but not for the runner-spike capability claim. The normalizer excludes this telemetry from capability-surface.json and layers/kernel.ndjson.

Filter level	Examples	Reason
Event type	`EVENT_INODE_RESOLVED`	Kernel-internal resolution telemetry; it does not add an agent capability claim
Dynamic loader paths	`/etc/ld.so.cache`, `/lib/`, `/lib32/`, `/lib64/`, `/usr/lib/`	Runtime loader behavior rather than agent-selected behavior
Locale/runtime paths	`/usr/share/locale/*`, `/etc/localtime`, Python bootstrap probes	libc/interpreter initialization noise
Toolchain/RPATH paths	`.rustup/toolchains//.so`, `target//{build,debug,release}/.so`	Cargo/Rust loader behavior
Runtime dependency trees	`node_modules/*`	SDK/runtime implementation detail, not a runner capability claim
Kernel and device introspection	`/proc/`, `/sys/`, `/dev/*`	Runtime plumbing and host introspection

Policy-denied paths remain evidence. For example, a file_blocked policy event for /lib/.../libc.so.6 is preserved because the policy decision is the claim, even though ordinary loader openat telemetry for that path is filtered.

`layers/kernel.ndjson`¶

Schema string:

assay.runner.kernel_event.v0

Machine-readable line schema:

schema/kernel-event-v0.schema.json

layers/kernel.ndjson is an NDJSON stream: each non-empty line validates independently against the line schema. The schema covers the enriched open metadata shape now emitted by Runner while preserving compatibility with older v0 archives where those open metadata fields are absent.

Each line is one normalized kernel event. Common fields:

Field	Type	Required	Semantics
`schema`	string	yes	Must equal `assay.runner.kernel_event.v0`
`run_id`	string	yes	Run identifier shared by all archive artifacts
`seq`	integer	yes	Runner-assigned sequence in normalized kernel layer order
`pid`	integer	yes	Kernel-observed thread-group id for the event
`event_type`	integer	yes	Internal monitor event id (`1` for `openat`, `2` for `connect`, etc.)
`kind`	string	yes	Normalized event kind: `openat`, `connect`, `exec`, `file_blocked`, `connect_blocked`, or reserved `event_<id>` for unrecognized monitor ids
`value`	string or null	yes	Normalized path or endpoint when available

Known v0 event_type / kind pairs:

`event_type`	`kind`
`1`	`openat`
`2`	`connect`
`4`	`exec`
`10`	`file_blocked`
`20`	`connect_blocked`

Other internal monitor ids remain valid only as kind=event_<id>. value=null means the monitor event was preserved but the normalizer could not decode a path or endpoint from it.

Open events may additionally carry syscall metadata:

Field	Type	Required	Semantics
`flags`	integer	no	Linux open flags captured at syscall entry
`mode`	integer	no	Linux create mode argument, when provided
`resolve`	integer	no	`openat2` resolve flags, when non-zero
`return_value`	integer	no	Syscall return value from `sys_exit_openat*`
`access_mode`	enum	no	`read`, `write`, `read_write`, or `unknown`, derived from `flags & O_ACCMODE`
`operation_flags`	array[string]	no	Derived operation hints such as `create`, `truncate`, `append`, `exclusive`
`status`	enum	no	`success` when return value is non-negative, otherwise `error`

These fields are optional so older v0 archives remain readable. Consumers that need read/write/create/remove distinctions must require the optional open metadata and treat older archives as inconclusive for that dimension.

Open metadata example:

{
  "schema": "assay.runner.kernel_event.v0",
  "run_id": "run_001",
  "seq": 0,
  "pid": 1234,
  "event_type": 1,
  "kind": "openat",
  "value": "/tmp/work/fixture-output.txt",
  "flags": 577,
  "mode": 420,
  "return_value": 4,
  "access_mode": "write",
  "operation_flags": ["create", "truncate"],
  "status": "success"
}

Undecoded event example:

{
  "schema": "assay.runner.kernel_event.v0",
  "run_id": "run_001",
  "seq": 1,
  "pid": 1234,
  "event_type": 999,
  "kind": "event_999",
  "value": null
}

`observation-health.json`¶

Schema string:

assay.runner.observation_health.v0

Fields:

Field	Type	Required	Semantics
`schema`	string	yes	Must equal `assay.runner.observation_health.v0`
`run_id`	string	yes	Non-empty run identifier shared by all archive artifacts
`platform`	string	yes	Platform name used by the runner-spike path; Phase 1 accepted `linux`
`kernel_layer`	enum	yes	`complete`, `partial_ringbuf_drops`, or `absent`
`ringbuf_drops`	integer	yes	Total ring-buffer drops for the measured kernel capture window
`policy_layer`	enum	yes	`present` or `absent`
`sdk_layer`	enum	yes	`present`, `self_reported`, or `absent`
`cgroup_correlation`	enum	yes	`clean`, `partial`, or `failed`
`network_protocol_coverage`	enum	yes	Network protocol coverage scope for this run's kernel evidence
`network_endpoint_claim_scope`	enum	yes	Claim boundary for interpreting `network_endpoints`
`notes`	array[string]	yes	Stable code-prefixed capture notes used by delegated determinism

sdk_layer=self_reported is the accepted v0 value for the OpenAI Agents SDK fixture because SDK events are emitted by the fixture/runtime path itself. sdk_layer=present remains in the v0 enum vocabulary for a future directly corroborated SDK layer, but the accepted S5 fixture does not use it.

notes are strings in v0, but they are not arbitrary prose. The token before the first colon is the machine-readable note code, for example s2_kernel_capture, s4_policy_capture, or s5_sdk_capture. Text after the colon is a stable deterministic message for reviewers and diffs. A future v1 may split these into {code, message} objects, but v0 consumers should parse the code prefix when they need machine-readable note identity.

Validation rules:

run_id must not be empty.
ringbuf_drops > 0 requires kernel_layer=partial_ringbuf_drops.
non-linux platforms require kernel_layer=absent.
cgroup_correlation=failed is not valid for a passing Phase 1 run.

Interpretation rules:

kernel_layer=complete means capture health was clean for the attached hooks; it does not, by itself, prove protocol-complete network coverage.
network_protocol_coverage=absent means Runner observed no network protocol events in the capture window. It is not a positive network coverage claim.
network_protocol_coverage=unknown means network protocol coverage cannot be interpreted, for example because relevant network hook events may have been dropped.
network_protocol_coverage=connect_only means Runner observed connect()-level network evidence only.
network_protocol_coverage=datagram_peer_observed means Runner observed datagram peer evidence from sendto or sendmsg, without a matching connect() event in the same capture window.
network_protocol_coverage=connect_and_datagram_peer_observed means Runner observed both connect() and datagram peer evidence in the same capture window.
network_endpoint_claim_scope=diagnostic_only means network_endpoints are useful for coarse/diagnostic review, not for exact peer-set claims on datagram protocols such as QUIC.
network_endpoint_claim_scope=not_applicable means no network endpoint claim is available for this run.
network_endpoint_claim_scope=unknown means no bounded endpoint claim should be inferred because network coverage is not interpretable.
Datagram peer evidence is stronger than connect_only for QUIC-style capture, but v0 still keeps network_endpoint_claim_scope=diagnostic_only unless a future layer can make a bounded exact peer-set claim.

Passing Linux/eBPF example:

{
  "schema": "assay.runner.observation_health.v0",
  "run_id": "run_openai_agents_kernel_policy_determinism",
  "platform": "linux",
  "kernel_layer": "complete",
  "ringbuf_drops": 0,
  "policy_layer": "present",
  "sdk_layer": "self_reported",
  "cgroup_correlation": "clean",
  "network_protocol_coverage": "connect_only",
  "network_endpoint_claim_scope": "diagnostic_only",
  "notes": [
    "s2_kernel_capture: monitor_events=4 ringbuf_drops=0 network_protocol_coverage=connect_only network_endpoint_claim_scope=diagnostic_only",
    "s4_policy_capture: policy_events=1",
    "s5_sdk_capture: sdk_events=3 sdk_tool_calls=1"
  ]
}

`capability-surface.json`¶

Schema string:

assay.runner.capability_surface.v0

Fields:

Field	Type	Required	Semantics
`schema`	string	yes	Must equal `assay.runner.capability_surface.v0`
`run_id`	string	yes	Non-empty run identifier shared by all archive artifacts
`filesystem_paths`	array[string]	yes	Stable sorted set of observed filesystem evidence values
`network_endpoints`	array[string]	yes	Stable sorted set of observed network endpoint values under the coverage and claim-scope limits declared in `observation_health`
`process_execs`	array[string]	yes	Stable sorted set of observed process execution values
`mcp_tools`	array[string]	yes	Stable sorted set of observed MCP/tool names
`policy_decisions`	array[string]	yes	Stable sorted set of policy decision summaries

filesystem_paths stores full observed v0 path values, not directory-prefix projections. During Phase 2A consolidation this field was renamed from the earlier internal filesystem_prefixes label because the old name implied a projection the artifact never provided. Prefix projection remains a later capability-diff transformation, not part of this artifact contract.

For current Linux kernel capture, network_endpoints is populated from EVENT_CONNECT / EVENT_CONNECT_BLOCKED sockaddr values. For datagram protocols, especially QUIC, this is a connect-attempt surface, not a proven actual peer-set surface.

Example:

{
  "schema": "assay.runner.capability_surface.v0",
  "run_id": "run_kernel_policy_determinism",
  "filesystem_paths": [
    "/tmp/assay-runner-kernel-policy/work/input.txt",
    "/tmp/assay-runner-kernel-policy/work/output.txt"
  ],
  "network_endpoints": [],
  "process_execs": [
    "/usr/bin/cat"
  ],
  "mcp_tools": [
    "read_file"
  ],
  "policy_decisions": [
    "allow:read_file"
  ]
}

`correlation-report.json`¶

Schema string:

assay.runner.correlation_report.v0

Fields:

Field	Type	Required	Semantics
`schema`	string	yes	Must equal `assay.runner.correlation_report.v0`
`run_id`	string	yes	Non-empty run identifier shared by all archive artifacts
`status`	enum	yes	`clean`, `partial`, or `failed`
`bindings`	array[object]	yes	Stable correlation bindings from SDK/policy windows to kernel evidence
`ambiguities`	array[string]	yes	Stable list of unresolved correlation issues

Binding fields:

Field	Type	Required	Semantics
`tool_call_id`	string	yes	Tool-call id used to bind SDK and policy layers
`policy_decision`	string or null	yes	Matched coarse policy outcome, when present (`allow` or `deny` in v0 accepted fixtures)
`kernel_event_count`	integer	yes	Count of normalized kernel events in the binding window
`window.start`	string	yes	Inclusive window start marker
`window.end`	string	yes	Inclusive window end marker

Correlation windows use runner-defined phase markers from one canonical runner clock. SDK-provided timestamps are informational only and MUST NOT be used as primary join keys for v0 correlation. They also MUST NOT be used as an ordering fallback to disambiguate call-id-less tool bindings.

Clean v0 correlation requires a stable tool_call_id for every tool-call binding. The first Phase 2B capability-diff contract inherits this rule: it may diff clean bindings by tool_call_id, but it must not introduce deterministic order-fallback for call-id-less agent runtimes. If a runtime cannot provide a stable tool-call id, the runner must report a partial or failed correlation state until a separate call-id-less fixture and contract explicitly define the fallback semantics.

Example:

{
  "schema": "assay.runner.correlation_report.v0",
  "run_id": "run_openai_agents_kernel_policy_determinism",
  "status": "clean",
  "bindings": [
    {
      "tool_call_id": "tc_runner_policy_001",
      "policy_decision": "allow",
      "kernel_event_count": 2,
      "window": {
        "start": "run_started",
        "end": "run_finished"
      }
    }
  ],
  "ambiguities": []
}

Archive Relationship¶

The archive manifest is deterministic and uses:

assay.runner.archive_manifest.v0

The manifest records each archive path, byte length, and sha256: digest. The three v0 JSON artifacts must all share the archive run_id.

Dependency Upgrade Flow¶

The OpenAI Agents acceptance fixture pins and asserts the SDK version used by the delegated gate. For dependency bumps:

Update the fixture dependency.
Run the SDK fixture locally enough to verify package metadata loading.
Update the expected SDK version in the acceptance wrapper only in the same change that updates the dependency.
Dispatch Runner Spike Delegated with gates=openai-agents-kernel-policy and build_ebpf=true.
Merge only after the delegated gate proves SDK, policy, and kernel correlation remains byte-stable over three runs.

Version bumps must not silently relax schema validation or determinism assertions.

Runner Artifact v0 Contracts¶

Contract Principles¶

Phase 1 Proof-Pack Manifest¶

Telemetry Versus Evidence¶

layers/kernel.ndjson¶

observation-health.json¶

capability-surface.json¶

correlation-report.json¶

Archive Relationship¶

Dependency Upgrade Flow¶

`layers/kernel.ndjson`¶

`observation-health.json`¶

`capability-surface.json`¶

`correlation-report.json`¶