Assay-Runner Capability-Diff v0 Contract
Internal Phase 2B contract. This page defines the first Assay-Runner capability-diff projection over normalized runner artifacts. It is not a runner-emitted archive artifact, not a CLI surface, and not a product release contract.
The v0 capability diff answers one narrow question:
The diff is descriptive. It reports added, removed, and unchanged normalized capability values. It must not decide whether a change is acceptable; that remains policy, reviewer, or Harness responsibility.
Inputs
A v0 diff compares two evidence sets, named base and head. Each set must provide these normalized artifacts:
| Artifact | Role |
|---|---|
| observation-health.json | Determines whether the evidence set is clean enough for a clean diff |
| capability-surface.json | Provides the observed capability sets to compare |
| correlation-report.json | Provides stable binding identity through tool_call_id |
The input artifacts retain their own schema contracts:
Raw kernel telemetry, workflow logs, proof-pack metadata, and normalized layer streams are diagnostic context only. They are not primary v0 diff inputs.
The v0 capability diff is a pure projection over normalized evidence. Workflow run URLs, commit SHAs, and generation timestamps are intentionally not part of this schema. Consumers that need forensic anchoring should pair the diff with a proof-pack manifest, which carries workflow context separately.
Contract Principles
- Normalized evidence only. The diff consumes normalized artifacts after the runner normalizer has already drawn the evidence boundary.
- Surface-level projection. v0 compares set-like values in capability-surface.json. It does not attribute each filesystem path, process, endpoint, or tool value to an individual binding window.
- Stable binding identity. Clean v0 diffs require stable tool_call_id values in correlation-report.json.
- Health remains strict. ringbuf_drops > 0, incomplete cgroup correlation, or missing SDK/policy layers must not be softened into a clean diff.
- Deterministic serialization. Arrays are stable sorted sets unless this contract explicitly says otherwise.
- No acceptability judgment. The diff says what changed, not whether the change is allowed for the project.
Schema
Schema string: assay.runner.capability_diff.v0
Fields:
| Field | Type | Required | Semantics |
|---|---|---|---|
| schema | string | yes | Must equal assay.runner.capability_diff.v0 |
| base_run_id | string | yes | run_id from the base evidence set |
| head_run_id | string | yes | run_id from the head evidence set |
| status | enum | yes | clean, partial:health, partial:correlation, partial:unbound, or failed |
| preconditions | object | yes | Machine-readable checks that determine whether the diff can be clean |
| scope | object | yes | Declares what evidence domain this diff used |
| surface | object | yes | Added, removed, and unchanged capability-surface values by category |
| binding_ids | object | yes | Added, removed, and unchanged tool-call binding ids |
| policy_outcomes | object | yes | Policy decision changes for stable binding ids |
| unbound | object | yes | Evidence buckets that could not be safely compared in v0 |
| ambiguities | array[string] | yes | Stable code-prefixed ambiguity strings |
| notes | array[string] | yes | Stable code-prefixed human-readable notes |
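A minimal Python sketch of a top-level shape check, assuming the diff has already been parsed into a dict. The helper name top_level_errors and its error strings are hypothetical; they are not part of the reference script.

```python
# Hypothetical shape check for the top-level fields listed in the table above.
SCHEMA = "assay.runner.capability_diff.v0"

REQUIRED_FIELDS = (
    "schema", "base_run_id", "head_run_id", "status", "preconditions",
    "scope", "surface", "binding_ids", "policy_outcomes", "unbound",
    "ambiguities", "notes",
)

STATUSES = {"clean", "partial:health", "partial:correlation", "partial:unbound", "failed"}


def top_level_errors(diff: dict) -> list[str]:
    """Return a sorted list of shape problems; an empty list means the shape is OK."""
    errors = [f"missing:{name}" for name in REQUIRED_FIELDS if name not in diff]
    if diff.get("schema") != SCHEMA:
        errors.append("schema:unsupported")
    if diff.get("status") not in STATUSES:
        errors.append("status:unknown")
    # ambiguities and notes are arrays of strings per the table.
    for name in ("ambiguities", "notes"):
        if name in diff and not isinstance(diff[name], list):
            errors.append(f"type:{name}")
    return sorted(errors)
```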
Preconditions
preconditions records why a diff is or is not clean.
| Field | Type | Required | Clean value |
|---|---|---|---|
| base_health_clean | boolean | yes | true |
| head_health_clean | boolean | yes | true |
| base_correlation_clean | boolean | yes | true |
| head_correlation_clean | boolean | yes | true |
| stable_tool_call_ids_required | boolean | yes | true |
| stable_tool_call_ids_present | boolean | yes | true |
A health set is clean only when:
- kernel_layer=complete
- ringbuf_drops=0
- policy_layer=present
- sdk_layer=self_reported
- cgroup_correlation=clean
sdk_layer=present is reserved for a future corroborated SDK path in the artifact contract. This first capability-diff contract accepts only the currently proven S5 fixture shape: sdk_layer=self_reported.
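The clean-health rule can be expressed as a small predicate. This is a sketch, assuming the parsed observation-health.json exposes these five fields as top-level keys; the real layout is governed by that artifact's own contract, and is_health_clean is a hypothetical name.

```python
# Clean values for one evidence set. ASSUMPTION: these are top-level keys of the
# parsed observation-health.json; the artifact's own contract is authoritative.
CLEAN_HEALTH = {
    "kernel_layer": "complete",
    "ringbuf_drops": 0,
    "policy_layer": "present",
    "sdk_layer": "self_reported",  # sdk_layer=present is not accepted in v0
    "cgroup_correlation": "clean",
}


def is_health_clean(health: dict) -> bool:
    """True only if every field matches its clean value; missing keys fail."""
    return all(health.get(key) == value for key, value in CLEAN_HEALTH.items())
```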
Scope
scope declares the projection's evidence domain and is kept separate from preconditions.
| Field | Type | Required | v0 value |
|---|---|---|---|
| projection | string | yes | surface_set |
| uses_raw_telemetry | boolean | yes | false |
| uses_proof_pack | boolean | yes | false |
| per_binding_capability_values | boolean | yes | false |
per_binding_capability_values=false is load-bearing. v0 correlation proves a stable binding id and kernel-event window, but the current capability-surface artifact is global to the run. Therefore v0 must not claim that an individual path, process, endpoint, or tool value belongs to one specific binding.
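Because v0 permits exactly one scope object, a producer or checker could compare against a constant. A sketch only: V0_SCOPE and scope_is_v0 are illustrative names, and rejecting extra keys is an assumption rather than a stated rule.

```python
# The only scope object the v0 table allows.
V0_SCOPE = {
    "projection": "surface_set",
    "uses_raw_telemetry": False,
    "uses_proof_pack": False,
    "per_binding_capability_values": False,
}


def scope_is_v0(scope: dict) -> bool:
    # Exact equality (an ASSUMPTION): an extra key or any True flag would claim
    # more than v0 evidence supports, such as per-binding attribution.
    return scope == V0_SCOPE
```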
Surface Diff
surface contains one object per capability-surface.v0 category:
- filesystem_paths
- network_endpoints
- process_execs
- mcp_tools
- policy_decisions
Each category object has the same fields:
| Field | Type | Required | Semantics |
|---|---|---|---|
| added | array[string] | yes | Values present in head and absent from base |
| removed | array[string] | yes | Values present in base and absent from head |
| unchanged | array[string] | yes | Values present in both base and head |
All arrays serialize in stable lexicographic order.
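The per-category diff is plain set arithmetic followed by sorting. The sketch below assumes each category appears as a top-level list of strings in the parsed capability-surface.json and treats a missing category as empty; both are assumptions. Python's sorted() orders strings by code point, which is taken here as the meaning of stable lexicographic order.

```python
CATEGORIES = (
    "filesystem_paths",
    "network_endpoints",
    "process_execs",
    "mcp_tools",
    "policy_decisions",
)


def diff_values(base: list[str], head: list[str]) -> dict[str, list[str]]:
    """Set-diff one category; every output array is sorted."""
    b, h = set(base), set(head)
    return {
        "added": sorted(h - b),
        "removed": sorted(b - h),
        "unchanged": sorted(b & h),
    }


def diff_surface(base_surface: dict, head_surface: dict) -> dict:
    # ASSUMPTION: categories are top-level lists; a missing one counts as empty.
    return {
        cat: diff_values(base_surface.get(cat, []), head_surface.get(cat, []))
        for cat in CATEGORIES
    }
```

Every category is always emitted, even when both sides are empty, so consumers never need to handle a missing key.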
Policy Decision Consistency
surface.policy_decisions reports added, removed, and unchanged policy decision summaries across the whole run, regardless of binding identity. policy_outcomes.changed reports changed coarse policy outcomes per stable binding id. These views can diverge only when bindings are added or removed.
For unchanged binding ids, the two views must stay consistent. If a tool_call_id in binding_ids.unchanged has a policy summary change that appears in surface.policy_decisions.added or surface.policy_decisions.removed, that binding must also appear in policy_outcomes.changed. Implementations must verify this consistency before emitting status=clean.
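One way to run the consistency check before emitting status=clean, sketched under assumptions: base_summary and head_summary are hypothetical per-binding maps from tool_call_id to a policy summary string such as allow:read_file, since v0 capability-surface values themselves are run-global. The function applies the rule literally.

```python
def inconsistent_bindings(
    base_summary: dict[str, str],
    head_summary: dict[str, str],
    unchanged_ids: list[str],
    policy_surface: dict[str, list[str]],
    changed_outcome_ids: set[str],
) -> list[str]:
    """Return unchanged binding ids whose summary change is visible in
    surface.policy_decisions but which are missing from policy_outcomes.changed."""
    surfaced = set(policy_surface["added"]) | set(policy_surface["removed"])
    bad = []
    for tcid in unchanged_ids:
        before, after = base_summary.get(tcid), head_summary.get(tcid)
        if before == after:
            continue  # no summary change for this binding
        if {before, after} & surfaced and tcid not in changed_outcome_ids:
            bad.append(tcid)
    return sorted(bad)
```

A non-empty result means the diff must not be emitted as clean.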
Binding Id Diff
binding_ids compares the set of tool_call_id values from clean correlation bindings.
| Field | Type | Required | Semantics |
|---|---|---|---|
| added | array[string] | yes | Binding ids present in head and absent from base |
| removed | array[string] | yes | Binding ids present in base and absent from head |
| unchanged | array[string] | yes | Binding ids present in both base and head |
binding_ids.unchanged reports identity stability only. Policy outcome changes for stable binding ids are tracked separately in policy_outcomes.changed.
Clean v0 does not support order-based fallback. If a binding lacks a stable tool_call_id, the diff is at least partial:correlation.
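A sketch of the binding-id projection with no order-based fallback. It assumes a hypothetical correlation-report.json layout with a bindings list whose entries may carry tool_call_id; the real field layout comes from the correlation report's own contract.

```python
def binding_ids(report: dict) -> tuple[set[str], bool]:
    """Collect tool_call_id values; the flag is False if any binding lacks one.

    ASSUMPTION: report["bindings"] is a list of dicts (hypothetical layout).
    """
    ids, all_stable = set(), True
    for binding in report.get("bindings", []):
        tcid = binding.get("tool_call_id")
        if isinstance(tcid, str) and tcid:
            ids.add(tcid)
        else:
            all_stable = False  # v0 never falls back to binding order
    return ids, all_stable


def diff_binding_ids(base_report: dict, head_report: dict) -> tuple[dict, bool]:
    b, b_ok = binding_ids(base_report)
    h, h_ok = binding_ids(head_report)
    diff = {
        "added": sorted(h - b),
        "removed": sorted(b - h),
        "unchanged": sorted(b & h),
    }
    # A False flag means the result can be at best partial:correlation.
    return diff, b_ok and h_ok
```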
Policy Outcomes
policy_outcomes.changed records changed coarse policy outcomes for unchanged binding ids.
Each entry has:
| Field | Type | Required | Semantics |
|---|---|---|---|
| tool_call_id | string | yes | Stable binding id whose policy outcome changed |
| base | string or null | yes | Base coarse policy outcome |
| head | string or null | yes | Head coarse policy outcome |
Entries serialize in lexicographic tool_call_id order. v0 accepted fixtures use coarse outcomes such as allow and deny; capability-surface policy summaries remain strings such as allow:read_file.
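Building policy_outcomes.changed is then a lookup over the unchanged ids. A sketch, assuming hypothetical per-binding maps from tool_call_id to a coarse outcome string; a side with no recorded outcome becomes null (None in Python).

```python
from typing import Optional


def changed_outcomes(
    base_outcome: dict[str, Optional[str]],
    head_outcome: dict[str, Optional[str]],
    unchanged_ids: list[str],
) -> list[dict]:
    """policy_outcomes.changed entries, ordered by tool_call_id."""
    entries = []
    for tcid in sorted(unchanged_ids):
        base, head = base_outcome.get(tcid), head_outcome.get(tcid)
        if base != head:
            entries.append({"tool_call_id": tcid, "base": base, "head": head})
    return entries
```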
Unbound Evidence
unbound uses the same category names as surface, each as a stable array[string].
For a clean v0 diff, every unbound category must be empty. A future input may make per-value unbound evidence explicit. Until then, v0 producers must not invent per-binding path attribution from global capability-surface values. Because all current capability-surface.v0 values are run-global, partial:unbound is reserved for a future per-binding capability artifact. v0 implementations must keep unbound arrays empty; inputs that suggest per-value unbinding without an explicit versioned source should produce status=failed, not an invented partial:unbound projection.
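Since v0 unbound arrays are always empty, a producer can emit a constant object and a checker can treat any non-empty or missing category as a contract violation. These helpers are illustrative; the names are hypothetical.

```python
CATEGORIES = (
    "filesystem_paths",
    "network_endpoints",
    "process_execs",
    "mcp_tools",
    "policy_decisions",
)


def empty_unbound() -> dict[str, list[str]]:
    # v0 producers emit every category, always with an empty array.
    return {cat: [] for cat in CATEGORIES}


def unbound_violation(unbound: dict) -> bool:
    """True if any category is missing or non-empty (v0 maps this to failed)."""
    return any(unbound.get(cat) != [] for cat in CATEGORIES)
```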
Status Semantics
| Status | Semantics |
|---|---|
| clean | All preconditions are true, all required artifacts validate, correlation is clean for both sides, and all unbound arrays are empty |
| partial:health | At least one evidence set can be parsed but has incomplete health such as ring-buffer drops or incomplete cgroup correlation |
| partial:correlation | Health is sufficient to parse, but at least one correlation report is partial, ambiguous, or lacks stable binding identity |
| partial:unbound | Reserved for a future per-binding capability artifact; v0 producers must not emit this status from run-global capability-surface values |
| failed | Required artifacts are missing, schema strings are unsupported, run ids are internally inconsistent, or deterministic parsing fails |
partial:* diffs may be useful for triage, but they are not clean evidence for acceptance. ringbuf_drops > 0 always prevents status=clean.
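A sketch of status derivation from precondition results. The precedence order (failed, then partial:health, then partial:correlation) is an assumption: the table defines each status but does not say which wins when several conditions fail at once. Mapping non-empty unbound evidence to failed follows the Unbound Evidence rules.

```python
def derive_status(
    artifacts_ok: bool,
    base_health_clean: bool,
    head_health_clean: bool,
    correlation_clean: bool,
    stable_ids_present: bool,
    unbound_empty: bool,
) -> str:
    """Map precondition results to a v0 status (precedence is an ASSUMPTION)."""
    if not artifacts_ok or not unbound_empty:
        return "failed"  # v0 never emits partial:unbound
    if not (base_health_clean and head_health_clean):
        return "partial:health"  # e.g. ringbuf_drops > 0 on either side
    if not (correlation_clean and stable_ids_present):
        return "partial:correlation"
    return "clean"
```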
Idempotence
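The first read-only validation gate is idempotence: diffing an evidence set against itself, with the same S5 normalized artifacts supplied as both base and head.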
This must produce:
- status=clean
- empty added and removed arrays for every surface category
- unchanged arrays equal to the input capability-surface sets
- empty binding_ids.added and binding_ids.removed
- binding_ids.unchanged=["tc_runner_policy_001"]
- empty policy_outcomes.changed
- empty unbound arrays
- empty ambiguities
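The idempotence expectations can be checked mechanically against a parsed diff. A minimal sketch: check_idempotent is a hypothetical helper, not the entry point of scripts/ci/assay_runner_capability_diff_validate.py, and it assumes the surface input is a mapping from category to value list.

```python
def check_idempotent(diff: dict, surface: dict, expected_ids: list[str]) -> list[str]:
    """Return the failed expectations for a base == head diff (empty means pass)."""
    failures = []
    if diff["status"] != "clean":
        failures.append("status")
    for cat, parts in diff["surface"].items():
        if parts["added"] or parts["removed"]:
            failures.append(f"surface.{cat}.changed")
        # unchanged must equal the input set, in sorted order.
        if parts["unchanged"] != sorted(set(surface.get(cat, []))):
            failures.append(f"surface.{cat}.unchanged")
    ids = diff["binding_ids"]
    if ids["added"] or ids["removed"] or ids["unchanged"] != sorted(expected_ids):
        failures.append("binding_ids")
    if diff["policy_outcomes"]["changed"]:
        failures.append("policy_outcomes")
    if any(diff["unbound"].values()):
        failures.append("unbound")
    if diff["ambiguities"]:
        failures.append("ambiguities")
    return failures
```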
The golden shape for this case is golden/capability-diff-s5-idempotent-v0.json. The read-only validation and reference projection entry point is scripts/ci/assay_runner_capability_diff_validate.py.
Without explicit inputs, the helper projects the diff from the existing S5 golden artifacts and compares it to that frozen output shape. With explicit --base-dir and --head-dir inputs, or with explicit base/head artifact paths, it emits a v0 capability diff over normalized evidence only. Directory inputs must contain observation-health.json, capability-surface.json, and correlation-report.json.
Example:
```shell
python3 scripts/ci/assay_runner_capability_diff_validate.py \
  --base-dir /path/to/base-normalized-evidence \
  --head-dir /path/to/head-normalized-evidence \
  --output /tmp/capability-diff.json
```
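For readers implementing their own loader, the directory-input requirement can be sketched as follows. This is not the reference helper's code; load_evidence_dir is a hypothetical name, and it only checks presence and JSON parsing, not each artifact's schema.

```python
import json
from pathlib import Path

REQUIRED_ARTIFACTS = (
    "observation-health.json",
    "capability-surface.json",
    "correlation-report.json",
)


def load_evidence_dir(directory: str) -> dict[str, dict]:
    """Load the three normalized artifacts from one evidence directory.

    Raises FileNotFoundError naming every missing artifact; under this
    contract a missing artifact leads to status=failed.
    """
    root = Path(directory)
    missing = [name for name in REQUIRED_ARTIFACTS if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing artifacts in {root}: {', '.join(missing)}")
    return {
        name: json.loads((root / name).read_text(encoding="utf-8"))
        for name in REQUIRED_ARTIFACTS
    }
```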
Notes Vocabulary
notes follows the same code-prefixed convention as the runner artifact contracts. v0 reserves the capability_diff_ prefix. The first golden shape emits capability_diff_idempotent when base and head evidence sets are identical. Implementations must not introduce new note codes without updating this contract.
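A checker for the note vocabulary could look like this. The sketch assumes a note's code is the text before the first colon, or the whole note when there is none; the contract fixes the capability_diff_ prefix and the capability_diff_idempotent code, but not the full note string format.

```python
# Note codes this contract currently defines. New codes need a contract update
# first, so anything else is flagged.
KNOWN_NOTE_CODES = {"capability_diff_idempotent"}


def unknown_note_codes(notes: list[str]) -> list[str]:
    """Return notes whose leading code is not in KNOWN_NOTE_CODES.

    ASSUMPTION: the code is the text before the first ':' in the note.
    """
    bad = []
    for note in notes:
        code = note.split(":", 1)[0].strip()
        if code not in KNOWN_NOTE_CODES:
            bad.append(note)
    return bad
```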
Non-Goals
v0 does not include:
- declared-capability input; the future declared-capability contract is a separate Phase 2C+ slice
- per-binding path/process/endpoint attribution
- call-id-less order fallback
- second runtime support; cross-runtime semantics are contracted separately in cross-runtime-diff-v0.md
- macOS runner support
- raw telemetry diffing
- proof-pack ingestion as a required input
- OTel or GenAI semantic-convention mapping
- acceptability or policy judgment
Implementation Placement
The boundary map places capability-diff projection semantics on the Trust Basis / Harness side while Runner delivers clean measured-run input bundles. A future implementation may start as an Assay-side reference checker, but it must not silently move artifact meaning into the runner candidate.