assay evidence¶
Manage Assay evidence bundles and external evidence imports.
Synopsis¶
Receipt Schema Registry¶
Inspect and validate the machine-readable receipt schema registry:
assay evidence schema list
assay evidence schema show promptfoo.assertion-component.v1
assay evidence schema show promptfoo.assertion-component.v1 --raw
assay evidence schema validate \
--schema promptfoo.assertion-component.v1 \
--input receipt.json
For JSONL importer inputs, validate each non-empty row with --jsonl:
assay evidence schema validate \
--schema promptfoo-cli-jsonl-component-result.v1 \
--input results.jsonl \
--jsonl
The schema CLI covers the current receipt schema registry:
- receipt payload schemas for Promptfoo, OpenFeature, CycloneDX ML-BOM, Mastra, Pydantic, and LiveKit receipts
- importer input schemas where the reduced input artifact differs from the receipt payload
- metadata such as schema `$id`, family, status, source path, short description, and Trust Basis claim when one exists
Mastra, Pydantic, and LiveKit remain importer-only in this registry. They have input and receipt schemas, but no public Trust Basis score, case-result, or acted-family receipt claims yet.
validate exits 0 when the artifact matches the selected schema and exits 1 when the artifact is valid JSON/JSONL but fails schema validation. Invalid JSON, invalid JSONL rows, empty JSONL input, unknown schema names, unreadable files, and runtime/configuration errors are reported as input/config errors (exit code 2 or higher).
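The exit-code contract above can be consumed from a script. The following is a minimal sketch, not part of Assay: the helper names `classify_validate_exit` and `run_validate` and the outcome labels are hypothetical, and only the flags already shown on this page are used.

```python
import subprocess


def classify_validate_exit(code: int) -> str:
    """Map an exit code from `assay evidence schema validate` to a coarse label.

    The labels are illustrative names chosen for this sketch.
    """
    if code == 0:
        return "valid"  # artifact matches the selected schema
    if code == 1:
        return "schema-mismatch"  # well-formed JSON/JSONL that fails validation
    if code >= 2:
        return "input-or-config-error"  # bad JSON, unknown schema, unreadable file, ...
    return "unknown"  # negative codes are not described on this page


def run_validate(schema: str, path: str, jsonl: bool = False) -> str:
    """Run the validate command and classify its exit code (requires `assay` on PATH)."""
    argv = ["assay", "evidence", "schema", "validate", "--schema", schema, "--input", path]
    if jsonl:
        argv.append("--jsonl")
    return classify_validate_exit(subprocess.run(argv).returncode)
```

A CI step could, for example, treat `schema-mismatch` as a test failure and `input-or-config-error` as a pipeline misconfiguration.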
Options¶
| Command | Description |
|---|---|
| `assay evidence schema list [--format text\|json]` | List all supported schema entries |
| `assay evidence schema show <SCHEMA> [--format text\|json] [--raw]` | Show schema metadata or raw JSON Schema |
| `assay evidence schema validate --schema <SCHEMA> --input <PATH> [--jsonl] [--format text\|json]` | Validate a JSON or JSONL artifact |
Schema names can be the registry name, known alias, source path, or JSON Schema $id. Use list to discover supported names.
MCP Execution Record Pairing¶
Verify that request binding material and server execution-record fixtures pair up from the consumer side:
assay evidence verify-mcp-records \
--attestation sep2787-attestation.json \
--decision server-decision-record.json \
--outcome server-outcome-record.json \
--format json
For deployments without SEP-2787 attestation, supply the observed tools/call params plus _meta request envelope instead:
assay evidence verify-mcp-records \
--request-envelope tools-call-envelope.json \
--decision server-decision-record.json \
--outcome server-outcome-record.json \
--format json
--attestation and --request-envelope are mutually exclusive. Exactly one is required.
This command emits an assay.mcp.execution-record-pairing.report.v0 report. It computes the binding digest, checks the decision and optional outcome backLink fields, verifies the outcome's decisionDigest commitment to the full signed decision record, and verifies the narrow decision/outcome enum surface. In SEP-2787 mode the binding digest is the attestation JCS digest and the nonce comes from issuerAsserted.nonce. In request-envelope mode the binding digest is the JCS digest of the supplied envelope, while the nonce is only checked for decision/outcome record consistency.
The command is deliberately not an MCP proxy, issuer, policy engine, or runtime truth oracle. It does not verify signatures, establish issuer key trust, prove policy correctness, prove side effects, or disclose payload/result bodies. It is for downstream verifier fixtures and reviewer-visible pairing diagnostics. Request-envelope fallback does not prove the server observed that envelope honestly, or that the server-chosen nonce was unique or fresh for the call.
If --outcome is omitted, Assay reports a valid decision-only pairing check. Pairing or enum mismatches produce a report and exit 2.
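The flag rules for this command (exactly one of --attestation or --request-envelope, with --outcome optional) can be enforced before the CLI is invoked. The helper below is a hypothetical sketch that only assembles the argument list from the flags documented above; it does not run Assay.

```python
from __future__ import annotations


def build_pairing_argv(
    decision: str,
    outcome: str | None = None,
    attestation: str | None = None,
    request_envelope: str | None = None,
) -> list[str]:
    """Assemble argv for `assay evidence verify-mcp-records`.

    Raises ValueError unless exactly one binding source is supplied.
    """
    if (attestation is None) == (request_envelope is None):
        raise ValueError("pass exactly one of attestation or request_envelope")

    argv = ["assay", "evidence", "verify-mcp-records"]
    if attestation is not None:
        argv += ["--attestation", attestation]
    else:
        argv += ["--request-envelope", request_envelope]
    argv += ["--decision", decision]
    if outcome is not None:
        # Omitting the outcome yields a decision-only pairing check.
        argv += ["--outcome", outcome]
    argv += ["--format", "json"]
    return argv
```

The resulting list can be passed to `subprocess.run`; per the text above, a pairing or enum mismatch is then signalled by exit code 2 together with a report.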
MCP Tunnel Observed-Facts Check¶
Validate one bounded MCP tunnel observed-facts fixture and classify its join evidence from the consumer side:
assay evidence verify-mcp-tunnel-observed \
--artifact examples/mcp-tunnel-observed-evidence/fixtures/valid.tunnel.json \
--format json
This command emits an assay.mcp.tunnel-observed.report.v0 report. It checks the provider-neutral assay.mcp.tunnel_observed.v0 fixture shape, enforces the no-raw-payload and no-raw-authorization boundaries, and reports whether evidence_refs form a strong same_request_instance join or only diagnostic correlation.
A strong join requires the referenced evidence to bind the same request_envelope_digest and request_envelope_canonicalization as the tunnel artifact. Route, upstream, request id, timestamp, or provider request id alone remain diagnostic correlation. The command deliberately does not prove tunnel mediation, agent identity, authorization success, policy correctness, tool result truth, application outcome truth, or issuer/key trust. Boundary violations produce a JSON/table report and exit 2.
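The strong-join rule can be illustrated with a small classifier. This is a sketch under stated assumptions: the dictionary layout of the artifact and its evidence references, the `diagnostic_correlation` label, and the choice that a single fully matching reference is enough are all hypothetical. Only the two bound field names and `same_request_instance` come from the text above.

```python
JOIN_KEYS = ("request_envelope_digest", "request_envelope_canonicalization")


def classify_join(artifact: dict, evidence_refs: list) -> str:
    """Return a join label for a tunnel artifact and its evidence references.

    A reference counts as a strong join only when it binds BOTH the digest and
    the canonicalization of the tunnel artifact. Route, request id, timestamp,
    and similar fields are ignored on purpose.
    """
    for ref in evidence_refs:
        if all(
            ref.get(key) is not None and ref.get(key) == artifact.get(key)
            for key in JOIN_KEYS
        ):
            return "same_request_instance"
    return "diagnostic_correlation"  # hypothetical label for the weaker case
```

Matching on a request id or timestamp alone never upgrades the result, which mirrors the diagnostic-correlation boundary described above.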
CycloneDX ML-BOM Model Import¶
Import one selected CycloneDX ML-BOM machine-learning-model component into a verifiable Assay evidence bundle:
assay evidence import cyclonedx-mlbom-model \
--input bom.cdx.json \
--bundle-out cyclonedx-model-receipt.tar.gz \
--source-artifact-ref bom.cdx.json
The importer is intentionally strict in v1:
- input must be CycloneDX JSON with `bomFormat = CycloneDX`
- model components must live in `components[]`
- the selected component must have `type = machine-learning-model`
- the selected component must have bounded `bom-ref` and `name`
- if multiple model components exist, `--bom-ref` is required
- full BOM graphs, `modelCard` bodies, dataset bodies, vulnerabilities, licenses, pedigree, metrics, and fairness/ethics sections are excluded
The importer first computes source_artifact_digest over the full BOM file, then reduces the selected model component. Receipts stay small while still binding back to the exact source artifact bytes.
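The digest-then-reduce flow and the selection rules can be sketched in a few lines. This is an illustration, not the importer's implementation: SHA-256, the `sha256:` prefix, and the output key names other than `source_artifact_digest` are assumptions made for the example.

```python
import hashlib
import json
from typing import Optional


def reduce_model_component(bom_bytes: bytes, bom_ref: Optional[str] = None) -> dict:
    """Digest the full BOM bytes, then reduce one machine-learning-model component."""
    # Pass 1: bind to the exact source bytes before any parsing happens.
    digest = "sha256:" + hashlib.sha256(bom_bytes).hexdigest()  # algorithm assumed

    # Pass 2: parse and select exactly one model component.
    bom = json.loads(bom_bytes)
    if bom.get("bomFormat") != "CycloneDX":
        raise ValueError("input must be CycloneDX JSON")
    models = [
        c for c in bom.get("components", [])
        if c.get("type") == "machine-learning-model"
    ]
    if bom_ref is not None:
        models = [c for c in models if c.get("bom-ref") == bom_ref]
    elif len(models) > 1:
        raise ValueError("multiple model components: bom_ref is required")
    if len(models) != 1:
        raise ValueError("expected exactly one matching model component")

    component = models[0]
    if not component.get("bom-ref") or not component.get("name"):
        raise ValueError("selected component needs bom-ref and name")

    # Everything else (modelCard, licenses, pedigree, ...) is dropped.
    return {
        "source_artifact_digest": digest,
        "bom_ref": component["bom-ref"],
        "name": component["name"],
    }
```

Because the digest is taken over the raw bytes, a reformatted but semantically identical BOM yields a different digest in this sketch.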
The receipt is an inventory-boundary artifact. It does not mean the model is safe, approved, licensed, compliant, vulnerable or non-vulnerable, fair, or correct. It also does not import full CycloneDX BOM truth into Assay.
The output bundle can be verified with:
The same bundle can feed the Trust Basis compiler:
Trust Basis emits external_inventory_receipt_boundary_visible when the supported CycloneDX ML-BOM model-component receipt shape is present. That claim means the bounded inventory receipt boundary is visible; it does not mean the BOM is complete, the model is safe, the model card is correct, the datasets are approved, or the CycloneDX artifact is imported as Assay truth.
Use --bom-ref <REF> when the BOM has multiple machine-learning-model components. Use --import-time <RFC3339> for deterministic fixture generation.
Options¶
| Option | Description |
|---|---|
| `--input <PATH>` | CycloneDX JSON BOM artifact file |
| `--bundle-out <PATH>` | Output Assay evidence bundle path |
| `--bom-ref <REF>` | Select a machine-learning-model component by bom-ref |
| `--source-artifact-ref <REF>` | Reviewer-safe source artifact reference stored in receipts |
| `--run-id <ID>` | Assay import run id used for receipt provenance and event ids |
| `--import-time <RFC3339>` | Deterministic import timestamp override |
Mastra ScoreEvent Import¶
Import bounded, reviewer-safe Mastra ScoreEvent / ExportedScore-derived score artifacts into a verifiable Assay evidence bundle:
assay evidence import mastra-score-event \
--input mastra-score-events.jsonl \
--bundle-out mastra-score-receipts.tar.gz \
--source-artifact-ref mastra-score-events.jsonl
The importer is intentionally strict in v1:
- input must be JSONL with one reduced score-event artifact per row
- each row must use `mastra.score-event.export.v1`
- each row must use `surface = observability.score_event`
- `score` must be numeric
- `target_ref` and at least one scorer identity (`scorer_id` or `scorer_name`) must be present
- `score_id_ref` maps to Mastra `scoreId` when present; it is live-backed on `@mastra/core` 1.29.1 / `@mastra/observability` 1.10.2, but remains optional in the v1 reduced artifact for older captures and compatibility fixtures
- `score_source`, `trace_id_ref`, `span_id_ref`, and `score_trace_id_ref` are reviewer aids only, not receipt identity in v1
- raw exporter callback payloads, raw `metadata`, raw `correlationContext`, trace trees, spans, logs, metrics, feedback, prompts, request/response bodies, scorer configs, and dashboard state are excluded
The importer first computes source_artifact_digest over the full JSONL file, then parses and reduces score-event artifacts. Receipts stay small while still binding back to the exact reduced source artifact bytes.
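A pre-flight check for one reduced row might look like the sketch below. It is illustrative only: the `schema` key used to carry `mastra.score-event.export.v1` is an assumption (this page does not name the key), and the error strings are invented for the example.

```python
from numbers import Real


def check_score_row(row: dict) -> list:
    """Return a list of problems for one reduced Mastra score-event row."""
    problems = []
    if row.get("schema") != "mastra.score-event.export.v1":  # key name assumed
        problems.append("unexpected schema")
    if row.get("surface") != "observability.score_event":
        problems.append("unexpected surface")

    score = row.get("score")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, Real):
        problems.append("score must be numeric")

    if not row.get("target_ref"):
        problems.append("target_ref is required")
    if not (row.get("scorer_id") or row.get("scorer_name")):
        problems.append("scorer_id or scorer_name is required")
    return problems
```

Optional reviewer aids such as `score_id_ref` or `trace_id_ref` are deliberately not checked here, since they are not receipt identity in v1.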
The receipt is a score-boundary artifact. It does not mean the score is correct, the scorer is reliable, the Mastra runtime behaved correctly, the trace/span anchor is complete, or the score should pass or fail a gate.
The output bundle can be verified with:
The same bundle can feed the Trust Basis compiler:
P14c does not add a Trust Basis claim. The first Mastra compiler slice proves the receipt bundle is bundleable, verifiable, and readable by the Trust Basis path. P14d freezes the current compatibility decision: score receipts remain importer-only until any future score-receipt Trust Basis claim has an explicit claim boundary, Trust Card impact, and Harness posture.
Use --import-time <RFC3339> for deterministic fixture generation.
Options¶
| Option | Description |
|---|---|
| `--input <PATH>` | Mastra reduced ScoreEvent JSONL artifact file |
| `--bundle-out <PATH>` | Output Assay evidence bundle path |
| `--source-artifact-ref <REF>` | Reviewer-safe source artifact reference stored in receipts |
| `--run-id <ID>` | Assay import run id used for receipt provenance and event ids |
| `--import-time <RFC3339>` | Deterministic import timestamp override |
LiveKit Tool Action Import¶
Import bounded LiveKit FunctionToolsExecutedEvent-derived artifacts into a verifiable Assay evidence bundle:
assay evidence import livekit-tool-action \
--input livekit-tool-action.json \
--bundle-out livekit-tool-action-receipts.tar.gz \
--source-artifact-ref livekit-tool-action.json
The importer is intentionally strict in v1:
- input may be one JSON object, a JSON array of objects, or JSONL rows using `livekit.function-tools-executed.export.v1`
- each artifact must use `framework = livekit_agents`
- each artifact must use `surface = function_tools_executed`
- each artifact must use `runtime_mode = agent_session`
- optional `type` must be `function_tools_executed` when present
- one receipt is emitted per function call / output pair
- calls and outputs are paired by LiveKit SDK list order
- if every paired call/output entry has `call_id`, mismatches fail the import as an audit consistency check
- partial `call_id` presence is accepted and still uses list-order pairing
- missing `FunctionCallOutput` / `null` output entries are preserved as `completed=false` without inferring `is_error`
- raw tool arguments and outputs are accepted only as fixture input for hashing; receipts store `arguments_hash` / `output_hash` or explicit reviewer-safe refs
- transcripts, audio, user input, model output, room state, participant identity, usage telemetry, latency telemetry, capture context, session identity, full traces, and spans are excluded
The importer first computes source_artifact_digest over the full reduced artifact file, then reduces each function tool action. Receipts stay small while still binding back to the exact source artifact bytes.
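The list-order pairing rule and the `call_id` consistency check can be sketched as follows. This is a hypothetical illustration: the representation of calls and outputs as dictionaries, the `name` and `index` keys, and the requirement that both lists have equal length are assumptions, not the importer's actual data model.

```python
def pair_tool_actions(calls: list, outputs: list) -> list:
    """Pair function calls with outputs by list position.

    An output of None stands for a missing FunctionCallOutput.
    """
    if len(calls) != len(outputs):
        raise ValueError("calls and outputs must have the same length")  # assumption

    pairs = list(zip(calls, outputs))

    # The audit check applies only when EVERY call and output carries a call_id.
    every_entry_has_id = all(
        call.get("call_id") is not None
        and out is not None
        and out.get("call_id") is not None
        for call, out in pairs
    )
    if every_entry_has_id:
        for index, (call, out) in enumerate(pairs):
            if call["call_id"] != out["call_id"]:
                raise ValueError(f"call_id mismatch at index {index}")

    # Missing outputs are kept as completed=False; is_error is never inferred.
    return [
        {"index": index, "name": call.get("name"), "completed": out is not None}
        for index, (call, out) in enumerate(pairs)
    ]
```

With partial `call_id` presence the check is skipped entirely and position alone decides the pairing, as the rules above describe.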
The receipt is an acted-boundary candidate artifact. It does not mean the tool call was correct, intended, allowed, safe, or representative of the full LiveKit session. It also does not claim LiveKit endorsement or a stable LiveKit wire contract.
The output bundle can be verified with:
The same bundle can feed the Trust Basis compiler:
assay trust-basis generate livekit-tool-action-receipts.tar.gz --out livekit-tool-action.trust-basis.json
P47 Stage 1 does not add a Trust Basis claim. LiveKit tool-action receipts remain importer-only until any future acted-family claim slice defines exact semantics, Trust Card impact, family-matrix posture, and compatibility rules.
Use --import-time <RFC3339> for deterministic fixture generation.
Options¶
| Option | Description |
|---|---|
| `--input <PATH>` | LiveKit reduced function-tool execution artifact file |
| `--bundle-out <PATH>` | Output Assay evidence bundle path |
| `--source-artifact-ref <REF>` | Reviewer-safe source artifact reference stored in receipts |
| `--run-id <ID>` | Assay import run id used for receipt provenance and event ids |
| `--import-time <RFC3339>` | Deterministic import timestamp override |
OpenFeature Details Import¶
Import bounded OpenFeature boolean EvaluationDetails artifacts into a verifiable Assay evidence bundle:
assay evidence import openfeature-details \
--input openfeature-details.jsonl \
--bundle-out openfeature-decision-receipts.tar.gz \
--source-artifact-ref openfeature-details.jsonl
The importer is intentionally strict in v1:
- input must be JSONL with one bounded `EvaluationDetails` artifact per row
- each row must use `openfeature.evaluation-details.export.v1`
- each row must represent `target_kind = feature_flag`
- `result.value` must be boolean
- `result.reason` is a bounded string, not an Assay-owned enum
- provider config, evaluation context, targeting keys, rules, metadata, `error_message`, and full provider state are excluded
The importer first computes source_artifact_digest over the full JSONL file, then parses and reduces decision details. Receipts stay small while still binding back to the exact source artifact bytes.
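The reduction step for one row can be sketched as an allow-list: only the bounded decision fields survive, and everything else is dropped. This is an illustration under assumptions: `MAX_REASON_LEN` is an invented bound (the page says only that the reason is bounded), and the shape of the returned dictionary is hypothetical.

```python
MAX_REASON_LEN = 64  # hypothetical bound; the real limit is not stated on this page


def reduce_details(row: dict) -> dict:
    """Keep only the bounded boolean decision fields from one JSONL row."""
    if row.get("target_kind") != "feature_flag":
        raise ValueError("row must represent target_kind = feature_flag")

    result = row.get("result") or {}
    value = result.get("value")
    if not isinstance(value, bool):  # 0/1 and "true" are rejected
        raise ValueError("result.value must be boolean")

    reduced = {"target_kind": "feature_flag", "result": {"value": value}}

    reason = result.get("reason")
    if reason is not None:
        # The reason is an opaque bounded string, not checked against an enum.
        if not isinstance(reason, str) or len(reason) > MAX_REASON_LEN:
            raise ValueError("result.reason must be a bounded string")
        reduced["result"]["reason"] = reason

    # error_message, evaluation context, targeting keys, etc. are never copied.
    return reduced
```

Building the output from an allow-list, rather than deleting known-bad keys, keeps unexpected provider fields out of the receipt by default.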
The receipt is a decision-boundary artifact. It does not mean the flag decision was correct, the application behavior was safe, the provider was correct, or the targeting rules were imported as Assay truth.
The output bundle can be verified with:
The same bundle can feed the Trust Basis compiler:
Trust Basis emits external_decision_receipt_boundary_visible when the supported OpenFeature boolean decision receipt shape is present. That claim means the bounded decision receipt boundary is visible; it does not mean the flag decision was correct, the provider was correct, the targeting rules are correct, or application behavior is safe.
Use --import-time <RFC3339> for deterministic fixture generation.
To compare the resulting Trust Basis artifact against another run, use assay trust-basis diff.
Options¶
| Option | Description |
|---|---|
| `--input <PATH>` | OpenFeature EvaluationDetails JSONL artifact file |
| `--bundle-out <PATH>` | Output Assay evidence bundle path |
| `--source-artifact-ref <REF>` | Reviewer-safe source artifact reference stored in receipts |
| `--run-id <ID>` | Assay import run id used for receipt provenance and event ids |
| `--import-time <RFC3339>` | Deterministic import timestamp override |
Pydantic Case-Result Import¶
Import bounded Pydantic Evals reduced case-result artifacts into a verifiable Assay evidence bundle:
assay evidence import pydantic-case-result \
--input pydantic-case-results.jsonl \
--bundle-out pydantic-case-result-receipts.tar.gz \
--source-artifact-ref pydantic-case-results.jsonl
The importer is intentionally strict in v1:
- input must be JSONL with one reduced case-result artifact per row
- each row must use `pydantic-evals.report-case-result.export.v1`
- each row must use `framework = pydantic_evals`
- each row must use `surface = evaluation_report.cases.case_result`
- `case_name` is the only docs-backed v1 case identity
- `case_id_ref` is not supported in P9d
- `source_case_name` and `source_ref` are allowed only as non-identity provenance aids
- `results[]` may contain bounded assertion pass/fail entries and scalar score entries only
- raw `ReportCase`, full `EvaluationReport`, task inputs, expected outputs, model outputs, report metadata, experiment metadata, trace/span references, Logfire payloads, prompts, completions, analyses, failures, and evaluator implementation/config bodies are excluded
The importer first computes source_artifact_digest over the full JSONL file, then parses and reduces case-result artifacts. Receipts stay small while still binding back to the exact reduced source artifact bytes.
The receipt is a case-result-boundary artifact. It does not mean the evaluator judgment is correct, the model output was correct, the full ReportCase or EvaluationReport was imported, or Logfire/trace semantics are Assay truth.
The output bundle can be verified with:
The same bundle can feed the Trust Basis compiler:
assay trust-basis generate pydantic-case-result-receipts.tar.gz --out pydantic-case-result.trust-basis.json
P9d does not add a Trust Basis claim. Pydantic case-result receipts remain importer-only until any future claim slice defines exact semantics, Trust Card impact, and Harness posture.
Use --import-time <RFC3339> for deterministic fixture generation.
Options¶
| Option | Description |
|---|---|
| `--input <PATH>` | Pydantic Evals reduced case-result JSONL artifact file |
| `--bundle-out <PATH>` | Output Assay evidence bundle path |
| `--source-artifact-ref <REF>` | Reviewer-safe source artifact reference stored in receipts |
| `--run-id <ID>` | Assay import run id used for receipt provenance and event ids |
| `--import-time <RFC3339>` | Deterministic import timestamp override |
Promptfoo JSONL Import¶
Import Promptfoo CLI JSONL assertion component results into a verifiable Assay evidence bundle:
assay evidence import promptfoo-jsonl \
--input results.jsonl \
--bundle-out promptfoo-evidence.tar.gz \
--source-artifact-ref results.jsonl
The importer is intentionally strict in v1:
- input must be Promptfoo CLI JSONL rows
- each row must carry `gradingResult.componentResults[]`
- each component must be an `equals` assertion result
- component scores must be binary (`0` or `1`)
- raw prompt, output, expected value, vars, and full JSONL rows are excluded
The importer first computes source_artifact_digest over the full JSONL file, then parses and reduces assertion components. That two-pass flow is intentional: receipts stay small while still binding back to the exact source artifact bytes.
result.reason is optional and bounded. For v1, failure reasons are omitted when they would leak raw compared values. Passing reasons are included only when they remain short and reviewer-safe.
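The component rules and the reason policy can be sketched as a small reducer. This is an illustration, not the importer: `MAX_REASON_LEN` and the receipt key names are hypothetical, and the sketch takes the conservative shortcut of dropping every failure reason rather than detecting which ones would leak compared values.

```python
MAX_REASON_LEN = 80  # hypothetical bound for a "short" passing reason


def reduce_components(row: dict) -> list:
    """Reduce one Promptfoo JSONL row to bounded assertion-component entries."""
    components = (row.get("gradingResult") or {}).get("componentResults")
    if not isinstance(components, list) or not components:
        raise ValueError("row must carry gradingResult.componentResults[]")

    receipts = []
    for comp in components:
        if (comp.get("assertion") or {}).get("type") != "equals":
            raise ValueError("only equals assertion results are supported")

        score = comp.get("score")
        # Reject booleans explicitly: True == 1 in Python.
        if isinstance(score, bool) or score not in (0, 1):
            raise ValueError("component score must be 0 or 1")

        passed = score == 1
        receipt = {"assertion_type": "equals", "score": int(score), "pass": passed}

        reason = comp.get("reason")
        # Simplification: failure reasons are always omitted in this sketch.
        if passed and isinstance(reason, str) and len(reason) <= MAX_REASON_LEN:
            receipt["reason"] = reason
        receipts.append(receipt)

    # Prompt, output, expected value, and vars are never copied.
    return receipts
```

A failing `equals` reason typically quotes both compared strings, which is why the sketch never forwards it.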
The output bundle can be verified with:
The same bundle can feed the Trust Basis compiler:
This proves the imported receipts are bundleable, verifiable, and readable by the Trust Basis path. Trust Basis now emits external_eval_receipt_boundary_visible when the supported Promptfoo receipt shape is present. That claim means the bounded receipt boundary is visible; it does not mean the Promptfoo eval run passed, the model output was correct, or the raw Promptfoo payload is imported as Assay truth.
Use --import-time <RFC3339> for deterministic fixture generation.
To compare the resulting Trust Basis artifact against another run, use assay trust-basis diff.
Options¶
| Option | Description |
|---|---|
| `--input <PATH>` | Promptfoo CLI JSONL output file |
| `--bundle-out <PATH>` | Output Assay evidence bundle path |
| `--source-artifact-ref <REF>` | Reviewer-safe source artifact reference stored in receipts |
| `--run-id <ID>` | Assay import run id used for receipt provenance and event ids |
| `--import-time <RFC3339>` | Deterministic import timestamp override |
See Also¶
- Evidence Contract v1
- Receipt family matrix
- Receipt schema registry
- Trust Basis CLI
- CycloneDX ML-BOM Model Component evidence example
- Mastra ScoreEvent evidence example
- OpenFeature EvaluationDetails evidence example
- Promptfoo assertion grading-result example
- From Promptfoo JSONL to Evidence Receipts
- P43 CycloneDX ML-BOM model component receipt import plan
- P14c Mastra ScoreEvent receipt import plan
- P14d Mastra score receipt Trust Basis readiness freeze
- P45b OpenFeature decision receipt Trust Basis claim plan
- P41 OpenFeature decision receipt import plan
- P31 Promptfoo receipt import plan