
assay evidence

Manage Assay evidence bundles and external evidence imports.


Synopsis

assay evidence <COMMAND> [OPTIONS]

Receipt Schema Registry

Inspect and validate the machine-readable receipt schema registry:

assay evidence schema list
assay evidence schema show promptfoo.assertion-component.v1
assay evidence schema show promptfoo.assertion-component.v1 --raw
assay evidence schema validate \
  --schema promptfoo.assertion-component.v1 \
  --input receipt.json

For JSONL importer inputs, validate each non-empty row with --jsonl:

assay evidence schema validate \
  --schema promptfoo-cli-jsonl-component-result.v1 \
  --input results.jsonl \
  --jsonl

The schema CLI covers the current receipt schema registry:

  • receipt payload schemas for Promptfoo, OpenFeature, CycloneDX ML-BOM, Mastra, Pydantic, and LiveKit receipts
  • importer input schemas where the reduced input artifact differs from the receipt payload
  • metadata such as schema $id, family, status, source path, short description, and Trust Basis claim when one exists

Mastra, Pydantic, and LiveKit remain importer-only in this registry. They have input and receipt schemas, but no public Trust Basis score, case-result, or acted-family receipt claims yet.

validate exits 0 when the artifact matches the selected schema and exits 1 when the artifact is valid JSON/JSONL but fails schema validation. Invalid JSON, invalid JSONL rows, empty JSONL input, unknown schema names, unreadable files, and runtime/configuration errors are reported as input/config errors with exit code 2 or higher.
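The exit-code rules above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not Assay's implementation: validate_jsonl and row_is_valid are hypothetical names, the schema check is passed in as a plain callable, and the value 2 stands in for the whole "2 or higher" error class.

```python
import json


def validate_jsonl(text, row_is_valid):
    """Return 0, 1, or 2 following the exit-code rules described above."""
    # Only non-empty rows are considered, as with --jsonl.
    rows = [line for line in text.splitlines() if line.strip()]
    if not rows:
        return 2  # empty JSONL input is an input error
    parsed = []
    for line in rows:
        try:
            parsed.append(json.loads(line))
        except json.JSONDecodeError:
            return 2  # an invalid JSONL row is an input error
    # Well-formed input: 0 if every row passes the schema check, else 1.
    return 0 if all(row_is_valid(row) for row in parsed) else 1
```

A caller can therefore treat 1 as "the artifact is wrong" and 2 or higher as "the invocation or input file is wrong".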

Options

Command                                                                                         Description
assay evidence schema list [--format text|json]                                                 List all supported schema entries
assay evidence schema show <SCHEMA> [--format text|json] [--raw]                                Show schema metadata or raw JSON Schema
assay evidence schema validate --schema <SCHEMA> --input <PATH> [--jsonl] [--format text|json]  Validate a JSON or JSONL artifact

A schema can be named by its registry name, a known alias, its source path, or its JSON Schema $id. Use assay evidence schema list to discover supported names.


MCP Execution Record Pairing

Verify that request binding material and server execution-record fixtures pair up from the consumer side:

assay evidence verify-mcp-records \
  --attestation sep2787-attestation.json \
  --decision server-decision-record.json \
  --outcome server-outcome-record.json \
  --format json

For deployments without SEP-2787 attestation, supply the observed tools/call params plus _meta request envelope instead:

assay evidence verify-mcp-records \
  --request-envelope tools-call-envelope.json \
  --decision server-decision-record.json \
  --outcome server-outcome-record.json \
  --format json

--attestation and --request-envelope are mutually exclusive. Exactly one is required.
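The "exactly one of" rule can be modelled with Python's argparse as a required mutually exclusive group. This is only a sketch of the flag contract: build_parser is a hypothetical helper, the program name is made up, and treating --decision as required is an assumption drawn from the examples above.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="verify-mcp-records-sketch")
    # Exactly one binding source must be given: argparse rejects both
    # "neither" (required=True) and "both" (mutually exclusive).
    binding = parser.add_mutually_exclusive_group(required=True)
    binding.add_argument("--attestation")
    binding.add_argument("--request-envelope")
    parser.add_argument("--decision", required=True)  # assumed required
    parser.add_argument("--outcome")  # optional: decision-only pairing
    return parser
```

With this parser, passing both --attestation and --request-envelope, or neither, ends in a usage error instead of a pairing report.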

This command emits an assay.mcp.execution-record-pairing.report.v0 report. It computes the binding digest, checks the decision and optional outcome backLink fields, verifies the outcome's decisionDigest commitment to the full signed decision record, and verifies the narrow decision/outcome enum surface. In SEP-2787 mode the binding digest is the attestation JCS digest and the nonce comes from issuerAsserted.nonce. In request-envelope mode the binding digest is the JCS digest of the supplied envelope, while the nonce is only checked for decision/outcome record consistency.
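To make the digest commitment concrete, the sketch below hashes a canonical JSON form of a record and compares it with an outcome's decisionDigest. Several things here are assumptions rather than documented behavior: SHA-256 as the hash, a bare hex string as the digest encoding, and the function names. The canonicalization is also only an approximation of JCS (RFC 8785) that holds for ASCII keys and string, integer, and boolean values.

```python
import hashlib
import json


def approx_jcs_digest(value):
    """Hash a canonical JSON form (approximate JCS, see caveats above)."""
    canonical = json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    # Assumption: SHA-256, hex-encoded. The real report may differ.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def outcome_commits_to_decision(decision, outcome):
    """True when the outcome's decisionDigest matches the full decision record."""
    return outcome.get("decisionDigest") == approx_jcs_digest(decision)
```

Because the digest is taken over a canonical form, two records that differ only in key order produce the same digest, while any change to a field value breaks the commitment.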

The command is deliberately not an MCP proxy, issuer, policy engine, or runtime truth oracle. It does not verify signatures, establish issuer key trust, prove policy correctness, prove side effects, or disclose payload/result bodies. It is for downstream verifier fixtures and reviewer-visible pairing diagnostics. Request-envelope fallback does not prove the server observed that envelope honestly, or that the server-chosen nonce was unique or fresh for the call.

If --outcome is omitted, Assay reports a valid decision-only pairing check. Pairing or enum mismatches produce a report and exit 2.


MCP Tunnel Observed-Facts Check

Validate one bounded MCP tunnel observed-facts fixture and classify its join evidence from the consumer side:

assay evidence verify-mcp-tunnel-observed \
  --artifact examples/mcp-tunnel-observed-evidence/fixtures/valid.tunnel.json \
  --format json

This command emits an assay.mcp.tunnel-observed.report.v0 report. It checks the provider-neutral assay.mcp.tunnel_observed.v0 fixture shape, enforces the no-raw-payload and no-raw-authorization boundaries, and reports whether evidence_refs form a strong same_request_instance join or only diagnostic correlation.

A strong join requires the referenced evidence to bind the same request_envelope_digest and request_envelope_canonicalization as the tunnel artifact. Route, upstream, request id, timestamp, or provider request id alone remain diagnostic correlation. The command deliberately does not prove tunnel mediation, agent identity, authorization success, policy correctness, tool result truth, application outcome truth, or issuer/key trust. Boundary violations produce a JSON/table report and exit 2.
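The join rule can be read as a two-field equality check. The sketch below assumes both the tunnel artifact and each referenced evidence item are plain dictionaries carrying the two field names quoted above. classify_join and the "diagnostic_correlation" label are hypothetical; only same_request_instance is a name taken from this page.

```python
JOIN_FIELDS = ("request_envelope_digest", "request_envelope_canonicalization")


def classify_join(tunnel, evidence):
    """Classify one evidence reference against the tunnel artifact."""
    strong = all(
        tunnel.get(field) is not None and tunnel.get(field) == evidence.get(field)
        for field in JOIN_FIELDS
    )
    # Route, upstream, request id, or timestamp matches never upgrade the
    # result: only the digest and canonicalization pair counts.
    return "same_request_instance" if strong else "diagnostic_correlation"
```

An evidence item that shares a request id or timestamp with the tunnel artifact but carries a different digest is still classified as diagnostic correlation.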


CycloneDX ML-BOM Model Import

Import one selected CycloneDX ML-BOM machine-learning-model component into a verifiable Assay evidence bundle:

assay evidence import cyclonedx-mlbom-model \
  --input bom.cdx.json \
  --bundle-out cyclonedx-model-receipt.tar.gz \
  --source-artifact-ref bom.cdx.json

The importer is intentionally strict in v1:

  • input must be CycloneDX JSON with bomFormat = CycloneDX
  • model components must live in components[]
  • the selected component must have type = machine-learning-model
  • the selected component must have bounded bom-ref and name
  • if multiple model components exist, --bom-ref is required
  • full BOM graphs, modelCard bodies, dataset bodies, vulnerabilities, licenses, pedigree, metrics, and fairness/ethics sections are excluded

The importer first computes source_artifact_digest over the full BOM file, then reduces the selected model component. Receipts stay small while still binding back to the exact source artifact bytes.
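The digest-then-reduce flow and the selection rules above can be sketched as follows. This is not the importer's code: select_model_component and the output keys other than source_artifact_digest are hypothetical, SHA-256 is an assumed hash, and the length bounds on bom-ref and name are left out.

```python
import hashlib
import json


def select_model_component(bom_bytes, bom_ref=None):
    # First pass: digest the exact source bytes before any parsing.
    digest = hashlib.sha256(bom_bytes).hexdigest()  # assumed algorithm
    bom = json.loads(bom_bytes)
    if bom.get("bomFormat") != "CycloneDX":
        raise ValueError("bomFormat must be CycloneDX")
    # Second pass: keep only model components from components[].
    models = [
        c for c in bom.get("components", [])
        if c.get("type") == "machine-learning-model"
    ]
    if bom_ref is not None:
        models = [c for c in models if c.get("bom-ref") == bom_ref]
    if len(models) != 1:
        raise ValueError("expected exactly one model component; pass bom_ref")
    component = models[0]
    if not component.get("bom-ref") or not component.get("name"):
        raise ValueError("selected component needs bom-ref and name")
    # Everything else in the BOM is deliberately left behind.
    return {
        "source_artifact_digest": digest,
        "bom_ref": component["bom-ref"],
        "name": component["name"],
    }
```

The reduced result carries no modelCard, license, or vulnerability data, yet the digest still ties it to one specific BOM file.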

The receipt is an inventory-boundary artifact. It does not mean the model is safe, approved, licensed, compliant, vulnerable or non-vulnerable, fair, or correct. It also does not import full CycloneDX BOM truth into Assay.

The output bundle can be verified with:

assay evidence verify cyclonedx-model-receipt.tar.gz

The same bundle can feed the Trust Basis compiler:

assay trust-basis generate cyclonedx-model-receipt.tar.gz --out cyclonedx-model.trust-basis.json

Trust Basis emits external_inventory_receipt_boundary_visible when the supported CycloneDX ML-BOM model-component receipt shape is present. That claim means the bounded inventory receipt boundary is visible; it does not mean the BOM is complete, the model is safe, the model card is correct, the datasets are approved, or the CycloneDX artifact is imported as Assay truth.

Use --bom-ref <REF> when the BOM has multiple machine-learning-model components. Use --import-time <RFC3339> for deterministic fixture generation.

Options

Option                       Description
--input <PATH>               CycloneDX JSON BOM artifact file
--bundle-out <PATH>          Output Assay evidence bundle path
--bom-ref <REF>              Select a machine-learning-model component by bom-ref
--source-artifact-ref <REF>  Reviewer-safe source artifact reference stored in receipts
--run-id <ID>                Assay import run id used for receipt provenance and event ids
--import-time <RFC3339>      Deterministic import timestamp override

Mastra ScoreEvent Import

Import bounded, reviewer-safe Mastra ScoreEvent / ExportedScore-derived score artifacts into a verifiable Assay evidence bundle:

assay evidence import mastra-score-event \
  --input mastra-score-events.jsonl \
  --bundle-out mastra-score-receipts.tar.gz \
  --source-artifact-ref mastra-score-events.jsonl

The importer is intentionally strict in v1:

  • input must be JSONL with one reduced score-event artifact per row
  • each row must use mastra.score-event.export.v1
  • each row must use surface = observability.score_event
  • score must be numeric
  • target_ref and at least one scorer identity (scorer_id or scorer_name) must be present
  • score_id_ref maps to Mastra scoreId when present; it is live-backed on @mastra/core 1.29.1 / @mastra/observability 1.10.2, but remains optional in the v1 reduced artifact for older captures and compatibility fixtures
  • score_source, trace_id_ref, span_id_ref, and score_trace_id_ref are reviewer aids only, not receipt identity in v1
  • raw exporter callback payloads, raw metadata, raw correlationContext, trace trees, spans, logs, metrics, feedback, prompts, request/response bodies, scorer configs, and dashboard state are excluded

The importer first computes source_artifact_digest over the full JSONL file, then parses and reduces score-event artifacts. Receipts stay small while still binding back to the exact reduced source artifact bytes.
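A row-level check along the lines of the rules above might look like the sketch below. The key that carries the schema name is not stated on this page, so "schema" is an assumption, as is the helper name check_score_event. The remaining field names are taken from the list above.

```python
import numbers


def check_score_event(row):
    """Return a list of rule violations for one reduced score-event row."""
    errors = []
    if row.get("schema") != "mastra.score-event.export.v1":  # assumed key name
        errors.append("unexpected schema")
    if row.get("surface") != "observability.score_event":
        errors.append("unexpected surface")
    score = row.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, numbers.Real):
        errors.append("score must be numeric")
    if not row.get("target_ref"):
        errors.append("target_ref is required")
    if not (row.get("scorer_id") or row.get("scorer_name")):
        errors.append("scorer_id or scorer_name is required")
    return errors
```

Optional reviewer aids such as score_id_ref or trace_id_ref are not checked here, in line with their non-identity role in v1.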

The receipt is a score-boundary artifact. It does not mean the score is correct, the scorer is reliable, the Mastra runtime behaved correctly, the trace/span anchor is complete, or the score should pass or fail a gate.

The output bundle can be verified with:

assay evidence verify mastra-score-receipts.tar.gz

The same bundle can feed the Trust Basis compiler:

assay trust-basis generate mastra-score-receipts.tar.gz --out mastra-score.trust-basis.json

P14c does not add a Trust Basis claim. The first Mastra compiler slice proves the receipt bundle is bundleable, verifiable, and readable by the Trust Basis path. P14d freezes the current compatibility decision: score receipts remain importer-only until any future score-receipt Trust Basis claim has an explicit claim boundary, Trust Card impact, and Harness posture.

Use --import-time <RFC3339> for deterministic fixture generation.

Options

Option                       Description
--input <PATH>               Mastra reduced ScoreEvent JSONL artifact file
--bundle-out <PATH>          Output Assay evidence bundle path
--source-artifact-ref <REF>  Reviewer-safe source artifact reference stored in receipts
--run-id <ID>                Assay import run id used for receipt provenance and event ids
--import-time <RFC3339>      Deterministic import timestamp override

LiveKit Tool Action Import

Import bounded LiveKit FunctionToolsExecutedEvent-derived artifacts into a verifiable Assay evidence bundle:

assay evidence import livekit-tool-action \
  --input livekit-tool-action.json \
  --bundle-out livekit-tool-action-receipts.tar.gz \
  --source-artifact-ref livekit-tool-action.json

The importer is intentionally strict in v1:

  • input may be one JSON object, a JSON array of objects, or JSONL rows using livekit.function-tools-executed.export.v1
  • each artifact must use framework = livekit_agents
  • each artifact must use surface = function_tools_executed
  • each artifact must use runtime_mode = agent_session
  • optional type must be function_tools_executed when present
  • one receipt is emitted per function call / output pair
  • calls and outputs are paired by LiveKit SDK list order
  • if every paired call/output entry has call_id, mismatches fail the import as an audit consistency check
  • partial call_id presence is accepted and still uses list-order pairing
  • missing FunctionCallOutput / null output entries are preserved as completed=false without inferring is_error
  • raw tool arguments and outputs are accepted only as fixture input for hashing; receipts store arguments_hash / output_hash or explicit reviewer-safe refs
  • transcripts, audio, user input, model output, room state, participant identity, usage telemetry, latency telemetry, capture context, session identity, full traces, and spans are excluded

The importer first computes source_artifact_digest over the full reduced artifact file, then reduces each function tool action. Receipts stay small while still binding back to the exact source artifact bytes.
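The list-order pairing and call_id consistency rules above can be sketched as follows. The function name, the dictionary shapes, and the choice to reject lists of unequal length are assumptions made for illustration. Only call_id and completed are names taken from this page.

```python
def pair_tool_actions(calls, outputs):
    """Pair calls and outputs by list order, one result per pair."""
    if len(calls) != len(outputs):
        raise ValueError("calls and outputs must have the same length")  # assumption
    pairs = list(zip(calls, outputs))
    # The audit check applies only when every paired entry carries call_id.
    ids_complete = all(
        "call_id" in call and output is not None and "call_id" in output
        for call, output in pairs
    )
    if ids_complete and any(c["call_id"] != o["call_id"] for c, o in pairs):
        raise ValueError("call_id mismatch between paired call and output")
    # A null output is preserved as completed=False; is_error is never inferred.
    return [
        {"call": call, "output": output, "completed": output is not None}
        for call, output in pairs
    ]
```

Note that a partially populated call_id never reorders anything: pairing stays positional, and the ids serve purely as a consistency check when they are all present.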

The receipt is an acted-boundary candidate artifact. It does not mean the tool call was correct, intended, allowed, safe, or representative of the full LiveKit session. It also does not claim LiveKit endorsement or a stable LiveKit wire contract.

The output bundle can be verified with:

assay evidence verify livekit-tool-action-receipts.tar.gz

The same bundle can feed the Trust Basis compiler:

assay trust-basis generate livekit-tool-action-receipts.tar.gz --out livekit-tool-action.trust-basis.json

P47 Stage 1 does not add a Trust Basis claim. LiveKit tool-action receipts remain importer-only until any future acted-family claim slice defines exact semantics, Trust Card impact, family-matrix posture, and compatibility rules.

Use --import-time <RFC3339> for deterministic fixture generation.

Options

Option                       Description
--input <PATH>               LiveKit reduced function-tool execution artifact file
--bundle-out <PATH>          Output Assay evidence bundle path
--source-artifact-ref <REF>  Reviewer-safe source artifact reference stored in receipts
--run-id <ID>                Assay import run id used for receipt provenance and event ids
--import-time <RFC3339>      Deterministic import timestamp override

OpenFeature Details Import

Import bounded OpenFeature boolean EvaluationDetails artifacts into a verifiable Assay evidence bundle:

assay evidence import openfeature-details \
  --input openfeature-details.jsonl \
  --bundle-out openfeature-decision-receipts.tar.gz \
  --source-artifact-ref openfeature-details.jsonl

The importer is intentionally strict in v1:

  • input must be JSONL with one bounded EvaluationDetails artifact per row
  • each row must use openfeature.evaluation-details.export.v1
  • each row must represent target_kind = feature_flag
  • result.value must be boolean
  • result.reason is a bounded string, not an Assay-owned enum
  • provider config, evaluation context, targeting keys, rules, metadata, error_message, and full provider state are excluded

The importer first computes source_artifact_digest over the full JSONL file, then parses and reduces decision details. Receipts stay small while still binding back to the exact source artifact bytes.
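One way to picture the reduction step is as an allow-list: only the fields named in the rules above survive, and everything else is dropped without being inspected. The sketch below is illustrative. reduce_details and the 64-character bound on result.reason are hypothetical, and the real receipt shape may carry additional fields.

```python
MAX_REASON_LEN = 64  # hypothetical bound; the actual limit is not documented here


def reduce_details(row):
    """Keep only the bounded decision fields from one EvaluationDetails row."""
    if row.get("target_kind") != "feature_flag":
        raise ValueError("target_kind must be feature_flag")
    result = row.get("result")
    if not isinstance(result, dict) or not isinstance(result.get("value"), bool):
        raise ValueError("result.value must be boolean")
    reduced = {"target_kind": "feature_flag", "result": {"value": result["value"]}}
    reason = result.get("reason")
    if reason is not None:
        # reason is a bounded free-form string, not an enum.
        if not isinstance(reason, str) or len(reason) > MAX_REASON_LEN:
            raise ValueError("result.reason must be a bounded string")
        reduced["result"]["reason"] = reason
    return reduced
```

Because the output is built from an allow-list, evaluation context, targeting keys, and error_message cannot leak into a receipt even if a row includes them.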

The receipt is a decision-boundary artifact. It does not mean the flag decision was correct, the application behavior was safe, the provider was correct, or the targeting rules were imported as Assay truth.

The output bundle can be verified with:

assay evidence verify openfeature-decision-receipts.tar.gz

The same bundle can feed the Trust Basis compiler:

assay trust-basis generate openfeature-decision-receipts.tar.gz --out openfeature.trust-basis.json

Trust Basis emits external_decision_receipt_boundary_visible when the supported OpenFeature boolean decision receipt shape is present. That claim means the bounded decision receipt boundary is visible; it does not mean the flag decision was correct, the provider was correct, the targeting rules are correct, or application behavior is safe.

Use --import-time <RFC3339> for deterministic fixture generation.

To compare the resulting Trust Basis artifact against another run, use assay trust-basis diff.

Options

Option                       Description
--input <PATH>               OpenFeature EvaluationDetails JSONL artifact file
--bundle-out <PATH>          Output Assay evidence bundle path
--source-artifact-ref <REF>  Reviewer-safe source artifact reference stored in receipts
--run-id <ID>                Assay import run id used for receipt provenance and event ids
--import-time <RFC3339>      Deterministic import timestamp override

Pydantic Case-Result Import

Import bounded Pydantic Evals reduced case-result artifacts into a verifiable Assay evidence bundle:

assay evidence import pydantic-case-result \
  --input pydantic-case-results.jsonl \
  --bundle-out pydantic-case-result-receipts.tar.gz \
  --source-artifact-ref pydantic-case-results.jsonl

The importer is intentionally strict in v1:

  • input must be JSONL with one reduced case-result artifact per row
  • each row must use pydantic-evals.report-case-result.export.v1
  • each row must use framework = pydantic_evals
  • each row must use surface = evaluation_report.cases.case_result
  • case_name is the only docs-backed v1 case identity
  • case_id_ref is not supported in P9d
  • source_case_name and source_ref are allowed only as non-identity provenance aids
  • results[] may contain bounded assertion pass/fail entries and scalar score entries only
  • raw ReportCase, full EvaluationReport, task inputs, expected outputs, model outputs, report metadata, experiment metadata, trace/span references, Logfire payloads, prompts, completions, analyses, failures, and evaluator implementation/config bodies are excluded

The importer first computes source_artifact_digest over the full JSONL file, then parses and reduces case-result artifacts. Receipts stay small while still binding back to the exact reduced source artifact bytes.

The receipt is a case-result-boundary artifact. It does not mean the evaluator judgment is correct, the model output was correct, the full ReportCase or EvaluationReport was imported, or Logfire/trace semantics are Assay truth.

The output bundle can be verified with:

assay evidence verify pydantic-case-result-receipts.tar.gz

The same bundle can feed the Trust Basis compiler:

assay trust-basis generate pydantic-case-result-receipts.tar.gz --out pydantic-case-result.trust-basis.json

P9d does not add a Trust Basis claim. Pydantic case-result receipts remain importer-only until any future claim slice defines exact semantics, Trust Card impact, and Harness posture.

Use --import-time <RFC3339> for deterministic fixture generation.

Options

Option                       Description
--input <PATH>               Pydantic Evals reduced case-result JSONL artifact file
--bundle-out <PATH>          Output Assay evidence bundle path
--source-artifact-ref <REF>  Reviewer-safe source artifact reference stored in receipts
--run-id <ID>                Assay import run id used for receipt provenance and event ids
--import-time <RFC3339>      Deterministic import timestamp override

Promptfoo JSONL Import

Import Promptfoo CLI JSONL assertion component results into a verifiable Assay evidence bundle:

assay evidence import promptfoo-jsonl \
  --input results.jsonl \
  --bundle-out promptfoo-evidence.tar.gz \
  --source-artifact-ref results.jsonl

The importer is intentionally strict in v1:

  • input must be Promptfoo CLI JSONL rows
  • each row must carry gradingResult.componentResults[]
  • each component must be an equals assertion result
  • component scores must be binary (0 or 1)
  • raw prompt, output, expected value, vars, and full JSONL rows are excluded

The importer first computes source_artifact_digest over the full JSONL file, then parses and reduces assertion components. That two-pass flow is intentional: receipts stay small while still binding back to the exact source artifact bytes.
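The two-pass flow can be sketched as: hash the raw bytes first, then parse each row and keep only a small summary per assertion component. This is a sketch, not the importer: reduce_promptfoo_jsonl and the output keys other than source_artifact_digest are hypothetical, SHA-256 is an assumed hash, and the component layout (assertion.type, score, pass) is assumed from Promptfoo's usual JSONL output.

```python
import hashlib
import json


def reduce_promptfoo_jsonl(raw):
    # Pass 1: digest the exact file bytes before any parsing.
    digest = hashlib.sha256(raw).hexdigest()  # assumed algorithm
    components = []
    # Pass 2: parse rows and reduce each assertion component.
    for line in raw.decode("utf-8").splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        for comp in row["gradingResult"]["componentResults"]:
            if comp.get("assertion", {}).get("type") != "equals":
                raise ValueError("only equals assertion results are supported")
            score = comp.get("score")
            if isinstance(score, bool) or score not in (0, 1):
                raise ValueError("component score must be 0 or 1")
            # The expected value, prompt, and output are never copied.
            components.append(
                {"assertion_type": "equals", "score": int(score), "pass": bool(comp.get("pass"))}
            )
    return {"source_artifact_digest": digest, "components": components}
```

Hashing before parsing means the digest covers whitespace and row order exactly as they appear on disk, so two files with identical content but different formatting produce different digests.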

result.reason is optional and bounded. For v1, failure reasons are omitted when they would leak raw compared values. Passing reasons are included only when they remain short and reviewer-safe.

The output bundle can be verified with:

assay evidence verify promptfoo-evidence.tar.gz

The same bundle can feed the Trust Basis compiler:

assay trust-basis generate promptfoo-evidence.tar.gz --out promptfoo.trust-basis.json

This proves the imported receipts are bundleable, verifiable, and readable by the Trust Basis path. Trust Basis now emits external_eval_receipt_boundary_visible when the supported Promptfoo receipt shape is present. That claim means the bounded receipt boundary is visible; it does not mean the Promptfoo eval run passed, the model output was correct, or the raw Promptfoo payload is imported as Assay truth.

Use --import-time <RFC3339> for deterministic fixture generation.

To compare the resulting Trust Basis artifact against another run, use assay trust-basis diff.

Options

Option                       Description
--input <PATH>               Promptfoo CLI JSONL output file
--bundle-out <PATH>          Output Assay evidence bundle path
--source-artifact-ref <REF>  Reviewer-safe source artifact reference stored in receipts
--run-id <ID>                Assay import run id used for receipt provenance and event ids
--import-time <RFC3339>      Deterministic import timestamp override

See Also