What Assay Is and Is Not¶

Assay compiles agent runtime signals and selected external outcomes into verifiable evidence and bounded Trust Basis claims.

It is strongest when a team needs deterministic governance over tool calls, portable evidence bundles, and reviewable trust artifacts in CI. It is not an eval runner, observability dashboard, compliance oracle, or general-purpose authorization service.

Core Boundary¶

Assay owns this chain:

runtime/import signal
  -> canonical evidence bundle
  -> bundle verification
  -> Trust Basis claims
  -> Trust Card / SARIF / CI projections

Policy enforcement is still a key wedge. Assay can sit between an agent and MCP tools, evaluate explicit policy, and record the decision. The broader product surface is the evidence compiler around those decisions: what happened, what was verified, what was merely visible, and what should not be claimed.

In Scope¶

Area	What Assay Does
Protocol policy	Deterministic allow/deny/approval decisions over supported MCP tool-call surfaces.
Evidence bundles	Offline-verifiable evidence artifacts with canonical event envelopes and content binding.
Trust Basis	Bounded claim classification from verified bundles, keyed by stable `claim.id`.
Trust Card	Canonical JSON plus Markdown/HTML projections of the Trust Basis claim set.
External receipts	Narrow compiler lanes for selected upstream seams such as Promptfoo assertion components, OpenFeature boolean `EvaluationDetails`, and CycloneDX ML-BOM model components.
CI projections	SARIF/JUnit/Markdown outputs where appropriate, with raw canonical artifacts kept separate.
Packs	Optional evidence linting and policy packs that structure findings; packs do not prove legal compliance by themselves.

The machine-readable receipt family surface is tracked in the receipt family matrix.

Out of Scope¶

Area	Why It Is Not Assay
Eval running	Promptfoo, DeepEval, Braintrust, LangSmith, Langfuse, Phoenix, and similar tools should run or manage evaluations. Assay imports selected outcomes as bounded receipts when useful.
Observability dashboard	Assay can export or bridge evidence, but it does not replace tracing, metrics, prompt management, or production monitoring platforms.
Trust score	Trust Basis claims use explicit evidence levels. Assay does not collapse trust into a single score, badge, or "safe/unsafe" label.
Compliance certification	Assay can produce evidence and pack findings. It does not certify EU AI Act, SOC 2, or other legal compliance.
Full BOM viewer	CycloneDX ML-BOM receipts preserve selected inventory boundaries. Assay does not import full BOM graphs, vulnerabilities, licenses, or model-card truth.
Semantic safety classifier	Toxicity, jailbreak, hallucination, bias, and content-safety checks require probabilistic or model-based systems. Assay should complement them, not impersonate them.

Assay Versus Assay Harness¶

Assay owns artifact semantics:

evidence import and reduction,
evidence bundle verification,
receipt schemas and receipt-family matrix,
Trust Basis generation and diff semantics,
Trust Card generation.

Assay Harness owns operational CI recipes above those artifacts:

run baseline/candidate recipes,
preserve raw Assay diff JSON,
map Trust Basis regressions to CI exits,
project raw diffs into Markdown or JUnit.

Harness must not parse Promptfoo JSONL, OpenFeature JSONL, CycloneDX BOMs, or Assay receipt payloads. Domain semantics stay in Assay.

Decision Rule¶

Use Assay when the answer should come from deterministic policy, verified evidence, or a bounded receipt boundary.

Use another tool first when the answer requires subjective scoring, semantic judgment, broad trace exploration, prompt iteration, or legal certification.

Assay should make those external results portable when they matter; it should not become those systems.