Architecture

Assay is a CI-native evidence and trust compiler for agent systems, built as a Rust workspace.

Structure

Crate Structure — workspace organization and module layout
Data Flow — trace → gate → evidence pipeline
Split Refactor Plan (Q1-Q2 2026) — wave-by-wave execution plan
Split Refactor Report (Q1 2026) — verified closure and LOC outcomes
Split / Refactor Hotspot Inventory (Q2 2026) — current Rust hotspot baseline and next-wave ordering
ADR-032 Implementation Overview (Q2 2026) — current MCP policy stack on main
ADR-032 Building Block View (Q2 2026) — structural decomposition of the MCP policy stack
ADR-032 Quality Scenarios (Q2 2026) — explicit quality attributes and review scenarios
ADR-032 Structurizr Workspace (Q2 2026) — bounded architecture-as-code workspace and C4 model
ADR-032 Obsidian View Layer Recommendations (Q2 2026) — recommended internal view-layer setup
ADR-032 Documentation Maturity Gap Analysis (Q2 2026) — current-state gap analysis and follow-up posture
ADR-032 Execution Plan (Q2 2026) — MCP policy/obligation rollout status
ADR-033 Trust Compiler Positioning (Q2 2026) — product north star for Assay as an OTel-native trust compiler
RFC-005 Trust Compiler MVP (Q2 2026) — bounded plan for T1a compiler and T1b Trust Card
Release Plan — Trust Compiler 3.6 Evidence Portability — release-prep checklist for the first external-eval receipt lane
Release Plan — Trust Compiler 3.7 Evidence Portability — release record for the first three-family receipt boundary line
PLAN — T1a Trust Basis Compiler MVP (Q2 2026) — first execution wave for canonical trust-basis.json
Trust Compiler Audit Matrix (2026-03-26) — wave-by-wave audit of the trust-compiler line from T1b through K2-A Phase 1
Discovery — Next Evidence Wave (Q2 2026) — historical discovery note that ranked post-P2c candidates and led to K1
PLAN — K1 A2A Handoff / Delegation-Route Evidence (Q2 2026) — formal next-wave plan after P2c, adapter-first and evidence-first
K1-A Phase 1 Freeze (Q2 2026) — executable freeze for the first bounded typed handoff seam in A2A canonical adapter output
PLAN — K2 MCP Authorization-Discovery Evidence (Q2 2026) — active bounded MCP authorization-discovery wave, focused on visibility before any auth-discovery pack
K2-A Phase 1 Freeze (Q2 2026) — active contract for the first bounded MCP authorization-discovery seam now public in v3.5.0
K2-A Phase 1 Freeze Prep (Q2 2026) — pre-freeze source inventory and guardrails for the first bounded MCP authorization-discovery seam
PLAN — P9b Pydantic Reduced Case-Result Evidence Recut (Q2 2026) — implemented Pydantic sample recut around one reduced case-result artifact derived from EvaluationReport.cases[]; possible importer-only support remains future, not a Trust Basis claim or public receipt family
PLAN — P9c Pydantic Reduced Case-Result Receipt Readiness Freeze (Q2 2026) — readiness freeze before any Pydantic importer-only work; keeps EvaluationReport.cases[] as discovery input, the reduced artifact as the possible import unit, and ReportCase out of the contract boundary
PLAN — P9d Pydantic Case-Result Receipt Import (Q2 2026) — importer-only compiler path for bounded Pydantic reduced case-result artifacts, not a Trust Basis claim, Trust Card row, Harness recipe, or public receipt family
PLAN — P11A Visa TAP Intent Verification Evidence Interop (Q2 2026) — planned frontier commerce / trust-proof lane built around TAP verification-result evidence, not payment truth
TODO — Next Upstream Interop Lanes (Q2 2026) — ranked post-Agno queue that now tracks Langfuse as the current platform-adjacent lane and APS as a promote-only P11D watchlist under the commerce / trust-proof family
PLAN — P12 Browser Use History / Output Evidence Interop (Q2 2026) — planned adjacent-space lane built around Browser Use local run history and output, not observability export
PLAN — P13 Langfuse Experiment Result Evidence Interop (Q2 2026) — planned platform-adjacent lane built around bounded experiment item results and evaluations, not Langfuse trace export
PLAN — P14 Mastra Scorer / Experiment-Result Evidence Interop (Q2 2026) — planned scorer-first Mastra lane built around bounded experiment-item evidence, not tracing or Studio exports
PLAN — P14b Mastra ScoreEvent / ExportedScore Evidence Interop (Q2 2026) — maintainer-guided Mastra recut around ObservabilityExporter + ScoreEvent + ExportedScore, now live-captured on the newer Mastra line with scoreId
PLAN — P14c Mastra ScoreEvent Receipt Import (Q2 2026) — implemented Assay-side compiler path for bounded Mastra score-event artifacts into portable importer-only receipts, not Mastra observability, scorer, trace, or runtime truth
PLAN — P14d Mastra Score Receipt Trust Basis Readiness Freeze (Q2 2026) — semantic freeze that keeps Mastra score receipts importer-only until any score-derived Trust Basis claim boundary is explicitly accepted
PLAN — P15 x402 Requirement / Verification Evidence Interop (Q2 2026) — planned requirement-and-verification-first x402 lane built around PaymentRequired plus VerifyResponse, not settlement or fulfillment truth
PLAN — P16 LiveKit Agents Testing-Result / RunEvent Evidence Interop (Q2 2026) — planned testing-result-first LiveKit lane built around voice.testing.RunResult.events, not telemetry or transcript export
PLAN — P17 LlamaIndex EvaluationResult Evidence Interop (Q2 2026) — planned eval-result-first LlamaIndex lane built around bounded EvaluationResult evidence, not traces or callback exports
PLAN — P18 Vercel AI SDK UIMessage Evidence Interop (Q2 2026) — planned message-first Vercel AI SDK lane built around bounded UIMessage artifacts, with show-and-tell-first outward strategy rather than question-first
PLAN — P19 Mem0 Add Memories Result Evidence Interop (Q2 2026) — planned mutation-result-first Mem0 lane built around bounded Add Memories results, not retrieval or profile truth
PLAN — P20 AG-UI Compacted Message Snapshot Artifact Evidence Interop (Q2 2026) — planned compacted-message-history AG-UI lane built around one bounded run envelope and one MESSAGES_SNAPSHOT, not general serialization or full stream fidelity
PLAN — P21 Stagehand Observe-Derived Selector-Scoped Extract Artifact Evidence Interop (Q2 2026) — planned selector-scoped Stagehand lane built around one observe-derived selector anchor plus one scoped extract result, not broad browser-agent support or snapshot truth
PLAN — P22 OpenAI Agents JS Tool Approval Interruption / Resumable-State Evidence Interop (Q2 2026) — planned paused-run OpenAI Agents JS lane built around bounded interruptions plus one resumable continuation anchor, not transcript, session, or provider-chaining truth
PLAN — P23B Assay Paused Human-in-the-Loop Evidence Pattern (Q2 2026) — planned Assay-side reference pattern for bounded paused HITL evidence, standardizing pause_reason, interruptions, call_id_ref, and derived resume_state_ref without importing transcript, session, or full serialized-state truth
PLAN — P24 Phoenix Span Annotation Evaluation-Signal Evidence Interop (Q2 2026) — planned annotation-first Phoenix lane built around one bounded span annotation artifact, not trace, experiment, evaluator, or platform truth
PLAN — P25 LangWatch Custom Span Evaluation Signal Evidence Interop (Q2 2026) — planned custom-evaluation-first LangWatch lane built around one bounded span-linked evaluation signal, not trace, dataset, evaluation-session, or platform truth
PLAN — P26 AgentEvals Trajectory Strict-Match Result Signal Evidence (Q2 2026) — planned strict-match-first AgentEvals lane built around one returned deterministic trajectory match result, not LangSmith runs, LLM-as-judge outputs, or raw trajectory truth
PLAN — P27 AutoEvals ExactMatch Score Evidence (Q2 2026) — planned ExactMatch-first AutoEvals lane built around one returned deterministic output/expected comparison score, not Braintrust runs, JSON/list scorer bundles, LLM judge outputs, or raw payload truth
PLAN — P28 Promptfoo Assertion GradingResult Evidence (Q2 2026) — planned deterministic-assertion-first Promptfoo lane built around one surfaced GradingResult, not full eval exports, prompt matrices, red-team reports, or raw provider output truth
PLAN — P29 Guardrails Validation Outcome Evidence (Q2 2026) — planned outcome-first Guardrails AI lane built around one bounded validation outcome, not prompt, corrected-output, reask, or guard-history truth
PLAN — P30 OpenFeature EvaluationDetails Evidence (Q2 2026) — planned governance-adjacent OpenFeature lane built around one returned EvaluationDetails object, not provider config, targeting, rollout, telemetry, or application correctness truth
PLAN — P31 Promptfoo JSONL Component Result Receipt Import (Q2 2026) — planned compiler-path follow-up to P28 that imports one Promptfoo JSONL assertion component result into one portable Assay evidence receipt, not full Promptfoo eval-run truth or Harness regression gating
PLAN — P32 Promptfoo Receipt Trust Basis Readiness (Q2 2026) — execution slice that proves P31 receipt bundles feed the current Trust Basis compiler without adding a Promptfoo-specific claim row or Trust Card schema bump
PLAN — P33 External Eval Receipt Trust Basis Claim (Q2 2026) — execution slice that adds one bounded Trust Basis claim for supported external evaluation receipt boundaries, starting with Promptfoo assertion-component receipts
PLAN — P34 Trust Basis Diff Gate (Q2 2026) — execution slice that compares canonical Trust Basis artifacts for claim-level regressions without parsing Promptfoo JSONL or external eval payloads
PLAN — P41 OpenFeature EvaluationDetails Decision Receipt Import (Q2 2026) — execution slice that imports bounded boolean OpenFeature decision details as portable Assay receipts, not provider config, targeting, metadata, or application correctness truth
PLAN — P43 CycloneDX ML-BOM Model Component Receipt Import (Q2 2026) — execution slice that imports one selected CycloneDX machine-learning-model component as a portable inventory receipt, not full BOM, model-card, dataset, graph, or compliance truth
PLAN — P45 Inventory Receipt Trust Basis Claim (Q2 2026) — execution slice that adds one bounded Trust Basis claim for supported inventory receipt boundaries, starting with CycloneDX ML-BOM model-component receipts
PLAN — P45b Decision Receipt Trust Basis Claim (Q2 2026) — execution slice that adds one bounded Trust Basis claim for supported decision receipt boundaries, starting with OpenFeature boolean EvaluationDetails receipts
PLAN — P52-P56 Assay Product Surface Consolidation Program (Q2 2026) — implemented post-v3.8.0 consolidation program for product truth sync, Trust Basis assertions, receipt schema CLI, static Trust Card HTML, and policy/tool digest binding
PLAN — P56a Policy Snapshot Digest Visibility (Q2 2026) — implemented slice that projects canonical policy snapshot digest metadata onto supported MCP decision evidence without claiming policy correctness
PLAN — P56b Tool Definition Digest Binding (Q2 2026) — implemented companion slice that binds supported MCP decision evidence to a bounded tool-definition digest without claiming tool safety, signature validity, or registry truth
Assay Architecture & Roadmap Gap Analysis (Q2 2026) — repo-wide truth sync and next-step ordering

Active RFCs

| RFC | Status | Summary |
| --- | --- | --- |
| RFC-001: DX/UX & Governance | Historical (Wave A/B delivered; Wave C remains data-gated) | Normative DX/refactor invariants and historical execution framing |
| RFC-002: Code Health Remediation | Complete (E1–E4 merged, E5→RFC-003) | Store, metrics, registry, comment cleanup |
| RFC-003: Generate Decomposition | Complete (G1–G6 merged) | `generate.rs` split into focused modules |
| RFC-004: Open Items Convergence | Closed (O1–O6 merged on `main`) | Historical closure ledger for the Q1 convergence line |
| RFC-005: Trust Compiler MVP | Active (`T1a`..`H1` public in `v3.3.0`; `G4-A`, `P2c`, and `K1-A` public in `v3.4.0`; `K2-A` Phase 1 is now public in `v3.5.0`) | Bounded plan for the trust-compiler and Trust Card line |

Architecture Decision Records

See the full ADR index for all accepted and proposed architecture decisions.

Key ADRs:

ADR-003: Gate Semantics — Pass/Fail/Warn/Flaky
ADR-006: Evidence Contract — schema v1
ADR-014: GitHub Action v2 — CI integration
ADR-015: BYOS Strategy — bring your own storage
ADR-032: MCP Policy Enforcement v2 — typed decisions + obligations + evidence
ADR-033: Trust Compiler Positioning — claims-as-code north star and Trust Card wedge

Reference

Code Analysis Report — finding snapshot (remediation tracked in RFCs)
Assay Architecture & Roadmap Gap Analysis — repo-wide truth sync across architecture and roadmap
Pipeline Decomposition Plan — run/ci shared pipeline design