Architecture
Assay is a CI-native evidence and trust compiler for agent systems, built as a Rust workspace.
Structure
- Crate Structure — workspace organization and module layout
- Data Flow — trace → gate → evidence pipeline
- Split Refactor Plan (Q1-Q2 2026) — wave-by-wave execution plan
- Split Refactor Report (Q1 2026) — verified closure and LOC outcomes
- Split / Refactor Hotspot Inventory (Q2 2026) — current Rust hotspot baseline and next-wave ordering
- ADR-032 Implementation Overview (Q2 2026) — current MCP policy stack on main
- ADR-032 Building Block View (Q2 2026) — structural decomposition of the MCP policy stack
- ADR-032 Quality Scenarios (Q2 2026) — explicit quality attributes and review scenarios
- ADR-032 Structurizr Workspace (Q2 2026) — bounded architecture-as-code workspace and C4 model
- ADR-032 Obsidian View Layer Recommendations (Q2 2026) — recommended internal view-layer setup
- ADR-032 Documentation Maturity Gap Analysis (Q2 2026) — current-state gap analysis and follow-up posture
- ADR-032 Execution Plan (Q2 2026) — MCP policy/obligation rollout status
- ADR-033 Trust Compiler Positioning (Q2 2026) — product north star for Assay as an OTel-native trust compiler
- RFC-005 Trust Compiler MVP (Q2 2026) — bounded plan for T1a compiler and T1b Trust Card
- Release Plan — Trust Compiler 3.6 Evidence Portability — release-prep checklist for the first external-eval receipt lane
- Release Plan — Trust Compiler 3.7 Evidence Portability — release record for the first three-family receipt boundary line
- PLAN — T1a Trust Basis Compiler MVP (Q2 2026) — first execution wave for canonical trust-basis.json
- Trust Compiler Audit Matrix (2026-03-26) — wave-by-wave audit of the trust-compiler line from T1b through K2-A Phase 1
- Discovery — Next Evidence Wave (Q2 2026) — historical discovery note that ranked post-P2c candidates and led to K1
- PLAN — K1 A2A Handoff / Delegation-Route Evidence (Q2 2026) — formal next-wave plan after P2c, adapter-first and evidence-first
- K1-A Phase 1 Freeze (Q2 2026) — executable freeze for the first bounded typed handoff seam in A2A canonical adapter output
- PLAN — K2 MCP Authorization-Discovery Evidence (Q2 2026) — active bounded MCP authorization-discovery wave, focused on visibility before any auth-discovery pack
- K2-A Phase 1 Freeze (Q2 2026) — active contract for the first bounded MCP authorization-discovery seam now public in v3.5.0
- K2-A Phase 1 Freeze Prep (Q2 2026) — pre-freeze source inventory and guardrails for the first bounded MCP authorization-discovery seam
- PLAN — P9b Pydantic Reduced Case-Result Evidence Recut (Q2 2026) — implemented Pydantic sample recut around one reduced case-result artifact derived from EvaluationReport.cases[]; possible importer-only support remains future, not a Trust Basis claim or public receipt family
- PLAN — P9c Pydantic Reduced Case-Result Receipt Readiness Freeze (Q2 2026) — readiness freeze before any Pydantic importer-only work; keeps EvaluationReport.cases[] as discovery input, the reduced artifact as the possible import unit, and ReportCase out of the contract boundary
- PLAN — P9d Pydantic Case-Result Receipt Import (Q2 2026) — importer-only compiler path for bounded Pydantic reduced case-result artifacts, not a Trust Basis claim, Trust Card row, Harness recipe, or public receipt family
- PLAN — P11A Visa TAP Intent Verification Evidence Interop (Q2 2026) — planned frontier commerce / trust-proof lane built around TAP verification-result evidence, not payment truth
- TODO — Next Upstream Interop Lanes (Q2 2026) — ranked post-Agno queue that now tracks Langfuse as the current platform-adjacent lane and APS as a promote-only P11D watchlist under the commerce / trust-proof family
- PLAN — P12 Browser Use History / Output Evidence Interop (Q2 2026) — planned adjacent-space lane built around Browser Use local run history and output, not observability export
- PLAN — P13 Langfuse Experiment Result Evidence Interop (Q2 2026) — planned platform-adjacent lane built around bounded experiment item results and evaluations, not Langfuse trace export
- PLAN — P14 Mastra Scorer / Experiment-Result Evidence Interop (Q2 2026) — planned scorer-first Mastra lane built around bounded experiment-item evidence, not tracing or Studio exports
- PLAN — P14b Mastra ScoreEvent / ExportedScore Evidence Interop (Q2 2026) — maintainer-guided Mastra recut around ObservabilityExporter + ScoreEvent + ExportedScore, now live-captured on the newer Mastra line with scoreId
- PLAN — P14c Mastra ScoreEvent Receipt Import (Q2 2026) — implemented Assay-side compiler path for bounded Mastra score-event artifacts into portable importer-only receipts, not Mastra observability, scorer, trace, or runtime truth
- PLAN — P14d Mastra Score Receipt Trust Basis Readiness Freeze (Q2 2026) — semantic freeze that keeps Mastra score receipts importer-only until any score-derived Trust Basis claim boundary is explicitly accepted
- PLAN — P15 x402 Requirement / Verification Evidence Interop (Q2 2026) — planned requirement-and-verification-first x402 lane built around PaymentRequired plus VerifyResponse, not settlement or fulfillment truth
- PLAN — P16 LiveKit Agents Testing-Result / RunEvent Evidence Interop (Q2 2026) — planned testing-result-first LiveKit lane built around voice.testing.RunResult.events, not telemetry or transcript export
- PLAN — P17 LlamaIndex EvaluationResult Evidence Interop (Q2 2026) — planned eval-result-first LlamaIndex lane built around bounded EvaluationResult evidence, not traces or callback exports
- PLAN — P18 Vercel AI SDK UIMessage Evidence Interop (Q2 2026) — planned message-first Vercel AI SDK lane built around bounded UIMessage artifacts, with show-and-tell-first outward strategy rather than question-first
- PLAN — P19 Mem0 Add Memories Result Evidence Interop (Q2 2026) — planned mutation-result-first Mem0 lane built around bounded Add Memories results, not retrieval or profile truth
- PLAN — P20 AG-UI Compacted Message Snapshot Artifact Evidence Interop (Q2 2026) — planned compacted-message-history AG-UI lane built around one bounded run envelope and one MESSAGES_SNAPSHOT, not general serialization or full stream fidelity
- PLAN — P21 Stagehand Observe-Derived Selector-Scoped Extract Artifact Evidence Interop (Q2 2026) — planned selector-scoped Stagehand lane built around one observe-derived selector anchor plus one scoped extract result, not broad browser-agent support or snapshot truth
- PLAN — P22 OpenAI Agents JS Tool Approval Interruption / Resumable-State Evidence Interop (Q2 2026) — planned paused-run OpenAI Agents JS lane built around bounded interruptions plus one resumable continuation anchor, not transcript, session, or provider-chaining truth
- PLAN — P23B Assay Paused Human-in-the-Loop Evidence Pattern (Q2 2026) — planned Assay-side reference pattern for bounded paused HITL evidence, standardizing pause_reason, interruptions, call_id_ref, and derived resume_state_ref without importing transcript, session, or full serialized-state truth
- PLAN — P24 Phoenix Span Annotation Evaluation-Signal Evidence Interop (Q2 2026) — planned annotation-first Phoenix lane built around one bounded span annotation artifact, not trace, experiment, evaluator, or platform truth
- PLAN — P25 LangWatch Custom Span Evaluation Signal Evidence Interop (Q2 2026) — planned custom-evaluation-first LangWatch lane built around one bounded span-linked evaluation signal, not trace, dataset, evaluation-session, or platform truth
- PLAN — P26 AgentEvals Trajectory Strict-Match Result Signal Evidence (Q2 2026) — planned strict-match-first AgentEvals lane built around one returned deterministic trajectory match result, not LangSmith runs, LLM-as-judge outputs, or raw trajectory truth
- PLAN — P27 AutoEvals ExactMatch Score Evidence (Q2 2026) — planned ExactMatch-first AutoEvals lane built around one returned deterministic output/expected comparison score, not Braintrust runs, JSON/list scorer bundles, LLM judge outputs, or raw payload truth
- PLAN — P28 Promptfoo Assertion GradingResult Evidence (Q2 2026) — planned deterministic-assertion-first Promptfoo lane built around one surfaced GradingResult, not full eval exports, prompt matrices, red-team reports, or raw provider output truth
- PLAN — P29 Guardrails Validation Outcome Evidence (Q2 2026) — planned outcome-first Guardrails AI lane built around one bounded validation outcome, not prompt, corrected-output, reask, or guard-history truth
- PLAN — P30 OpenFeature EvaluationDetails Evidence (Q2 2026) — planned governance-adjacent OpenFeature lane built around one returned EvaluationDetails object, not provider config, targeting, rollout, telemetry, or application correctness truth
- PLAN — P31 Promptfoo JSONL Component Result Receipt Import (Q2 2026) — planned compiler-path follow-up to P28 that imports one Promptfoo JSONL assertion component result into one portable Assay evidence receipt, not full Promptfoo eval-run truth or Harness regression gating
- PLAN — P32 Promptfoo Receipt Trust Basis Readiness (Q2 2026) — execution slice that proves P31 receipt bundles feed the current Trust Basis compiler without adding a Promptfoo-specific claim row or Trust Card schema bump
- PLAN — P33 External Eval Receipt Trust Basis Claim (Q2 2026) — execution slice that adds one bounded Trust Basis claim for supported external evaluation receipt boundaries, starting with Promptfoo assertion-component receipts
- PLAN — P34 Trust Basis Diff Gate (Q2 2026) — execution slice that compares canonical Trust Basis artifacts for claim-level regressions without parsing Promptfoo JSONL or external eval payloads
- PLAN — P41 OpenFeature EvaluationDetails Decision Receipt Import (Q2 2026) — execution slice that imports bounded boolean OpenFeature decision details as portable Assay receipts, not provider config, targeting, metadata, or application correctness truth
- PLAN — P43 CycloneDX ML-BOM Model Component Receipt Import (Q2 2026) — execution slice that imports one selected CycloneDX machine-learning-model component as a portable inventory receipt, not full BOM, model-card, dataset, graph, or compliance truth
- PLAN — P45 Inventory Receipt Trust Basis Claim (Q2 2026) — execution slice that adds one bounded Trust Basis claim for supported inventory receipt boundaries, starting with CycloneDX ML-BOM model-component receipts
- PLAN — P45b Decision Receipt Trust Basis Claim (Q2 2026) — execution slice that adds one bounded Trust Basis claim for supported decision receipt boundaries, starting with OpenFeature boolean EvaluationDetails receipts
- PLAN — P52-P56 Assay Product Surface Consolidation Program (Q2 2026) — implemented post-v3.8.0 consolidation program for product truth sync, Trust Basis assertions, receipt schema CLI, static Trust Card HTML, and policy/tool digest binding
- PLAN — P56a Policy Snapshot Digest Visibility (Q2 2026) — implemented slice that projects canonical policy snapshot digest metadata onto supported MCP decision evidence without claiming policy correctness
- PLAN — P56b Tool Definition Digest Binding (Q2 2026) — implemented companion slice that binds supported MCP decision evidence to a bounded tool-definition digest without claiming tool safety, signature validity, or registry truth
- Assay Architecture & Roadmap Gap Analysis (Q2 2026) — repo-wide truth sync and next-step ordering
Active RFCs
| RFC | Status | Summary |
|---|---|---|
| RFC-001: DX/UX & Governance | Historical (Wave A/B delivered; Wave C remains data-gated) | Normative DX/refactor invariants and historical execution framing |
| RFC-002: Code Health Remediation | Complete (E1–E4 merged, E5→RFC-003) | Store, metrics, registry, comment cleanup |
| RFC-003: Generate Decomposition | Complete (G1–G6 merged) | generate.rs split into focused modules |
| RFC-004: Open Items Convergence | Closed (O1–O6 merged on main) | Historical closure ledger for the Q1 convergence line |
| RFC-005: Trust Compiler MVP | Active (T1a..H1 public in v3.3.0; G4-A, P2c, and K1-A public in v3.4.0; K2-A Phase 1 is now public in v3.5.0) | Bounded plan for the trust-compiler and Trust Card line |
Architecture Decision Records
See the full ADR index for all accepted and proposed architecture decisions.
Key ADRs:
- ADR-003: Gate Semantics — Pass/Fail/Warn/Flaky
- ADR-006: Evidence Contract — schema v1
- ADR-014: GitHub Action v2 — CI integration
- ADR-015: BYOS Strategy — bring your own storage
- ADR-032: MCP Policy Enforcement v2 — typed decisions + obligations + evidence
- ADR-033: Trust Compiler Positioning — claims-as-code north star and Trust Card wedge
Reference
- Code Analysis Report — finding snapshot (remediation tracked in RFCs)
- Assay Architecture & Roadmap Gap Analysis — repo-wide truth sync across architecture and roadmap
- Pipeline Decomposition Plan — run/ci shared pipeline design