ADR-033: Assay as an OTel-Native Trust Compiler for Agent Systems¶

Status¶

Accepted (March 2026)

Context¶

Assay's strongest 2026 delivery line is no longer "more agent tooling breadth." It is a sequence of small, bounded claim and evidence moves:

C2 shipped a narrow, honest control-evidence baseline instead of a broad OWASP story.
E1 added a minimal typed engine seam rather than a wider policy language.
G1 surfaced supported weaker-than-requested containment fallback paths as evidence.
G2 surfaced explicit delegation context on supported decision evidence.
P1 productized those signals as a signal-aware companion pack without broadening the baseline.
The only post-P1 release-line mismatch was closed on main by the 3.2.3 workspace bump and aligned OWASP pack version floors.

At the same time, the external line is moving toward practical identity/authz metadata, auditability, and protocol-level measurable defenses:

OWASP MCP Top 10 emphasizes authorization and audit/telemetry as distinct control layers.
OWASP Top 10 for Agentic Applications 2026 keeps identity/privilege abuse and execution risk as first-order categories.
NIST NCCoE and CAISI work on software/AI agent identity and authorization centers on explicit metadata and bounded controls.
Protocol-aware benchmarks such as A2ASecBench and MCP-SafetyBench reinforce that the frontier is in verifiable protocol/runtime claims, not generic prompt safety or dashboard breadth.

Repo truth already supports this direction:

the evidence pipeline follows the OTel Collector pattern
assay trace ingest-otel already exists on main
evidence bundles, verification, signing, and proof-bearing packs are all implemented
trust-chain experiments on main already reason about provenance, delegation spoofing, and consumer-side evidence interpretation

This means Assay is best positioned not as "another eval platform" or "another observability dashboard," but as the system that compiles runtime truth into verifiable security claims.

Strategic Fit Test¶

This direction is considered strategically sound only while it passes all three tests below.

1. External Demand Fit¶

The strongest external demand in 2026 is not generic agent analytics. It is practical control surfaces around:

identity and authorization
audit and telemetry
protocol-level security posture
bounded, reviewable deployment claims

That makes Assay a better fit for a trust-compiler category than for a broad observability or eval category.

2. Repo Capability Fit¶

Assay already has the substrate this direction requires:

canonical evidence and offline verification
OTel-style ingest and transformation
proof-bearing bundles and signing surfaces
bounded signal waves for containment degradation and delegation visibility
signal-aware companion packs

This means the trust-compiler direction is a composition of real shipped capabilities, not a jump into a new product genus.

3. Wedge Fit Against Alternatives¶

The main alternatives are:

broader pack expansion
another engine/semantics wave
dashboards / observability UX
generic eval or red-team positioning

Those may be easier to explain, but they are weaker wedges for Assay. The stronger wedge is to make claim provenance and evidence status portable and explicit through a Trust Card and associated claim surfaces.

Decision¶

Assay is positioned as an OTel-native trust compiler for agent systems.

Trust compiler describes the product category; OTel-native describes the preferred ingest and ecosystem posture.

The product model is:

Input: OTel spans, protocol/runtime events, Assay traces, and bundle artifacts
Compile: canonical evidence, bounded claim classification, and pack evaluation
Output: findings, SARIF, verifiable bundles, and a future signed Trust Card

OTel-native is a direction for ingress and ecosystem fit, not a surrender of semantic control. Assay's own canonical evidence layer remains the stable source of truth for trust claims. OTel semantic conventions may evolve, and Assay should ingest and map them, not couple its truth model to any single moving semconv shape. Claims are classified on canonical evidence, not directly on raw OTel spans or other upstream ingest formats.
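The separation described above can be sketched as follows. This is a minimal illustration, not Assay's actual mapping code: the `CanonicalEvidence` shape, the function names, and both attribute keys (`demo.delegated_from`, `demo.agent.delegator`) are hypothetical and are not taken from any real OTel semantic convention. The point is only that upstream spellings are absorbed at the mapping step, while claim logic reads canonical fields.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Hypothetical attribute keys: two upstream spellings of the same fact.
# Neither is a real OTel semantic-convention name.
DELEGATION_KEYS = ("demo.delegated_from", "demo.agent.delegator")


@dataclass(frozen=True)
class CanonicalEvidence:
    """Assumed canonical shape; the real Assay evidence contract may differ."""
    kind: str
    subject: str
    delegated_from: Optional[str]
    source_format: str  # kept for provenance, not used by claim logic


def map_otel_span(span: Mapping[str, Any]) -> CanonicalEvidence:
    """Translate one OTel-style span dict into canonical evidence."""
    attrs = span.get("attributes", {})
    # Take the first known spelling that is present, else None.
    delegated_from = next((attrs[k] for k in DELEGATION_KEYS if k in attrs), None)
    return CanonicalEvidence(
        kind="tool_decision",
        subject=span.get("name", "unknown"),
        delegated_from=delegated_from,
        source_format="otel",
    )


def delegation_context_present(evidence: CanonicalEvidence) -> bool:
    """Claim logic reads canonical fields only, never raw span attributes."""
    return evidence.delegated_from is not None
```

Under these assumptions, a canonical evidence mapping test would assert that a span using the old spelling and a span using the new spelling map to equal `CanonicalEvidence` values, so a semconv rename cannot silently change claim semantics.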

North Star Freeze¶

The following constraints are normative for roadmap and product decisions unless a later ADR explicitly supersedes them:

Claim-first, not dashboard-first: Assay's primary product surface is evidence-classified trust claims. Dashboards, trace browsers, and visual analytics are supporting surfaces, not the wedge.
Canonical evidence over ingest format: OTel, protocol adapters, and other sources are ingest paths. Trust claims must be grounded in Assay's canonical evidence contract and offline-verifiable bundle reality.

Operational rule: new ingest paths may be additive or translational, but they must not replace the canonical evidence layer as the semantic authority for claim classification. Any upstream OTel or protocol mapping change that could affect claim semantics must be covered by canonical evidence mapping tests before adoption.

Trust Card over trust score: The iconic artifact is a Trust Card that shows what is `verified`, `self_reported`, `inferred`, or `absent`. A scalar trust score or binary trusted/untrusted output must not become the primary interface.

MVP rule: no aggregate trust score, no safe/unsafe badge, and no maturity badge as the primary artifact.

Fixed execution order: The default execution order is T1a -> T1b -> G3 -> P2, then only later heavier semantics such as reference existence, temporal validity, or capability attestation, unless a later ADR explicitly supersedes it.
No premature correctness claims: Delegation validation, chain integrity/completeness, sandbox correctness, inherited-scope correctness, and temporal correctness remain out of scope until dedicated signals and semantics exist.

Claim Epistemology Is A First-Class Product Surface¶

Assay differentiates by making the evidence level of a claim explicit, rather than by maximizing raw detection counts.

The primary evidence levels are:

| Level | Meaning |
| --- | --- |
| `verified` | Backed by direct runtime evidence or offline bundle verification |
| `self_reported` | Reported by the observed system without stronger corroboration |
| `inferred` | Derived by bounded, documented interpretation rules |
| `absent` | No trustworthy evidence currently supports the claim |

These evidence levels are the preferred external framing for future trust artifacts. Assay should not collapse them into a primary opaque trust score.
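As an illustration only, the four levels could be modeled as a closed enumeration with a small classification function. The function name, its parameters, and the precedence between `self_reported` and `inferred` are assumptions made for this sketch; the ADR does not define an ordering beyond the meanings listed above.

```python
from enum import Enum
from typing import Optional


class EvidenceLevel(str, Enum):
    VERIFIED = "verified"
    SELF_REPORTED = "self_reported"
    INFERRED = "inferred"
    ABSENT = "absent"


def classify(
    *,
    verified_by_bundle: bool = False,
    direct_runtime_evidence: bool = False,
    self_reported: bool = False,
    inference_rule: Optional[str] = None,
) -> EvidenceLevel:
    """Pick the strongest level the available evidence supports.

    Precedence here (verified > self_reported > inferred > absent) is an
    assumption of this sketch, not a rule stated in the ADR.
    """
    if verified_by_bundle or direct_runtime_evidence:
        return EvidenceLevel.VERIFIED
    if self_reported:
        return EvidenceLevel.SELF_REPORTED
    if inference_rule is not None:
        # An inferred claim must name the documented rule that produced it.
        return EvidenceLevel.INFERRED
    return EvidenceLevel.ABSENT
```

The function returns a level rather than a number, which keeps the output aligned with the rule that evidence levels are not collapsed into an opaque score.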

Adjacent Models We Borrow From, And What We Reject¶

Assay does not invent this direction from nothing, but it also does not fit neatly into any one existing category.

from SLSA / in-toto style attestations, Assay borrows machine-readable, signable claim discipline
from GUAC-style metadata synthesis, Assay borrows ingest -> normalize -> synthesize separation
from AIBOM and card-style transparency artifacts, Assay borrows portable, reviewable output
from OTel, Assay borrows ingest and ecosystem fit

Assay explicitly does not copy the hard provenance assumptions of supply-chain attestations into runtime claims, does not become a graph-first metadata lake before a bounded Trust Card exists, and does not adopt score-first output as the primary product surface.

In practice, this means Assay should borrow the attestation model, not provenance hardness; borrow the compiler pattern, not the graph as the product; borrow the card metaphor, not self-reported capability theater; and borrow OTel interoperability, not upstream semantic authority.

Trust Card Is The First Iconic Artifact¶

The first product artifact of this compiler direction is a Trust Card:

portable
machine-readable
reviewable by humans
potentially signable / attestable later

The Trust Card is an output of the compiler, not a separate dashboard product. It is a portable manifestation of compiler output, not the full product category.
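A Trust Card with these properties might look like the following sketch. The field names (`subject`, `claims`, `id`, `level`, `basis`) are hypothetical and do not describe a shipped schema. The sketch deliberately has no score or badge field, and it serializes with sorted keys so the same card always yields the same bytes, which is a convenient precondition for signing later.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

# The four evidence levels named in this ADR.
LEVELS = ("verified", "self_reported", "inferred", "absent")


@dataclass(frozen=True)
class Claim:
    id: str      # hypothetical claim identifier
    level: str   # one of LEVELS
    basis: str   # short human-readable pointer to the supporting evidence

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown evidence level: {self.level}")


@dataclass(frozen=True)
class TrustCard:
    subject: str
    claims: List[Claim]

    def to_json(self) -> str:
        # Sorted keys give a stable, diff-friendly serialization.
        return json.dumps(asdict(self), sort_keys=True, indent=2)


card = TrustCard(
    subject="example-agent",
    claims=[
        Claim("delegation_context", "verified", "decision evidence in bundle"),
        Claim("capability_list", "self_reported", "agent manifest only"),
    ],
)
print(card.to_json())
```

Rejecting unknown levels at construction time is one way to keep values such as `trusted` or `safe` from creeping into the artifact.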

Protocol Claim Packs Are The Preferred Downstream Productization Path¶

After the compiler and Trust Card surfaces stabilize, Assay should extend via small protocol claim packs, not via broad compliance theater.

Examples:

delegated authority context surfaced
weaker-than-requested containment surfaced
provenance-backed vs provenance-absent distinguished
capability overclaim detected
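To make the idea of a small claim pack concrete, here is a minimal sketch covering the first two examples. The evidence field names (`requested_containment`, `applied_containment`, `delegated_from`), the containment ranking, and the finding shape are all assumptions for illustration; they are not Assay's pack format.

```python
from typing import Callable, Dict, Iterable, List, Optional

Evidence = Dict[str, str]
Finding = Dict[str, str]
Rule = Callable[[Evidence], Optional[Finding]]

# Assumed ordering of containment strength, weakest first (illustrative only).
CONTAINMENT_RANK = {"none": 0, "process": 1, "sandbox": 2}


def containment_fallback(ev: Evidence) -> Optional[Finding]:
    """Surface cases where applied containment is weaker than requested."""
    requested = ev.get("requested_containment")
    applied = ev.get("applied_containment")
    if requested in CONTAINMENT_RANK and applied in CONTAINMENT_RANK:
        if CONTAINMENT_RANK[applied] < CONTAINMENT_RANK[requested]:
            return {
                "claim": "weaker_than_requested_containment",
                "detail": f"{requested} -> {applied}",
            }
    return None


def delegation_context(ev: Evidence) -> Optional[Finding]:
    """Surface delegation context; this does not validate the delegation."""
    if ev.get("delegated_from"):
        return {"claim": "delegated_authority_context", "detail": ev["delegated_from"]}
    return None


PACK: List[Rule] = [containment_fallback, delegation_context]


def evaluate(pack: Iterable[Rule], evidence: Iterable[Evidence]) -> List[Finding]:
    """Run every rule over every evidence record and collect findings."""
    rules = list(pack)
    findings: List[Finding] = []
    for ev in evidence:
        for rule in rules:
            finding = rule(ev)
            if finding is not None:
                findings.append(finding)
    return findings
```

Each rule only surfaces what the evidence shows. Neither rule asserts that a delegation is valid or that a sandbox is correct, which matches the scope limits set elsewhere in this ADR.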

Deliberate Non-Plays¶

This direction explicitly rejects:

becoming a tracing platform
becoming a general observability dashboard
becoming eval-as-a-service
becoming a generic red-team framework
shipping a delegation-validation or sandbox-correctness story that the signals do not support
using an opaque scalar trust score as the primary product output
binding Assay's truth model to any one evolving OTel/agent semconv form

Main Risks And Mitigations¶

Risk 1 — Abstract Product Story¶

"Trust compiler" is a more abstract category than evals, guardrails, or observability.

Mitigation:

keep the first artifact concrete: Trust Card
keep the claim levels explicit and simple
tie the story to CI, release governance, procurement review, and vendor comparison

Risk 2 — Category Confusion¶

If Assay presents itself as a tracing platform, dashboard, firewall, or generic eval suite, it competes in denser categories with a weaker wedge.

Mitigation:

keep the north star claim-first
treat dashboards and visual analytics as supporting surfaces only
keep protocol claim packs and Trust Card artifacts as the visible outputs

Risk 3 — Standards Churn¶

OTel GenAI and agent semantic conventions are still evolving, and protocol extensions will continue to move.

Mitigation:

keep Assay's canonical evidence contract as the truth layer
ingest and map OTel/protocol forms into that layer
avoid coupling primary trust semantics to any single moving upstream semconv

Consequences¶

Positive¶

Assay's moat becomes clearer and more defensible: trace -> evidence -> claim -> proof.
The product aligns better with the strongest parts of the existing architecture: deterministic evidence, pack discipline, offline verification, and OTel-friendly ingestion.
Trust artifacts can become portable CI/CD, audit, and procurement objects rather than dashboard-only screenshots.
Future signal waves such as authorization context fit naturally into the compiler story.

Negative¶

Assay intentionally does less in categories where competitors are already strong, such as experiments, dashboards, or generic eval UX.
Claim discipline must remain strict; overclaiming would undermine the entire positioning.
The first deliverables need careful wording so the compiler story does not sound like a full identity-validation or protocol-verification engine.
This positioning is less immediately legible than dashboard/eval categories and therefore depends on concrete artifacts and examples to remain understandable.

Neutral¶

Existing evidence, pack, and verification surfaces remain valid. This ADR changes product posture and next-step ordering more than it changes the core architecture.

Immediate Follow-On Sequence¶

T1a — OTel-native Trust Compiler MVP
T1b — Trust Card MVP
G3 — Authorization Evidence Signal
P2 — Protocol Claim Packs
Later: reference existence, temporal validity, capability attestation, and richer compliance packs

Any proposal that primarily improves dashboards, generic observability UX, or score-first reporting should be considered out-of-lane until this sequence is materially complete.
