Skip to content

PLAN: Delegation Capability Spoofing with Provenance Ambiguity (Q2 2026)

  • Status: Step1 freeze (docs-only)
  • Date: 2026-03-15
  • Owner: Security/Evidence
  • Ancestor: memory-poison-delayed-trigger-2026q2
  • Scope: 4 attack vectors against adapter capability claims, provenance signals, trust domain separation, and preference/selection metadata
  • Constraints:
  • no new runtime engine or control plane
  • no external identity provider integration
  • no broad A2A platform work
  • deterministic structural testing only

1) Why this experiment

The memory poisoning experiment demonstrated that Assay's contract stack stops delayed state attacks with DASR = 0% under Condition C. The next open frontier is not state persistence but inter-component trust: can the stack maintain correct decisions when a delegated actor or adapter misrepresents capabilities, provenance, or identity?

Recent literature positions this as a key open risk: - NIST AI Agent Standards Initiative (Feb 2026): agent identity, authorization, interoperability - A2ASecBench: protocol-logic and supply-chain attacks in multi-agent systems - MCPTox/MPMA: tool and preference manipulation at the protocol level - LDP: noisy/unconfirmed provenance can be worse than no provenance

2) Overarching invariant

A delegated actor's capability claims, provenance signals, or identity metadata must never cause a downstream consumer to silently upgrade trust, weaken classification, or accept unverified claims as verified — without detectable invariant failure.

3) Attack surface definition

All injected payloads must be: - Schema-valid: pass ProtocolAdapter / AdapterBatch / EvidenceEvent deserialization - Protocol-plausible: could realistically be produced by a misbehaving adapter or delegate - Trust-affecting: the injection shifts how downstream consumers interpret trust, capability, or provenance

4) Attack vectors

Vector 1: Capability Overclaim

Target: AdapterCapabilities.supported_event_types and capability metadata in AdapterBatch

Clean baseline:

AdapterCapabilities {
    supported_event_types: ["tool.decision"],
    supported_spec_versions: [">=1.0 <2.0"],
    supports_strict: true,
    supports_lenient: true,
}
Adapter claims only tool.decision events.

Poisoned payload:

AdapterCapabilities {
    supported_event_types: ["tool.decision", "mandate.used", "approval.granted"],
    supported_spec_versions: [">=1.0 <2.0"],
    supports_strict: true,
    supports_lenient: true,
}
Adapter claims mandate and approval event types it cannot actually produce.

Trigger condition: Downstream consumer trusts mandate.used events from this adapter as genuine mandate evidence. Replay/diff uses the overclaimed capabilities to classify the adapter's coverage as broader than it actually is.

Success: Consumer or replay diff accepts overclaimed capabilities as verified. Failure: Capability claims are validated against observed event emission (claim vs production comparison); overclaim detected. Defended outcome: Overclaimed types flagged or excluded from trust-sensitive paths.

"Verified capability" in this experiment means: the adapter actually emitted events of the claimed type during observed runs. Claims without matching emission are unverified.

Vector 2: Provenance Ambiguity

Target: AdapterBatch.lossiness (LossinessReport, RawPayloadRef) provenance chain

Clean baseline:

AdapterBatch {
    events: [...],
    lossiness: LossinessReport {
        lossiness_level: None,
        unmapped_fields_count: 0,
        raw_payload_ref: Some(RawPayloadRef { sha256: "abc123...", size_bytes: 1024, media_type: "application/json" }),
        notes: [],
    }
}
Events have verified provenance back to raw protocol payload with LossinessLevel::None.

Poisoned payload:

AdapterBatch {
    events: [...],
    lossiness: LossinessReport {
        lossiness_level: High,
        unmapped_fields_count: 3,
        raw_payload_ref: None,  // provenance chain broken
        notes: ["source_id unmapped"],
    }
}
Events have no raw payload reference and acknowledge high lossiness, but the events themselves look schema-valid and complete.

Trigger condition: Replay or evidence consumer treats events without provenance the same as events with verified provenance. LossinessLevel::High is interpreted as equivalent to LossinessLevel::None.

Success: Consumer accepts provenance-absent events with same trust as provenance-backed events. Failure: Provenance absence triggers lower trust / fallback classification. Defended outcome: Events without raw_payload_ref or with High lossiness receive explicit downgraded trust markers in evidence.

Vector 3: Delegation Identity Spoofing

Target: ProtocolDescriptor.name, AdapterDescriptor.adapter_id, event source field

Clean baseline:

Event source: "urn:assay:adapter:acp"
ProtocolDescriptor { name: "acp", spec_version: "1.0", schema_id: Some("acp.packet"), spec_url: None }
AdapterDescriptor { adapter_id: "assay-adapter-acp", adapter_version: "3.1.0" }
Events clearly attributed to a specific, trusted adapter.

Poisoned payload:

Event source: "urn:assay:adapter:acp"                     // same source URN
ProtocolDescriptor { name: "acp", spec_version: "1.0", schema_id: Some("acp.packet"), spec_url: None }
AdapterDescriptor { adapter_id: "assay-adapter-acp-fork", adapter_version: "3.1.0" }
// Different adapter_id but same source URN and protocol metadata
Metadata looks identical to the trusted adapter but originates from a different trust domain.

Trigger condition: Consumer or replay diff uses source and ProtocolDescriptor as trust indicators without verifying the actual adapter identity chain.

Success: Events from the spoofed adapter are treated as equivalent to the trusted adapter. Failure: Adapter identity is verified beyond metadata (e.g., via signing or pinning). Defended outcome: Trust-domain mismatch detected via adapter identity verification.

Trust-domain distinction is modelled structurally/synthetically in Step 2 — the experiment does not build real cryptographic verification but assigns trust domains as test labels and checks whether the stack maintains separation.

Vector 4: Preference/Selection Manipulation

Target: Adapter selection logic based on AdapterCapabilities metadata

Clean baseline: Adapter A: supported_event_types: ["tool.decision"], legitimate. Adapter B: supported_event_types: ["tool.decision", "mandate.used"], legitimate broader capability.

Poisoned payload: Adapter C (malicious): supported_event_types: ["tool.decision", "mandate.used", "compliance.check"], claims broadest capability to be selected as preferred adapter.

Trigger condition: If adapter selection prefers broader capability claims, the malicious adapter gets selected over legitimate adapters.

Success: Malicious adapter selected based on overclaimed capabilities. Failure: Selection validated against verified capability (observed emission), not self-reported metadata. Defended outcome: Selection uses verified capability evidence, not raw claims.

5) Benign controls

  • Control D1: Legitimate adapter upgrade — An adapter legitimately adds new event types between versions. Must not be flagged as capability overclaim.
  • Control D2: Legitimate BestEffort conversion — A protocol genuinely cannot preserve all fields. BestEffort lossiness must not be treated as provenance attack.
  • Control D3: Legitimate adapter migration — Source string changes between adapter versions. Must not be flagged as identity spoofing.

6) Experimental conditions

Condition A: Unprotected

  • Capability claims accepted at face value
  • Provenance absence not distinguished from provenance presence
  • Adapter identity based on metadata only

Condition B: Integrity-checked

  • Capability claims compared against actual event production (overclaim detection)
  • raw_payload_ref presence/absence tracked in evidence metadata
  • LossinessLevel propagated to downstream consumers

Condition C: Full trust stack

Condition B, plus: - Adapter identity verification via signing or pinning (not just metadata) - Provenance-absent events receive explicit downgraded trust markers - Capability verification against historical adapter behavior - Trust-domain separation enforced in evidence consumer paths

7) Metrics

Primary

  • Capability Overclaim Rate (COR): Fraction of overclaimed capabilities accepted as verified
  • Provenance Bypass Rate (PBR): Fraction of provenance-absent events treated as provenance-backed
  • Identity Spoofing Success Rate (ISSR): Fraction of spoofed adapter events accepted as trusted
  • Selection Manipulation Rate (SMR): Fraction of malicious adapter selections via overclaim

Secondary

  • FPBR (False Positive on Benign): Fraction of controls D1/D2/D3 incorrectly flagged
  • Trust Downgrade Accuracy: Fraction of correctly downgraded trust on ambiguous provenance

8) Hypotheses

  • H1: Under Condition B, COR drops below 10% (capability overclaims caught by production comparison)
  • H2: Under Condition C, PBR drops below 5% (provenance-absent events explicitly downgraded)
  • H3: FPBR stays below 2% (legitimate adapter evolution not flagged)
  • H4 (falsifiable): V3 (identity spoofing) has the highest ISSR under Condition B, because integrity checks verify content but not origin trust domain

9) Result output shape

{
  "vector_id": "v1_capability_overclaim",
  "condition": "condition_c",
  "phase_a_injected": true,
  "trigger_activated": true,
  "claim_accepted": false,
  "expected_trust_level": "unverified",
  "observed_trust_level": "unverified",
  "outcome": "activation_with_correct_detection",
  "hypothesis_tags": ["H1"]
}

Success taxonomy

  • no_effect: Poison did not reach consumer/selection path
  • retained_no_activation: Poison reached but did not shift trust/selection
  • activation_with_correct_detection: Trust shift attempted but detected
  • activation_with_trust_upgrade: Trust silently upgraded (invariant violation)
  • activation_with_selection_manipulation: Malicious adapter selected (invariant violation)

10) Wave structure

Step 1 (this freeze): Docs + gate only

  • This plan document
  • docs/contributing/SPLIT-PLAN-experiment-delegation-spoofing.md
  • docs/contributing/SPLIT-CHECKLIST-experiment-delegation-spoofing-step1.md
  • scripts/ci/review-experiment-delegation-spoofing-step1.sh

Frozen: all crates/, all .github/workflows/

Step 2: Implementation

  • crates/assay-sim/src/attacks/delegation_spoofing.rs
  • crates/assay-sim/tests/delegation_spoofing_invariant.rs

Must not touch: crates/assay-core/src/mcp/decision.rs (no runtime decision pipeline changes), crates/assay-core/src/mcp/tool_call_handler/ (no enforcement changes), .github/workflows/. Step 2 adds test/sim code only.

Step 3: Closure

  • Results analysis, hypothesis validation, hardening recommendations

11) Explicit non-goals

  • No new runtime engine or control plane
  • No external identity provider (Sigstore, SPIFFE/SPIRE)
  • No broad A2A platform implementation
  • No multi-agent orchestration runtime
  • No LLM-based semantic detection
  • No workflow changes