Skip to content

PLAN — P33 External Eval Receipt Trust Basis Claim (Q2 2026)

  • Date: 2026-04-27
  • Owner: Evidence / Trust Compiler
  • Status: Execution slice
  • Scope: Add one bounded Trust Basis claim for supported external evaluation receipt evidence, starting with the P31 Promptfoo assertion-component receipt event.

1. Why this exists

P31 made the Promptfoo compiler path real:

Promptfoo CLI JSONL component result
  -> assay evidence import promptfoo-jsonl
  -> Assay EvidenceEvent receipt bundle

P32 then proved those bundles are not dead-end files:

receipt bundle
  -> assay evidence verify
  -> assay trust-basis generate

P33 is the next deliberate compatibility step. It adds one Trust Basis row that lets reviewers see that a supported external evaluation receipt boundary is present in the verified bundle.

2. What P33 is

P33 adds:

  • external_eval_receipt_boundary_visible
  • source = external_evidence_receipt
  • boundary = supported-external-eval-receipt-events-only
  • Trust Card schema 3, because the visible claim table changes

The claim is verified only when the bundle contains a supported receipt event whose payload still matches the bounded v1 receipt shape.

For the first slice, the only supported event is:

assay.receipt.promptfoo.assertion_component.v1

with:

  • schema = "assay.receipt.promptfoo.assertion-component.v1"
  • source_system = "promptfoo"
  • source_surface = "cli-jsonl.gradingResult.componentResults"
  • reviewer-safe source artifact ref and digest
  • reducer version
  • UTC RFC3339 import timestamp
  • assertion_type = "equals"
  • result.pass
  • binary result.score
  • optional bounded result.reason

3. What P33 is not

P33 does not claim:

  • Promptfoo run pass/fail truth
  • model-output correctness
  • prompt/output/expected-value truth
  • Promptfoo config, provider, token, cost, stats, or red-team truth
  • broad external-evaluation support
  • Harness baseline/candidate regression semantics

The claim means only:

the verified bundle contains at least one supported bounded external eval receipt

It does not mean:

the external evaluation result is correct or sufficient

4. Predicate rule

The Trust Basis predicate must stay stricter than generic event presence.

external_eval_receipt_boundary_visible = verified requires:

  • supported receipt event type
  • exact supported source system and source surface
  • digest-shaped source artifact binding
  • UTC RFC3339 import timestamp
  • supported reducer-version prefix
  • supported assertion type
  • bounded result object
  • no top-level raw Promptfoo payload fields in the receipt payload

Malformed, wider, or future-shaped receipt payloads remain accepted by evidence verify if the bundle contract allows them, but this Trust Basis claim should stay absent until the predicate is deliberately widened.

5. Trust Card impact

Adding a claim row changes the Trust Card visible surface. P33 therefore bumps:

TRUST_CARD_SCHEMA_VERSION = 3

The Trust Card remains a deterministic render of Trust Basis. It does not add a second classifier, summary prose, aggregate score, or badge.

6. Acceptance criteria

  • Trust Basis always emits the new claim row.
  • Ordinary bundles keep the claim absent.
  • Supported P31 Promptfoo receipt bundles classify it as verified.
  • Receipt-like events that include raw Promptfoo payload fields classify it as absent.
  • Trust Card schema is bumped to 3.
  • Trust Card JSON and Markdown still render only the same claim rows plus frozen non-goals.
  • CLI docs explain the claim boundary without describing Promptfoo correctness or run success.

7. Sequencing

P33 comes after P32 and before any Harness compare work.

The next likely slice after P33 is Harness-level comparison over compiled receipts. Harness should compare Assay receipt/trust artifacts; it should not learn Promptfoo JSONL parsing semantics directly.