Skip to content

Evidence Receipts from Promptfoo JSONL

Use this if Promptfoo already runs in CI and you want a smaller reviewable artifact than a full JSONL row.

Assay does not replace Promptfoo. Promptfoo runs the assertions and writes the JSONL output. Assay reduces selected assertion component results into bounded evidence receipts, bundles them, verifies the bundle, and lets CI gate the Trust Basis diff above that bundle.

Problem

A Promptfoo CI run can tell you whether an eval passed. Later review often needs a smaller question:

Which eval outcome was selected, what source artifact did it come from, and
can that boundary be reviewed without importing the full Promptfoo run?

That is the receipt boundary. It is useful for pull request review, incident follow-up, and audit trails where the reviewer should not need raw prompts, model outputs, vars, provider metadata, or a full eval dashboard.

One Workflow

First write Promptfoo JSONL:

promptfoo eval --output results.jsonl

Then import the supported assertion component results into an Assay evidence bundle:

assay evidence import promptfoo-jsonl \
  --input results.jsonl \
  --bundle-out promptfoo-evidence.tar.gz \
  --source-artifact-ref results.jsonl

Verify the bundle and compile the claim artifact:

assay evidence verify promptfoo-evidence.tar.gz
assay trust-basis generate promptfoo-evidence.tar.gz \
  --out promptfoo.trust-basis.json

Compare the candidate Trust Basis against a baseline:

assay trust-basis diff \
  baseline.trust-basis.json \
  promptfoo.trust-basis.json \
  --format json \
  --fail-on-regression

In CI, the baseline Trust Basis artifact usually comes from the default branch or a previously approved run.

Harness owns orchestration, exit codes, Markdown, and JUnit projection. The released recipe is here:

Canonical Artifact

The smallest source shape is one Promptfoo CLI JSONL row with gradingResult.componentResults[]:

{
  "gradingResult": {
    "componentResults": [
      {
        "pass": true,
        "score": 1,
        "reason": "Assertion passed",
        "assertion": {
          "type": "equals",
          "value": "expected-output-ref:checkout-greeting"
        }
      }
    ]
  }
}

The current receipt lane is intentionally strict. It imports selected equals assertion component results with binary scores only. The receipt keeps the bounded result and a digest of the source artifact, not the full Promptfoo row.

Proof artifacts are checked in under the Evidence Receipts in Action assets:

Artifact Role
candidate.results.jsonl Tiny Promptfoo source artifact
evidence.tar.gz Verifiable Assay receipt bundle
trust-basis.json Canonical claim artifact
trust-basis.diff.json Canonical CI diff artifact
trust-basis-summary.md Markdown reviewer projection
junit-trust-basis.xml JUnit CI projection

Boundary

Assay may claim that a supported external eval receipt boundary is visible:

{
  "id": "external_eval_receipt_boundary_visible",
  "level": "verified",
  "source": "external_evidence_receipt",
  "boundary": "supported-external-eval-receipt-events-only"
}

That claim means the selected Promptfoo outcome was reduced into a supported receipt shape, carried through a verifiable bundle, and compiled into a Trust Basis artifact.

It does not mean Assay owns Promptfoo semantics.

Not Claimed

This path does not claim:

  • the Promptfoo run passed
  • the model output was correct
  • the assertion was well designed
  • the eval set was complete
  • the application is safe
  • the full Promptfoo export is Assay truth

The claim is about a reviewable evidence boundary, not eval correctness.

Payoff Preview

The gate projection is intentionally small:

Trust Basis Gate
Status: OK
Regressed claims: 0
Removed claims: 0
Unchanged claims: 10

The raw JSON diff remains the canonical CI artifact. Markdown and JUnit are review projections only.

When to Use This

Use this path when:

  • Promptfoo already runs in CI
  • reviewers need a portable artifact, not only a pass/fail line
  • you want Trust Basis diffs and Harness gates above selected eval outcomes
  • raw prompts, outputs, vars, and provider responses should stay out of the receipt boundary

For the longer technical explanation, see From Promptfoo JSONL to Evidence Receipts. For the three-family static proof page, see Evidence Receipts in Action.