ADR-002: Trace Replay as Input Adapter¶

Status¶

Accepted

Context¶

Live LLM calls in CI/CD are problematic due to cost, nondeterminism and latency. We need to run the exact same evaluation logic against recorded interactions.

Decision¶

We implement a Trace Replay mode where assay accepts a trace file (JSONL) as the backend instead of a live provider.

1. Contract & Schema¶

The trace file MUST be JSONL. Each line MUST be a valid JSON object conforming to Trace Schema v1:

{
  "schema_version": 1,
  "type": "assay.trace",
  "request_id": "String (Optional) - Stable unique id",
  "prompt": "String (Required)",
  "context": ["String (Optional) - RAG context chunks"],
  "response": "String (Required)",
  "model": "String (Optional)",
  "provider": "String (Optional)",
  "meta": "Object (Optional)"
}

Validation Rules: - Schema Version: If present, must be 1. - Type: If present, must be assay.trace. - Content: One of text or response is REQUIRED. Empty strings break the contract if implied as successful response.

Matching & Uniqueness: - Lookup: Traces are indexed by prompt to support the current eval.yaml contract. - Uniqueness: - If request_id is present, it MUST be unique across the file. - The prompt MUST also be unique across the file to ensure deterministic lookup. (Ambiguous prompts = Error).

2. Privacy & Redaction¶

Traces can contain PII. - Default: Prompts are kept for debugging. - Redaction: When --redact-prompts is set, prompt text MUST be replaced with [REDACTED] in all outputs.

3. CI Workflow¶

Recommended workflow: 1. Dev/Staging: record fresh traces. 2. Store: commit sanitized traces. 3. PR Gate: assay ci --trace-file traces.jsonl. 4. Drift Mitigation: periodic re-record jobs.