assay run¶
Execute a test suite against traces and write run artifacts.
Synopsis¶
Common Options¶
| Option | Description |
|---|---|
--config <PATH> | Config file (default: eval.yaml) |
--db <PATH> | SQLite DB path (default: .eval/eval.db) |
--trace-file <PATH> | Trace file source for replay/validation |
--strict | Treat blocking results as failing exit status |
--replay-strict | Enforce strict replay semantics from trace input |
--baseline <PATH> | Compare against existing baseline |
--export-baseline <PATH> | Export baseline from current run |
--no-cache | Disable cache usage for this run |
--refresh-cache | Ignore incremental cache and re-run |
--incremental | Skip passing tests with unchanged fingerprints |
--rerun-failures <N> | Retry failed tests up to N times |
--exit-codes <v1\|v2> | Exit-code compatibility mode (default: v2) |
Judge-related options are available via --judge, --judge-model, --judge-samples, etc.
Examples¶
# Basic run
assay run --config eval.yaml --trace-file traces/golden.jsonl
# Strict CI-style run
assay run --config eval.yaml --trace-file traces/golden.jsonl --strict --db :memory:
# Baseline check
assay run --config eval.yaml --trace-file traces/golden.jsonl --baseline assay-baseline.json
# Export baseline
assay run --config eval.yaml --trace-file traces/golden.jsonl --export-baseline assay-baseline.json
For dedicated CI report files (SARIF/JUnit/PR comment), use assay ci:
assay ci \
--config eval.yaml \
--trace-file traces/golden.jsonl \
--sarif .assay/reports/sarif.json \
--junit .assay/reports/junit.xml
Outputs¶
assay run writes: - run.json (exit/status/reason metadata) - summary.json (machine-readable summary including seeds and optional judge metrics) - Console summary + footer
Exit Codes¶
| Code | Meaning |
|---|---|
0 | Success |
1 | Test failure / policy failure |
2 | Configuration or input error |
3 | Infrastructure/judge/provider error |
4 | Would block (sandbox/policy) |
For automation, branch on reason_code + reason_code_version in run.json / summary.json.