Skip to content

assay run

Execute a test suite against traces and write run artifacts.


Synopsis

assay run [OPTIONS]

Common Options

Option Description
--config <PATH> Config file (default: eval.yaml)
--db <PATH> SQLite DB path (default: .eval/eval.db)
--trace-file <PATH> Trace file source for replay/validation
--strict Treat blocking results as failing exit status
--replay-strict Enforce strict replay semantics from trace input
--baseline <PATH> Compare against existing baseline
--export-baseline <PATH> Export baseline from current run
--no-cache Disable cache usage for this run
--refresh-cache Ignore incremental cache and re-run
--incremental Skip passing tests with unchanged fingerprints
--rerun-failures <N> Retry failed tests up to N times
--exit-codes <v1\|v2> Exit-code compatibility mode (default: v2)

Judge-related options are available via --judge, --judge-model, --judge-samples, etc.


Examples

# Basic run
assay run --config eval.yaml --trace-file traces/golden.jsonl

# Strict CI-style run
assay run --config eval.yaml --trace-file traces/golden.jsonl --strict --db :memory:

# Baseline check
assay run --config eval.yaml --trace-file traces/golden.jsonl --baseline assay-baseline.json

# Export baseline
assay run --config eval.yaml --trace-file traces/golden.jsonl --export-baseline assay-baseline.json

For dedicated CI report files (SARIF/JUnit/PR comment), use assay ci:

assay ci \
  --config eval.yaml \
  --trace-file traces/golden.jsonl \
  --sarif .assay/reports/sarif.json \
  --junit .assay/reports/junit.xml

Outputs

assay run writes: - run.json (exit/status/reason metadata) - summary.json (machine-readable summary including seeds and optional judge metrics) - Console summary + footer


Exit Codes

Code Meaning
0 Success
1 Test failure / policy failure
2 Configuration or input error
3 Infrastructure/judge/provider error
4 Would block (sandbox/policy)

For automation, branch on reason_code + reason_code_version in run.json / summary.json.


See Also