Performance Budgets (Wave C Harness)

This document defines the reproducible workload classes and baseline budgets used to gate Wave C optimization work.

Workload Classes

| Class | Bundle Size Target | Event Count | Rule Count Target | Usage |
|---|---|---|---|---|
| small | ~1 MB | 1k | ~10 | Fast local smoke/perf sanity |
| typical-pr | ~10 MB | 10k | ~50 | Default CI-level perf guardrail |
| large | 50 MB+ | 100k+ | 500+ | Scale trigger for C1/C3/C4 |

Bundle size targets are logical payload targets (uncompressed event content). The harness uses deterministic low-compressibility payloads so compressed tar sizes do not collapse unrealistically.
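To illustrate what a deterministic low-compressibility payload can look like, here is a minimal sketch. It is not the harness's actual generator: `deterministic_payload` is a hypothetical helper, and SplitMix64 is an assumed choice of pseudo-random generator. The idea is that a fixed seed always yields the same bytes, while the bytes are close enough to random that tar compression cannot shrink them much.

```rust
/// One SplitMix64 step: a small, deterministic pseudo-random generator.
/// (Assumed choice for this sketch; the real harness may use something else.)
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Hypothetical helper: build exactly `len` bytes that are reproducible for a
/// given `seed` but look random to a compressor.
fn deterministic_payload(seed: u64, len: usize) -> Vec<u8> {
    let mut state = seed;
    let mut out = Vec::with_capacity(len);
    while out.len() < len {
        let word = splitmix64(&mut state).to_le_bytes();
        // Take only as many bytes as are still needed to reach `len`.
        let take = (len - out.len()).min(word.len());
        out.extend_from_slice(&word[..take]);
    }
    out
}
```

Deriving the seed from the workload class and event index (an assumption, not something this document specifies) would keep every run byte-identical without storing fixtures.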

Harness Commands

Default (small + typical-pr):

cargo bench -p assay-evidence --bench verify_lint_harness

Single class (example: large):

ASSAY_PERF_WORKLOAD=large cargo bench -p assay-evidence --bench verify_lint_harness

All classes:

ASSAY_PERF_WORKLOAD=small,typical-pr,large cargo bench -p assay-evidence --bench verify_lint_harness
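The selection behaviour shown by these commands (comma-separated class names, defaulting to small + typical-pr) could be parsed roughly as follows. This is a sketch under assumptions: the `Workload` enum and `parse_workloads` function are hypothetical names, and treating an empty value as unset and rejecting unknown names are guesses about reasonable behaviour, not documented harness behaviour.

```rust
/// Hypothetical representation of the three workload classes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Workload {
    Small,
    TypicalPr,
    Large,
}

/// Parse a comma-separated workload list such as "small,typical-pr,large".
/// `None` or an empty string falls back to the documented default
/// (small + typical-pr). Unknown names produce an error (an assumption).
fn parse_workloads(raw: Option<&str>) -> Result<Vec<Workload>, String> {
    let raw = match raw {
        Some(s) if !s.trim().is_empty() => s,
        _ => return Ok(vec![Workload::Small, Workload::TypicalPr]),
    };
    raw.split(',')
        .map(|name| match name.trim() {
            "small" => Ok(Workload::Small),
            "typical-pr" => Ok(Workload::TypicalPr),
            "large" => Ok(Workload::Large),
            other => Err(format!("unknown workload class: {}", other)),
        })
        .collect()
}
```

A caller would pass the environment value directly, for example `parse_workloads(std::env::var("ASSAY_PERF_WORKLOAD").ok().as_deref())`.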

Profile-store harness (C3):

cargo bench -p assay-cli --bench profile_store_harness

Profile-store single class:

ASSAY_PROFILE_PERF_WORKLOAD=large cargo bench -p assay-cli --bench profile_store_harness

Trigger Budgets (Ubuntu Baseline)

These are trigger thresholds, not pass/fail release gates.

The harness emits verify/*, lint/*, and verify+lint/* series per workload. Trigger checks for C1 must use the explicit verify+lint/* series from the same Criterion run.

Measurement protocol (to keep comparisons stable):

  • Runner: ubuntu-latest as baseline.
  • Percentiles: use both p50 and p95.
  • Warm/cold split:
      • cold = first run after clean build/artifact state
      • warm = repeated runs on same runner/workdir
      • trigger decisions use warm p95 and cold p50 together when relevant.
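For readers who collect their own wall-clock samples, p50 and p95 can be derived with the nearest-rank method, sketched below. The `percentile` function is a hypothetical helper, and nearest-rank is an assumed definition; this document does not say which percentile method the trigger checks use, so compare numbers only when both sides were computed the same way.

```rust
use std::time::Duration;

/// Nearest-rank percentile over a set of timing samples.
/// Returns `None` for an empty slice or a `pct` outside 0..=100.
fn percentile(samples: &[Duration], pct: f64) -> Option<Duration> {
    if samples.is_empty() || !(0.0..=100.0).contains(&pct) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // rank = ceil(pct * n / 100), clamped to 1..=n, then converted to an index.
    let rank = (pct * sorted.len() as f64 / 100.0).ceil() as usize;
    let idx = rank.clamp(1, sorted.len()) - 1;
    Some(sorted[idx])
}
```

Keeping cold and warm samples in separate slices and calling `percentile` on each preserves the warm/cold split described above.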

  • C1 trigger:
      • verify+lint p95 > 5s on large
      • or verify+lint p50 > 2s on typical-pr
  • C2 trigger:
      • runner clone/build overhead > 10% of suite runtime on >=1000 tests
  • C3 trigger:
      • profile merge p95 > 1s at >=10k entries (profile/merge/typical-pr or higher)
      • or profile load p95 > 500ms (profile/load/typical-pr or higher)
  • C4 trigger:
      • run-id tracking evictions cause determinism or duplicate-merge issues
      • hard bound for duplicate protection window: N = 5000 recent run IDs
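The numeric thresholds for C1, C2, and C3 can be expressed as simple predicates, sketched below. The function names and parameters are hypothetical and exist only for illustration; the inputs are assumed to be percentiles already extracted from the relevant series. C4 is left out because its first condition is qualitative.

```rust
use std::time::Duration;

/// C1: verify+lint p95 > 5s on large, or verify+lint p50 > 2s on typical-pr.
fn c1_triggered(large_p95: Duration, typical_pr_p50: Duration) -> bool {
    large_p95 > Duration::from_secs(5) || typical_pr_p50 > Duration::from_secs(2)
}

/// C2: clone/build overhead > 10% of suite runtime, on suites of >= 1000 tests.
/// Integer nanoseconds avoid floating-point rounding at the boundary.
fn c2_triggered(overhead: Duration, suite_runtime: Duration, test_count: usize) -> bool {
    test_count >= 1000 && overhead.as_nanos() * 10 > suite_runtime.as_nanos()
}

/// C3: profile merge p95 > 1s at >= 10k entries, or profile load p95 > 500ms.
fn c3_triggered(merge_p95: Duration, merge_entries: usize, load_p95: Duration) -> bool {
    (merge_entries >= 10_000 && merge_p95 > Duration::from_secs(1))
        || load_p95 > Duration::from_millis(500)
}
```

All comparisons are strict, matching the ">" in the list above, so a measurement exactly at a threshold does not fire the trigger.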

Guardrails

  • No semantic changes to verify/lint/run outputs in Wave C.
  • Any optimization PR must include before/after benchmark output from this harness.
  • Golden equivalence tests are required for verify/lint behavior changes.
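A golden equivalence check can be as small as a line-by-line comparison against a stored reference output. The sketch below assumes the verify/lint output is available as text; `check_golden` is a hypothetical helper, not part of the existing test suite.

```rust
/// Compare `actual` output against a stored `golden` reference, line by line.
/// Returns `Ok(())` on a match, or an error naming the first differing
/// 1-based line. Because `str::lines` is used, a trailing newline and
/// CRLF vs LF line endings are not treated as differences.
fn check_golden(actual: &str, golden: &str) -> Result<(), String> {
    let mut actual_lines = actual.lines();
    let mut golden_lines = golden.lines();
    let mut line_no = 1usize;
    loop {
        match (actual_lines.next(), golden_lines.next()) {
            (None, None) => return Ok(()),
            (a, g) if a == g => line_no += 1,
            (a, g) => {
                return Err(format!(
                    "line {}: expected {:?}, got {:?}",
                    line_no, g, a
                ))
            }
        }
    }
}
```

Running the same check before and after an optimization, against the same golden file, is one way to show that outputs stayed semantically unchanged.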