CLI Command Grouping RFC
Status: accepted direction; Tier 1 MCP pilot and Tier 2a policy authoring implemented
Owner: CLI / Product
Last updated: 2026-06-01 (Tier 2a policy authoring implemented)
Summary
Assay should not do a big-bang command restructure. The current flat CLI is usable, and the high-frequency commands should stay flat. The useful next step is selective noun-verb grouping for families that already behave like resource groups:
- mcp first (implemented as the Tier 1 pilot)
- trust second, only after one more usage/docs check
- a narrowed policy grouping (generate/record, not the full set first drafted), now implemented as Tier 2a
- a corrected replay grouping (not evidence), only if user feedback or nearby maintenance work justifies it
Tier 2 was revised after checking the real command tree: the earlier policy generate/coverage/explain/fix and evidence bundle/replay/import sketches hit name collisions (policy migrate, evidence import) and a miscategorization (replay bundles are not evidence bundles). See the Tier 2 sections for the corrected, smaller scope.
The migration contract copies the proven trustcard to trust-card pattern from #1454: new canonical spelling, old spelling kept as a hidden compatibility path, a stderr deprecation warning, tests for both paths, and no artifact/output-shape changes.
Why This Exists
The CLI has grown into a broad command surface. The quick UX fixes around help text, trace replay errors, positional validation config, run JSON output, and trust-card naming improved the immediate experience. What remains is not a bug; it is gradual discoverability erosion.
For humans and agents, a large flat command list is harder to explore. A selective noun-verb structure gives a predictable path of the form assay <noun> <verb>, for example assay mcp discover.
That is easier to reason about than scanning many top-level peers. But over-grouping would make the most common paths worse, so this RFC keeps the main evaluation loop flat.
Goals
- Reduce CLI discovery cost for related command families.
- Preserve existing scripts through hidden compatibility paths.
- Keep high-frequency commands short and stable.
- Avoid artifact, schema, exit-code, stdout/stderr, and output-shape churn.
- Make each future grouping reviewable as one small family PR.
Non-Goals
- No big-bang 36-command restructure.
- No immediate code migration in this RFC.
- No removal of old command names before a future major release.
- No change to Trust Card artifact names such as trustcard.json.
- No forced noun-verb shape for universal commands like run, doctor, or version.
- No attempt to minimize the top-level command count as an end in itself.
Current Shape
The current command surface is mixed: some nouns already exist, while several related actions remain flat.
| Domain | Current commands | Current shape |
|---|---|---|
| Core eval loop | run, ci, validate, watch | Flat |
| Scaffolding | init, init-ci, setup, demo | Flat |
| Policy authoring | policy, generate, record, coverage, explain, fix, migrate, calibrate | Mixed |
| Trust artifacts | trust-basis, trust-card, baseline | Flat |
| Evidence and replay | evidence, bundle, replay, import | Mixed |
| MCP runtime | mcp with hidden legacy shims for discover, kill, tool | Grouped |
| Runtime/security | monitor, sandbox, quarantine, sim | Mixed |
| Trace/profile data | trace, profile | Flat |
| Meta | doctor, version | Flat |
Proposed Direction
Keep Core Commands Flat
These commands should remain top-level:
- assay run
- assay ci
- assay validate
- assay watch
- assay init
- assay doctor
- assay version
These are high-frequency or universal CLI verbs. Moving them under another noun would increase friction for the most common paths.
Tier 1: Group MCP
Target shape:
assay mcp discover # was: discover
assay mcp kill # was: kill
assay mcp tool # was: tool
Why first:
- discover and kill are already MCP-specific by description and behavior.
- mcp already exists as a hidden noun for wrapper work.
- This improves agent-oriented help exploration without touching the core eval loop.
- The affected commands are lower-frequency than run, validate, and ci.
Migration rule:
- Keep assay discover, assay kill, and any existing flat MCP spellings as hidden compatibility shims.
- Emit a stderr deprecation warning when a legacy flat path is used.
- Do not change policy enforcement, output files, exit codes, or JSON shapes.
Status:
- Implemented for discover, kill, and tool as the first grouping pilot.
- assay mcp wrap and assay mcp config-path remain in the same family.
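The migration rule above can be sketched as a small argument-normalization step that runs before normal parsing. This is an illustrative, std-only sketch, not Assay's actual implementation: the names normalize_mcp and Normalized, and the exact warning wording, are assumptions made for the example.

```rust
/// Legacy flat spellings that the MCP pilot moved under the `mcp` noun.
const LEGACY_MCP_VERBS: [&str; 3] = ["discover", "kill", "tool"];

/// Result of normalizing the arguments that follow the binary name.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, PartialEq)]
pub struct Normalized {
    /// Arguments rewritten to the canonical noun-verb form.
    pub args: Vec<String>,
    /// Deprecation text intended for stderr; set only for legacy paths.
    pub warning: Option<String>,
}

/// Rewrite `discover ...` to `mcp discover ...` and attach a warning.
/// Every other argument list is passed through unchanged.
pub fn normalize_mcp(args: &[&str]) -> Normalized {
    match args.first() {
        Some(verb) if LEGACY_MCP_VERBS.contains(verb) => {
            let mut rewritten = vec!["mcp".to_string()];
            rewritten.extend(args.iter().map(|a| a.to_string()));
            Normalized {
                args: rewritten,
                // Assumed wording; the real message may differ.
                warning: Some(format!(
                    "warning: `assay {verb}` is deprecated; use `assay mcp {verb}`"
                )),
            }
        }
        _ => Normalized {
            args: args.iter().map(|a| a.to_string()).collect(),
            warning: None,
        },
    }
}
```

A caller would print the warning with eprintln! so that stdout stays byte-identical, then dispatch on the rewritten arguments. Keeping the rewrite separate from the handlers means the legacy and canonical paths share one code path for policy enforcement, exit codes, and JSON output.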
Tier 1: Consider Trust After MCP
Target shape:
Why it is a candidate:
- trust-basis and trust-card are one conceptual family.
- #1454 already proved the command alias/deprecation pattern on this surface.
- Trust Basis output behavior must remain unchanged: stdout by default, or the caller-supplied --out path, commonly documented as trust-basis.json.
- Trust Card artifact names must remain unchanged: trustcard.json, trustcard.md, and trustcard.html.
Why it should remain conditional:
- trust-basis and trust-card may already be clear enough as paired hyphenated top-level commands.
- Before moving them, check docs, examples, scripts, and user-facing material for direct use of both command names.
- Only group them if the help/discovery gain is worth carrying two legacy compatibility paths.
Migration rule:
- Keep assay trust-basis and assay trust-card as hidden compatibility paths.
- Emit a stderr deprecation warning when legacy paths are used.
- Keep Trust Basis output behavior and Trust Card artifact contracts unchanged.
Open question:
- baseline should stay flat unless future work shows it belongs under trust. It is related to scoring baselines, not necessarily Trust Basis/Card artifacts.
Tier 2: Consider Policy Authoring (narrowed)
Revision note: an earlier draft of this section proposed policy generate / coverage / explain / fix. Checking the real command tree narrowed that set: policy already exposes validate, migrate, and fmt, and several proposed verbs either collide or do not belong under policy. The viable Tier 2a surface is smaller than first drafted.
Viable target shape:
assay policy generate # was: generate ("Learning Mode: Generate policy from trace")
assay policy record # was: record ("Learning Mode: Capture and Generate in one flow")
Status: implemented, then tightened. The grouped assay policy ... paths are canonical; the old top-level assay generate / assay record shims were subsequently removed.
Optional, weaker fit:
Do not move these under policy:
- migrate: collides with the existing policy migrate ("v1.x constraints to v2.0 schemas"). The top-level migrate also handles config formats, so this is a semantics-merge decision, not a grouping shim. Leave both as-is until someone deliberately unifies the two migrate behaviors.
- explain: explains a test result or trace decision, not policy. It does not belong under policy. Leave flat (or revisit under a different noun).
- fix: "apply supported automatic fixes" is broader than policy and may touch config or trace fixes. Leave flat until its scope is bounded.
- calibrate: calibrates scoring thresholds, not policy.
Why not first:
- Moving top-level commands into a subcommand needs shim commands, not just clap aliases.
- This creates docs and example churn.
- The old top-level verbs must be actively supported, warned, and tested for at least two minor releases.
Trigger to start:
- A future policy-authoring refactor that already touches generate/record.
- User confusion around policy authoring.
Tier 2: Group Replay (corrected, not under Evidence)
Revision note: an earlier draft proposed folding bundle, replay, and import under evidence. Checking the real command tree showed this is wrong on two counts, so the target noun changed from evidence to replay.
Why the earlier evidence plan does not work:
- evidence already exposes import (evidence import for CycloneDX, OpenFeature, Mastra, and Pydantic evidence). The top-level import ("external artifacts into Assay-compatible data") is a different command, so import cannot move under evidence without a name collision. Leave top-level import flat. Whether the two imports should be unified is a separate semantics question, not a grouping move.
- bundle and replay operate on replay bundles ("Create replay bundle from run artifacts"), not evidence bundles. Folding them under evidence would conflate two distinct bundle concepts.
Corrected target shape, a dedicated replay noun:
assay replay bundle create # was: bundle create
assay replay bundle verify # was: bundle verify
assay replay run # was: replay (run a recorded replay bundle)
Extra care vs the MCP grouping:
- This promotes the existing top-level replay command into a replay noun. assay replay <bundle> must keep working as a shim that maps to assay replay run <bundle>. That command-to-noun promotion is slightly more involved than the MCP case (where discover/kill/tool were already distinct commands) and deserves its own parse and behavior tests.
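The command-to-noun promotion described above could be sketched as follows. This is a hedged, std-only illustration rather than the planned implementation: normalize_replay and to_owned_args are hypothetical names, and the choice to pass flag-like arguments through untouched is an assumption of the sketch.

```rust
/// Subcommands of the promoted `replay` noun, per the corrected target shape.
const REPLAY_SUBCOMMANDS: [&str; 2] = ["bundle", "run"];

/// Copy borrowed arguments into owned strings.
fn to_owned_args(items: &[&str]) -> Vec<String> {
    items.iter().map(|a| a.to_string()).collect()
}

/// Map the legacy `replay <bundle>` form to `replay run <bundle>`.
/// Returns the canonical arguments and whether the legacy form was used.
pub fn normalize_replay(args: &[&str]) -> (Vec<String>, bool) {
    match args {
        // A first argument that is neither a known subcommand nor a flag is
        // treated as a legacy positional bundle path.
        ["replay", first, rest @ ..]
            if !REPLAY_SUBCOMMANDS.contains(first) && !first.starts_with('-') =>
        {
            let mut out = vec![
                "replay".to_string(),
                "run".to_string(),
                first.to_string(),
            ];
            out.extend(to_owned_args(rest));
            (out, true)
        }
        // Canonical forms, flags, bare `replay`, and unrelated commands pass through.
        _ => (to_owned_args(args), false),
    }
}
```

One edge case the sketch makes visible: a bundle file literally named run or bundle would be read as a subcommand, not as a legacy positional path. That ambiguity is the kind of behavior the dedicated parse tests would need to pin down.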
Trigger to start:
- A future replay or bundle UX pass that already touches this code.
- Repeated confusion between replay bundles and evidence bundles.
Migration Contract
Every future grouping PR should follow this contract:
- Add the new noun-verb path as canonical.
- Keep the old path working as a hidden compatibility path.
- Emit a concise deprecation warning to stderr on the old path.
- Add parse tests for both new and old paths.
- Add contract tests proving the old path still produces the same output.
- Do not rename artifacts, schemas, receipt types, exit codes, or output formats.
- Keep stdout behavior unchanged; warnings go to stderr only.
- Keep docs focused on the new canonical path.
- Keep the old path hidden from help output unless there is a deliberate visible deprecation reason.
- Leave historical architecture/RFC references alone unless they are actively misleading.
- Keep the compatibility path for at least two minor releases, and remove it only on a future major release.
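The output-related points of this contract can be expressed as a small comparison between a canonical invocation and its legacy twin. The sketch below is std-only and illustrative: Captured and contract_violations are hypothetical names, and it assumes the deprecation line contains the word "deprecated", which the RFC does not specify.

```rust
/// Captured result of one CLI invocation (hypothetical helper type).
#[derive(Debug, Clone, PartialEq)]
pub struct Captured {
    pub stdout: String,
    pub stderr: String,
    pub exit_code: i32,
}

/// Compare a legacy invocation against its canonical twin.
/// Returns one message per violated rule; an empty list means compatible.
pub fn contract_violations(canonical: &Captured, legacy: &Captured) -> Vec<String> {
    let mut problems = Vec::new();

    // stdout must be unchanged; warnings go to stderr only.
    if canonical.stdout != legacy.stdout {
        problems.push("stdout differs between canonical and legacy path".to_string());
    }
    // Exit codes must not change.
    if canonical.exit_code != legacy.exit_code {
        problems.push("exit code differs between canonical and legacy path".to_string());
    }

    // Lines that appear on legacy stderr but not on canonical stderr.
    let extra: Vec<&str> = legacy
        .stderr
        .lines()
        .filter(|line| !canonical.stderr.lines().any(|c| c == *line))
        .collect();

    if extra.is_empty() {
        problems.push("legacy path printed no deprecation warning".to_string());
    } else if extra.iter().any(|line| !line.contains("deprecated")) {
        // Assumption: every extra line is a deprecation notice.
        problems.push("legacy stderr has extra non-deprecation output".to_string());
    }

    problems
}
```

In a real contract test, the two Captured values would come from running the built binary twice, for instance with std::process::Command, once per spelling. Artifact files would need a separate byte-for-byte comparison, which this sketch does not cover.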
Implementation Notes
For a rename at the same command level, a clap alias can be enough:
For a move from a flat command into a nested command, a clap alias is usually not enough. The old top-level path should become a shim command that delegates to the new handler and prints the deprecation warning.
That difference is why this RFC recommends starting with one family at a time.
Suggested Sequence
- Land this RFC as docs-only. Done.
- Land the small MCP-only grouping pilot. Done.
- If MCP grouping lands cleanly in a release, consider a trust grouping PR.
- Tier 2a policy generate / policy record is implemented. Keep coverage flat for now, and do not move migrate/explain/fix/calibrate.
- Defer Tier 2b until there is user feedback or nearby replay maintenance: a replay noun (replay bundle, replay run). Not evidence. Leave top-level import flat (collides with evidence import).
- Do not group core commands.
- Treat each Tier 2 family as its own PR; the collisions found while drafting Tier 2 are exactly why a bundled restructure is unsafe.
Review Checklist For Future Grouping PRs
- Does assay --help show only the canonical new path?
- Does the old path still execute successfully?
- Does the old path print a deprecation warning?
- Are output files byte-for-byte compatible where expected?
- Are stdout/stderr conventions unchanged except for the warning?
- Are current docs updated without rewriting historical context?
- Are CI workflows and scripts checked for hardcoded old paths?
- Is the PR scoped to one family?