Assay-Runner Gemini Fixture Design¶
Internal Phase 2B fixture-design note for the selected second-runtime candidate. This page is design-only. It does not approve fixture code, add runtime dependencies, introduce a cassette implementation, modify workflow triggers, or change v0 artifact contracts.
This note records the design discipline the first Gemini fixture implementation PR must satisfy before code is added. The candidate was selected in second-runtime-candidate-selection.md and approved by issue #1295 via PR #1305.
Status¶
- Selected runtime: Gemini Python
google-genaidirect - Model pin:
gemini-3.5-flash(stable/GA per ai.google.dev/gemini-api/docs/models) - Selection ≠ fixture approval. This design note codifies the constraints the future fixture implementation must meet. It does not itself authorize implementation.
- Delegated gate for first fixture PR:
gates=all, persecond-runtime-plan.md§ Suggested PR Sequence step 4. A narrower Gemini-specific gate is later coordinated work.
Goal¶
Produce a deterministic offline Gemini fixture that exercises the same small read-file capability class as the S5 OpenAI Agents fixture, with sufficient stability for three-run delegated determinism over the v0 normalized runner artifacts. The fixture is a runtime measurement target, not a Gemini showcase.
Fixture Command Shape¶
The fixture must conform to the v0 acceptance fixture invocation contract in fixtures-v0.md § Invocation Contract:
Constraints:
- requires exactly one work-directory argument
- writes deterministic fixture files below the provided work directory
- avoids wall-clock timestamps, random suffixes, hostnames, absolute temp paths, or dependency-version strings in evidence-bearing outputs
- keeps temporary control files (cassette path, request/response JSONL) outside the measured work directory
- does not move itself between cgroups or spawn detached processes outside the measured process tree
Suggested location for the implementation: runner-fixtures/gemini-google-genai/sdk-policy-agent.sh plus a Python script it invokes (final location after the Phase 2D Slice 5A fixture package boundary; the original draft pointed to tests/fixtures/runner-spike/gemini-google-genai-agent.sh). Exact file names are an implementation-PR concern; the shape above is the contract.
Cassette Strategy¶
The fixture must satisfy Row 1 (Offline execution) of the candidate evaluation via checked-in cassette replay. No live model calls during delegated acceptance.
Recording (curation step, maintainer-only)¶
- maintainer obtains a Gemini API key out-of-band (not stored in the fixture)
- maintainer runs the fixture once in record mode against
generativelanguage.googleapis.comusinggemini-3.5-flash - one non-streaming
client.models.generate_content()call is made; the response (functionCallpart) is recorded into a deterministic cassette - the API key is never written to disk, environment files, or commit history; the recording session is the only point where it touches the system
Replay (delegated acceptance)¶
- delegated runner invokes the fixture with
record_mode='none'(VCR.py semantics) or equivalent - zero network calls; cassette is the sole data source
- replay is byte-deterministic across runs
Checked-in cassette¶
- cassette is committed under the fixture directory, alongside the script it serves
- cassette is human-reviewable plain text (YAML for VCR.py); not binary
- response body in cassette contains the exact
functionCall.idand payload the model produced at recording time
Re-recording discipline (maintainer-controlled)¶
- re-recording happens only when
google-genaiis bumped, when the Gemini API contract changes, or when the model pin moves - re-recording is not a normal acceptance behavior; it is a curation event documented in the PR that bumps the dependency or pin
- analogous to the
@openai/agentsbump flow infixtures-v0.md
Dependency Lock Path¶
The fixture implementation PR must use one of the following lock paths, chosen for reproducibility:
| Path | Lock file | When to choose |
|---|---|---|
pip + requirements.txt with --hash pins | requirements.txt | minimal additional tooling on the delegated runner |
uv with uv.lock | uv.lock | if uv is already on the delegated runner |
poetry with poetry.lock | poetry.lock | if poetry is already on the delegated runner |
The fixture directory carries its own lock; no workspace-wide Python dependency is introduced unless the implementation PR explicitly proposes it (in which case the workspace-dependency-bump path in the CI lane contract applies).
google-genai is the only required runtime dependency for the fixture implementation itself; transitive dependencies must remain bounded by the SDK's published constraints.
Expected SDK Event Shape¶
The fixture must emit normalized SDK events conforming to assay.runner.sdk_event.v0 and to the SDK Fixture Contract in fixtures-v0.md.
Mapping from Gemini's function-calling flow to v0 SDK events:
| v0 SDK event | Gemini flow trigger |
|---|---|
tool_call_started | assistant message contains a functionCall part — fixture observes the call begin |
tool_call_completed | fixture has produced a functionResponse and dispatched it back to the model |
run_finished | model returns a non-function-call assistant message and the run ends |
Constraints inherited from the v0 SDK Fixture Contract:
- stable schema string
assay.runner.sdk_event.v0 - shared
run_id - contiguous
seqvalues starting at zero - stable
source(suggested:gemini-google-genai-fixtureor similar stable identifier; exact string is fixture-instance scope, not contract) - installed SDK package name and version loaded from
google-genaipackage metadata - stable tool name (suggested:
read_fileto match S5's capability class) - stable
tool_call_idon tool-call events, mapped fromFunctionCall.id— this is the identity that makes the candidate qualify for level-3 stable identity
The tool_call_id MUST be the value emitted by Gemini in FunctionCall.id during the recorded cassette interaction. The fixture must not synthesize a tool_call_id; if FunctionCall.id is absent in a response, the fixture must fail loudly rather than fall back to a generated value.
Expected Policy Event Shape¶
The fixture should integrate with the existing policy capture path so the delegated acceptance covers kernel + policy + SDK correlation, parallel to the S5 OpenAI Agents fixture.
Constraints inherited from the Policy Fixture Contract:
- call the intended MCP tool (
read_file) - deterministic JSON-RPC ids
- pass a stable
_meta.tool_call_idfor SDK-to-policy correlation - the policy
tool_call_idMUST equal the SDKtool_call_id(which equalsFunctionCall.idfrom the cassette) - write policy decisions to
ASSAY_RUNNER_POLICY_DECISION_LOG
For parity with the Accepted Full S5 Fixture Instance, the Gemini fixture should also preserve these stricter invariants (these are S5 accepted-instance values, not general contract rules):
- exactly one normalized policy event in the captured stream
- per-event
decision = allow(coarse policy outcome) - the corresponding
policy_decisionssummary incapability-surface.jsonisallow:read_file(the namespaced summary string is the surface representation, distinct from the per-event coarse outcome above)
The implementation PR may share the existing MCP file server used by the S5 fixture, or provide a Gemini-specific wrapper if scope reasons require. That is a design decision for the implementation PR; both shapes satisfy this design note.
Expected Normalized Artifacts¶
After three-run determinism comparison, the fixture must produce the same v0 artifact family as S5:
observation-health.jsoncapability-surface.jsoncorrelation-report.jsonlayers/sdk.ndjsonlayers/policy.ndjson
observation-health.json expected shape¶
schema = assay.runner.observation_health.v0platform = linuxkernel_layer = completeringbuf_drops = 0policy_layer = presentsdk_layer = self_reportedcgroup_correlation = clean- notes include
s5_sdk_capture: sdk_events=3 sdk_tool_calls=1
capability-surface.json expected shape¶
- a bounded, deterministic set of normalized filesystem paths under the work directory:
- the
read_filetarget the Gemini function call is replayed against - if the shared S5 policy-agent / MCP file-server wrapper is reused (per the option above), one additional deterministic wrapper input path (parallel to S5's
policy-input.txtcompanion file) - exact path count is determined by the wrapper choice and frozen by the cassette + fixture script in the implementation PR
mcp_toolscontainsread_filepolicy_decisionscontainsallow:read_file
correlation-report.json expected shape¶
status = cleanambiguities = []- exactly one binding, where
tool_call_idequals the cassette's recordedFunctionCall.id policy_decision = allowwindow = {"start": "run_started", "end": "run_finished"}
Three-run determinism¶
Three sequential runs of the fixture, via a delegated wrapper script analogous to scripts/ci/runner-spike-openai-agents-kernel-policy-three-run-determinism.sh, must produce byte-identical artifacts in the five files listed above.
Expected Delegated Gate¶
gates=all per second-runtime-plan.md § Suggested PR Sequence step 4. A narrower Gemini-specific gate (e.g. gemini-kernel-policy) is later coordinated work that requires updates to ci-lanes.md, the lane-check classifier, the workflow inputs.gates enum, and the matching acceptance scripts. Not a side effect of the first fixture PR.
Kill Criteria (Before Code)¶
Stop the implementation line before writing fixture code if any of these become true during PR design or implementation:
- Gemini's actual
gemini-3.5-flashfunction-calling response does not contain aFunctionCall.idfor the recorded call — the level-3 qualifies outcome rests on this guarantee - the
google-genaiSDK silently substitutes a missing id (contradicting the typed sourceField(default=None)evidence used in the candidate evaluation) - cassette determinism cannot be achieved without per-request header scrubbing that would also mask the recorded
FunctionCall.id gemini-3.5-flashis moved to deprecated, removed, or its function-call identity guarantee is rescinded by Google between selection and implementation- dependency installation (
google-genaiand transitive packages) is not byte-deterministic enough for three-run normalized-artifact stability - the fixture cannot satisfy the v0 artifact-shape expectations without weakening the runner normalizer's evidence-versus-telemetry filters
If any kill criterion fires, the implementation PR must stop and either (a) document the regression in a follow-up evaluation PR that updates the Gemini candidate outcome in second-runtime-candidate-selection.md, or (b) open a separate decision PR for the relevant follow-up issue.
Non-Goals¶
This design does not:
- approve fixture implementation code
- add runtime dependencies in the docs tree
- add cassette content or cassette-format choice as a contract decision
- modify the v0 artifact contracts, fixture v0 contract, CI lane contract, or boundary map
- introduce a narrower delegated gate
- propose cross-runtime capability-diff against S5 (Phase 2C per
second-runtime-plan.md§ Out Of Phase 2B Scope) - broaden the runner normalizer's evidence taxonomy
- pre-approve later Gemini model bumps or family expansion beyond
gemini-3.5-flash - close any candidate-selection re-evaluation by removing prior
insufficient evidenceentries from the selection note
Implementation PR Acceptance Checklist¶
The implementation PR must independently satisfy:
-
second-runtime-plan.md§ Acceptance Criteria For The First Fixture PR -
fixtures-v0.md§ Adding Or Changing A Fixture -
ci-lanes.md— delegated proof recorded with run URL, head SHA, gate, and proof-pack artifact name - this design note's Cassette Strategy, Dependency Lock Path, SDK / Policy event shapes, Normalized Artifacts, and Kill Criteria
- the lane-check classifier correctly routes the PR to
gates=all
References¶
- Runner artifact v0 contracts
- Runner acceptance fixture v0 contract
- Runner CI lane contract
- Runner second runtime Phase 2B plan
- Runner second runtime candidate selection
- Runner capability-diff v0 contract
- Assay-Runner boundary and extraction map
- Phase 1 delegated proof pack
- Candidate selection PR: https://github.com/Rul1an/assay/pull/1305
- Selection issue (closed by #1305): https://github.com/Rul1an/assay/issues/1295