Skip to content

Assay-Runner Gemini Fixture Design

Internal Phase 2B fixture-design note for the selected second-runtime candidate. This page is design-only. It does not approve fixture code, add runtime dependencies, introduce a cassette implementation, modify workflow triggers, or change v0 artifact contracts.

This note records the design discipline the first Gemini fixture implementation PR must satisfy before code is added. The candidate was selected in second-runtime-candidate-selection.md and approved by issue #1295 via PR #1305.

Status

  • Selected runtime: Gemini Python google-genai direct
  • Model pin: gemini-3.5-flash (stable/GA per ai.google.dev/gemini-api/docs/models)
  • Selection ≠ fixture approval. This design note codifies the constraints the future fixture implementation must meet. It does not itself authorize implementation.
  • Delegated gate for first fixture PR: gates=all, per second-runtime-plan.md § Suggested PR Sequence step 4. A narrower Gemini-specific gate is later coordinated work.

Goal

Produce a deterministic offline Gemini fixture that exercises the same small read-file capability class as the S5 OpenAI Agents fixture, with sufficient stability for three-run delegated determinism over the v0 normalized runner artifacts. The fixture is a runtime measurement target, not a Gemini showcase.

Fixture Command Shape

The fixture must conform to the v0 acceptance fixture invocation contract in fixtures-v0.md § Invocation Contract:

<fixture-script> <work-dir>

Constraints:

  • requires exactly one work-directory argument
  • writes deterministic fixture files below the provided work directory
  • avoids wall-clock timestamps, random suffixes, hostnames, absolute temp paths, or dependency-version strings in evidence-bearing outputs
  • keeps temporary control files (cassette path, request/response JSONL) outside the measured work directory
  • does not move itself between cgroups or spawn detached processes outside the measured process tree

Suggested location for the implementation: runner-fixtures/gemini-google-genai/sdk-policy-agent.sh plus a Python script it invokes (final location after the Phase 2D Slice 5A fixture package boundary; the original draft pointed to tests/fixtures/runner-spike/gemini-google-genai-agent.sh). Exact file names are an implementation-PR concern; the shape above is the contract.

Cassette Strategy

The fixture must satisfy Row 1 (Offline execution) of the candidate evaluation via checked-in cassette replay. No live model calls during delegated acceptance.

Recording (curation step, maintainer-only)

  • maintainer obtains a Gemini API key out-of-band (not stored in the fixture)
  • maintainer runs the fixture once in record mode against generativelanguage.googleapis.com using gemini-3.5-flash
  • one non-streaming client.models.generate_content() call is made; the response (functionCall part) is recorded into a deterministic cassette
  • the API key is never written to disk, environment files, or commit history; the recording session is the only point where it touches the system

Replay (delegated acceptance)

  • delegated runner invokes the fixture with record_mode='none' (VCR.py semantics) or equivalent
  • zero network calls; cassette is the sole data source
  • replay is byte-deterministic across runs

Checked-in cassette

  • cassette is committed under the fixture directory, alongside the script it serves
  • cassette is human-reviewable plain text (YAML for VCR.py); not binary
  • response body in cassette contains the exact functionCall.id and payload the model produced at recording time

Re-recording discipline (maintainer-controlled)

  • re-recording happens only when google-genai is bumped, when the Gemini API contract changes, or when the model pin moves
  • re-recording is not a normal acceptance behavior; it is a curation event documented in the PR that bumps the dependency or pin
  • analogous to the @openai/agents bump flow in fixtures-v0.md

Dependency Lock Path

The fixture implementation PR must use one of the following lock paths, chosen for reproducibility:

Path Lock file When to choose
pip + requirements.txt with --hash pins requirements.txt minimal additional tooling on the delegated runner
uv with uv.lock uv.lock if uv is already on the delegated runner
poetry with poetry.lock poetry.lock if poetry is already on the delegated runner

The fixture directory carries its own lock; no workspace-wide Python dependency is introduced unless the implementation PR explicitly proposes it (in which case the workspace-dependency-bump path in the CI lane contract applies).

google-genai is the only required runtime dependency for the fixture implementation itself; transitive dependencies must remain bounded by the SDK's published constraints.

Expected SDK Event Shape

The fixture must emit normalized SDK events conforming to assay.runner.sdk_event.v0 and to the SDK Fixture Contract in fixtures-v0.md.

Mapping from Gemini's function-calling flow to v0 SDK events:

v0 SDK event Gemini flow trigger
tool_call_started assistant message contains a functionCall part — fixture observes the call begin
tool_call_completed fixture has produced a functionResponse and dispatched it back to the model
run_finished model returns a non-function-call assistant message and the run ends

Constraints inherited from the v0 SDK Fixture Contract:

  • stable schema string assay.runner.sdk_event.v0
  • shared run_id
  • contiguous seq values starting at zero
  • stable source (suggested: gemini-google-genai-fixture or similar stable identifier; exact string is fixture-instance scope, not contract)
  • installed SDK package name and version loaded from google-genai package metadata
  • stable tool name (suggested: read_file to match S5's capability class)
  • stable tool_call_id on tool-call events, mapped from FunctionCall.id — this is the identity that makes the candidate qualify for level-3 stable identity

The tool_call_id MUST be the value emitted by Gemini in FunctionCall.id during the recorded cassette interaction. The fixture must not synthesize a tool_call_id; if FunctionCall.id is absent in a response, the fixture must fail loudly rather than fall back to a generated value.

Expected Policy Event Shape

The fixture should integrate with the existing policy capture path so the delegated acceptance covers kernel + policy + SDK correlation, parallel to the S5 OpenAI Agents fixture.

Constraints inherited from the Policy Fixture Contract:

  • call the intended MCP tool (read_file)
  • deterministic JSON-RPC ids
  • pass a stable _meta.tool_call_id for SDK-to-policy correlation
  • the policy tool_call_id MUST equal the SDK tool_call_id (which equals FunctionCall.id from the cassette)
  • write policy decisions to ASSAY_RUNNER_POLICY_DECISION_LOG

For parity with the Accepted Full S5 Fixture Instance, the Gemini fixture should also preserve these stricter invariants (these are S5 accepted-instance values, not general contract rules):

  • exactly one normalized policy event in the captured stream
  • per-event decision = allow (coarse policy outcome)
  • the corresponding policy_decisions summary in capability-surface.json is allow:read_file (the namespaced summary string is the surface representation, distinct from the per-event coarse outcome above)

The implementation PR may share the existing MCP file server used by the S5 fixture, or provide a Gemini-specific wrapper if scope reasons require. That is a design decision for the implementation PR; both shapes satisfy this design note.

Expected Normalized Artifacts

After three-run determinism comparison, the fixture must produce the same v0 artifact family as S5:

  • observation-health.json
  • capability-surface.json
  • correlation-report.json
  • layers/sdk.ndjson
  • layers/policy.ndjson

observation-health.json expected shape

  • schema = assay.runner.observation_health.v0
  • platform = linux
  • kernel_layer = complete
  • ringbuf_drops = 0
  • policy_layer = present
  • sdk_layer = self_reported
  • cgroup_correlation = clean
  • notes include s5_sdk_capture: sdk_events=3 sdk_tool_calls=1

capability-surface.json expected shape

  • a bounded, deterministic set of normalized filesystem paths under the work directory:
  • the read_file target the Gemini function call is replayed against
  • if the shared S5 policy-agent / MCP file-server wrapper is reused (per the option above), one additional deterministic wrapper input path (parallel to S5's policy-input.txt companion file)
  • exact path count is determined by the wrapper choice and frozen by the cassette + fixture script in the implementation PR
  • mcp_tools contains read_file
  • policy_decisions contains allow:read_file

correlation-report.json expected shape

  • status = clean
  • ambiguities = []
  • exactly one binding, where tool_call_id equals the cassette's recorded FunctionCall.id
  • policy_decision = allow
  • window = {"start": "run_started", "end": "run_finished"}

Three-run determinism

Three sequential runs of the fixture, via a delegated wrapper script analogous to scripts/ci/runner-spike-openai-agents-kernel-policy-three-run-determinism.sh, must produce byte-identical artifacts in the five files listed above.

Expected Delegated Gate

gates=all per second-runtime-plan.md § Suggested PR Sequence step 4. A narrower Gemini-specific gate (e.g. gemini-kernel-policy) is later coordinated work that requires updates to ci-lanes.md, the lane-check classifier, the workflow inputs.gates enum, and the matching acceptance scripts. Not a side effect of the first fixture PR.

Kill Criteria (Before Code)

Stop the implementation line before writing fixture code if any of these become true during PR design or implementation:

  • Gemini's actual gemini-3.5-flash function-calling response does not contain a FunctionCall.id for the recorded call — the level-3 qualifies outcome rests on this guarantee
  • the google-genai SDK silently substitutes a missing id (contradicting the typed source Field(default=None) evidence used in the candidate evaluation)
  • cassette determinism cannot be achieved without per-request header scrubbing that would also mask the recorded FunctionCall.id
  • gemini-3.5-flash is moved to deprecated, removed, or its function-call identity guarantee is rescinded by Google between selection and implementation
  • dependency installation (google-genai and transitive packages) is not byte-deterministic enough for three-run normalized-artifact stability
  • the fixture cannot satisfy the v0 artifact-shape expectations without weakening the runner normalizer's evidence-versus-telemetry filters

If any kill criterion fires, the implementation PR must stop and either (a) document the regression in a follow-up evaluation PR that updates the Gemini candidate outcome in second-runtime-candidate-selection.md, or (b) open a separate decision PR for the relevant follow-up issue.

Non-Goals

This design does not:

  • approve fixture implementation code
  • add runtime dependencies in the docs tree
  • add cassette content or cassette-format choice as a contract decision
  • modify the v0 artifact contracts, fixture v0 contract, CI lane contract, or boundary map
  • introduce a narrower delegated gate
  • propose cross-runtime capability-diff against S5 (Phase 2C per second-runtime-plan.md § Out Of Phase 2B Scope)
  • broaden the runner normalizer's evidence taxonomy
  • pre-approve later Gemini model bumps or family expansion beyond gemini-3.5-flash
  • close any candidate-selection re-evaluation by removing prior insufficient evidence entries from the selection note

Implementation PR Acceptance Checklist

The implementation PR must independently satisfy:

References