PLAN — P22 OpenAI Agents JS Tool Approval Interruption / Resumable-State Evidence Interop (2026 Q2)¶
- Date: 2026-04-16
- Owner: Evidence / Product
- Status: Planning lane with one runtime-backed local probe
- Scope (current repo state): Define one bounded OpenAI Agents JS-adjacent lane centered on approval interruptions plus one resumable continuation anchor derived from the same paused run. This plan does not propose broad OpenAI Agents JS support, transcript import, session truth, server-managed continuation truth, or full RunState import.
1. Why P22 should exist¶
openai/openai-agents-js is now a real, active upstream with a clear public shape around:
- agent runs
- result surfaces
- human-in-the-loop approvals
- sessions
- tracing
That matters because Assay does not need the whole SDK.
It needs the smallest honest external-consumer seam that:
- already exists in named public docs,
- is small enough to review without inheriting SDK runtime truth,
- and has real live design pressure in the public repo.
The strongest candidate seam is:
- one paused RunResult
- one bounded interruptions list
- one resumable continuation anchor derived from the same RunState
That lane is smaller than:
- full transcript history
- full newItems
- full runContext
- provider-managed lastResponseId chaining
- session lifecycle truth
- full serialized RunState
- raw model response or tracing truth
2. Why this is timely¶
This seam is already explicit in the public docs.
The Results guide says the right surface for:
- pending approvals,
- and a resumable snapshot,
is interruptions plus state.
The Human-in-the-loop guide then makes the behavior concrete:
- a tool call that needs approval pauses the run,
- the SDK returns interruptions,
- the caller resolves them on result.state,
- and the same paused run resumes from the same RunState.
This exact area is also alive upstream in public issues:
- #1097 shows that approval-plus-resume behavior still raises compatibility questions across manual history, provider-managed continuation, and session-backed resume paths
- #1104 shows that rejection signaling is still a living boundary question rather than a closed, stable truth surface
That is exactly the kind of upstream posture Assay can work with:
- the seam is named
- the docs are strong enough to start from
- the behavior is useful
- but the broader continuation semantics are still alive enough that we should stay small
3. Hard positioning rule¶
This lane must stay smaller than the upstream ecosystem name.
Normative framing:
P22 v1 claims only bounded approval-interruption evidence plus one resumable continuation anchor derived from a paused run. It does not claim transcript truth, session truth, server-managed continuation truth, or complete RunState truth.
That means:
- OpenAI Agents JS remains the runtime, not Assay truth
- a paused approval run is an observed upstream state, not the truth of the whole conversation lifecycle
- Assay stays an external evidence consumer, not an authority on persistence strategy, replay completeness, or approval outcome semantics
Common anti-overclaim sentence:
We are not asking Assay to inherit OpenAI Agents JS transcript, session, provider-chaining, or RunState semantics as truth.
4. Why this seam and not sessions/results in general¶
The public docs expose many result surfaces:
- finalOutput
- history
- output
- newItems
- lastAgent
- lastResponseId
- interruptions
- state
- runContext
That is useful, but it is also exactly why the first wedge should be smaller than "results interop."
The first honest wedge is:
- the run paused on approval,
- these are the pending approval items,
- and this paused run has one resumable continuation anchor.
That is more reviewable than:
- the whole conversation transcript,
- the whole rich run-item stream,
- or the whole serialized run-state blob.
5. Why not broader continuation surfaces¶
5.1 Why not history¶
history is explicitly the replay-ready next-turn input with the full local transcript.
That makes it useful product-wise and wrong as the first Assay wedge.
Why:
- it is already transcript truth territory
- it widens into manual chat loop semantics
- the docs explicitly warn that mixing client-managed history with server-managed state can duplicate context
That is too broad.
5.2 Why not session¶
Sessions are important, but they are not the first seam.
The docs describe them as:
- fetching stored history before a turn,
- persisting new items after each run,
- and remaining available for future turns and interrupted resumes
That is lifecycle and memory-management truth, not a small evidence seam.
5.3 Why not previousResponseId / lastResponseId¶
Provider-managed continuation is real, but it is also provider-specific and already adjacent to compatibility questions.
The docs say lastResponseId is the value to pass as previousResponseId when you are using OpenAI Responses API chaining.
That is useful and still too broad for v1 because it opens immediately into:
- persistence strategy choice
- provider-managed state semantics
- mode-mixing questions with history, session, and conversationId
Issue #1097 makes that exact compatibility pressure visible.
5.4 Why not full newItems¶
newItems is a strong surface and the docs explicitly position it as the rich run-item view when agent/tool/handoff metadata matters.
But it is still broader than the first seam we want because it also carries:
- message items
- tool outputs
- handoff boundaries
- other run-item metadata
The first wedge should stay on:
- the approval interruption itself,
- not the whole rich run delta.
5.5 Why not full serialized RunState¶
The docs explicitly say state is the serializable snapshot behind the result, and that you can serialize it and resume later.
That is useful and too rich to import wholesale into the first evidence contract.
For Assay, the first lane should use:
- one bounded anchor derived from serialized state,
not:
- the whole serialized state blob as canonical evidence.
6. Upstream caveats we must preserve¶
The seam is good, but it is not closed.
Three cautions matter immediately.
6.1 Approval interruptions are run-wide¶
The docs are explicit that approval interruptions surface on the outer run, including:
- direct tool approvals,
- handoff-reached tools,
- and nested agent.asTool() approvals
So P22 must not pretend the seam is only about one local tool call in one top-level agent.
The lane should still stay on the paused outer run artifact.
6.2 Rejection outcome semantics are still alive upstream¶
Issue #1104 is the strongest public signal here.
The important lesson is not "rejections are broken forever."
The important lesson is:
- the structural signaling for rejected tool calls is still moving,
- and public discussion there makes it clear that apps should not treat the current status field as a settled rejection-truth surface.
So P22 must not make rejection result semantics a required part of the first sample contract.
6.3 Continuation-mode compatibility is still being investigated¶
Issue #1097 is the strongest public signal here.
The public issue discussion makes it clear that the unresolved compatibility boundary still spans:
- manual history
- previousResponseId
- conversationId
- session-backed resumes
So P22 must stay smaller than "OpenAI Agents JS continuation support."
7. Recommended P22 seam¶
The correct lane name is intentionally small:
OpenAI Agents JS Tool Approval Interruption / Resumable-State Artifact Lane
The recommended v1 seam is:
- one paused approval run envelope
- one bounded interruptions list
- one fixed pause reason tied to tool approval
- one resumable continuation anchor derived from the same paused RunState
This is not:
- session support
- transcript support
- full newItems support
- general RunResult support
- provider-chaining support
- full RunState import
Important framing rule:
The first sample should use a frozen artifact derived from one paused approval run, not a claim that Assay models OpenAI Agents JS result or continuation semantics as a whole.
8. Recommended v1 artifact contract¶
Use one frozen serialized artifact derived from the approval-interruption lane.
The first artifact should stay small and self-describing:
- schema
- framework
- surface
- pause_reason
- interruptions
- resume_state_ref
- timestamp
Optional:
- active_agent_ref
- last_agent_ref
- metadata_ref
Important framing rule:
The sample uses one frozen artifact derived from RunResult.interruptions and one resumable continuation anchor derived from RunState. It is not a claim that OpenAI Agents JS publishes one stable external evidence wrapper for approval or resume consumers.
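As a rough illustration of how small that artifact is meant to be, the sketch below writes the required and optional field names from this plan into a Python mapping. Only the field names and the tool_approval value come from the plan; the schema, framework, and surface strings, the tool name, the call id, the resume_state_ref placeholder, and the timestamp are hypothetical values chosen for illustration, and missing_required is a hypothetical helper.

```python
import json

# Field names are taken from the plan's v1 contract.
REQUIRED_FIELDS = (
    "schema", "framework", "surface", "pause_reason",
    "interruptions", "resume_state_ref", "timestamp",
)
OPTIONAL_FIELDS = ("active_agent_ref", "last_agent_ref", "metadata_ref")

# Every value below except pause_reason is a hypothetical placeholder.
example_artifact = {
    "schema": "example.tool-approval-interruption.v1",   # hypothetical
    "framework": "openai_agents_js",                      # hypothetical
    "surface": "tool_approval_interruption",              # hypothetical
    "pause_reason": "tool_approval",                      # only value allowed in v1
    "interruptions": [
        {"tool_name": "example_tool", "call_id_ref": "call_example_1"},
    ],
    "resume_state_ref": "runstate:sha256:<digest>",       # placeholder anchor
    "timestamp": "2026-04-16T00:00:00Z",                  # placeholder
}


def missing_required(artifact):
    """Return the required field names that the artifact does not carry."""
    return [name for name in REQUIRED_FIELDS if name not in artifact]


if __name__ == "__main__":
    print(json.dumps(example_artifact, indent=2))
```

The optional fields are deliberately absent from the example, which is still complete under the required set.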
8.1 Field meaning¶
pause_reason¶
This is required.
In v1 the only allowed value should be:
tool_approval
Why this is strict:
- it keeps the lane tied to the named public human-in-the-loop seam
- it prevents drift into generic pauses, retries, or stream aborts
- it keeps P22 about approval interruptions instead of every resumable case
interruptions¶
This is required.
It should stay:
- bounded
- ordered
- reviewable
- specific to pending tool approvals
It must not become:
- a full newItems dump
- raw runContext
- transcript history
- a whole approval UI state model
Each v1 interruption should stay on the smallest honest subset:
- tool_name
- call_id_ref
- optional agent_ref
Nothing else should enter the first sample unless one real implementation forces it.
v1 does not import:
- tool arguments
- rejection text
- raw interruption payloads
into the canonical artifact.
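One way to picture that reduction is the sketch below. It assumes the interruption has already been dumped to a plain mapping with the toolName, agent, and rawItem keys seen in the local probe; the function name reduce_interruption, the nested "name" key on agent, and the sample values are assumptions for illustration, not upstream API.

```python
def reduce_interruption(raw):
    """Reduce one interruption-like mapping to the v1 subset.

    Tool arguments and any other raw payload fields are dropped on purpose.
    """
    raw_item = raw.get("rawItem") or {}
    tool_name = raw.get("toolName")
    # Assumption from the probe: the call id lives under rawItem.callId.
    call_id = raw_item.get("callId")
    if not tool_name or not call_id:
        raise ValueError("interruption lacks toolName or rawItem.callId")

    reduced = {"tool_name": tool_name, "call_id_ref": call_id}

    # agent_ref is optional; the "name" key is an assumed shape.
    agent = raw.get("agent")
    if isinstance(agent, dict) and agent.get("name"):
        reduced["agent_ref"] = agent["name"]
    return reduced
```

For a hypothetical input such as `{"toolName": "get_weather", "agent": {"name": "Probe"}, "rawItem": {"callId": "call_1", "arguments": "..."}}`, the result keeps only tool_name, call_id_ref, and agent_ref.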
resume_state_ref¶
This is required.
It is an Assay-side bounded anchor derived from serialized RunState, not evidence that OpenAI Agents JS publishes one native resumeStateRef field.
It must remain:
- opaque
- portable
- non-resolving
It must not become:
- a raw serialized RunState dump in the canonical contract
- a dashboard link
- a provider-specific continuation promise
It must also not be read as a claim that the underlying serialized RunState object is:
- small
- stable
- protocol-agnostic
- or itself part of the evidence contract
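A minimal sketch of one possible derivation, assuming the anchor is a digest of the serialized state string: the "runstate:sha256:" prefix and the function name derive_resume_state_ref are hypothetical choices, not a format that OpenAI Agents JS or Assay defines. A digest keeps the anchor opaque and non-resolving, and it keeps the serialized blob out of the artifact.

```python
import hashlib


def derive_resume_state_ref(serialized_state: str) -> str:
    """Derive an opaque anchor from a serialized paused state.

    The result does not embed, link to, or allow recovery of the state itself.
    """
    digest = hashlib.sha256(serialized_state.encode("utf-8")).hexdigest()
    # Hypothetical prefix; any stable, non-URL scheme would serve the same role.
    return f"runstate:sha256:{digest}"
```

The same serialized string always yields the same anchor, so a reviewer holding the original state can recompute and compare it without the artifact carrying the state.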
active_agent_ref / last_agent_ref¶
These are optional.
They are only useful if one real run proves they are naturally present and help reviewability.
They must not become:
- a demand to encode full multi-agent control flow
- a claim about which agent will always own the next turn
9. Assay-side meaning¶
The P22 sample may only claim bounded approval-interruption evidence.
Assay must not treat as truth:
- transcript truth
- session truth
- provider-managed continuation truth
- full serialized RunState truth
- rejection-outcome truth
The artifact only says:
- this run paused for tool approval
- these pending approval items were surfaced
- one resumable continuation anchor existed for the same paused run
10. Discovery gate before implementation¶
Do not build a sample from docs alone.
Before closing this lane, do one bounded discovery pass:
- create one tiny OpenAI Agents JS harness
- define one tool that requires approval
- trigger exactly one paused approval run
- capture the smallest honest interruptions shape
- serialize the paused state
- derive one bounded resume_state_ref
- approve or reject through the same state
- resume from the same paused run
Discovery is only done when we have:
- one real paused approval run
- one real interruptions payload
- one real serialized-state round-trip
- one explicit note about which fields were naturally present
- one explicit note about what we still refused to import
10.1 Current discovery seam¶
This lane is no longer docs-only.
One small runtime-backed local probe has now been run against the public @openai/agents package using:
- one top-level agent
- one local function tool with needsApproval: true
- one fake local model that emitted one tool call
- one serialized-state round-trip through RunState.toString() and RunState.fromString(...)
What was observed in that probe:
- the first run paused and returned one real interruption item
- the interruption object naturally exposed:
  - toolName
  - agent
  - rawItem
- the call id was visible under rawItem.callId, not as a first-class top-level interruption property
- the serialized paused state length was 3782
- the serialized paused state SHA-256 was a136d3d331cff5810ec27c7afc5fed9b0e16ed8608e5e698358eedbffb83fd51
- resuming from the same serialized state after approve(...) produced a final output and zero remaining interruptions
That runtime result strengthens the lane in two ways:
- resume_state_ref can now be grounded in one real serialized-state anchor
- call_id_ref is now more honestly framed as an Assay-side bounded reduction over live interruption data, because the current interruption object does not surface a top-level callId
Important honesty line:
- the paused-run runtime path is real
- provider-backed continuation behavior is still not proven by this first probe
10.2 Exit criterion for P22¶
P22 is not closed just because a plan exists.
This lane is only complete when all of the following are true:
- one real approval interruption has been captured from a runnable OpenAI Agents JS setup
- the current required vs optional field split has been checked against that run
- resume_state_ref has been derived from a real serialized RunState
- the paused run has actually resumed from the same serialized state
- the sample still stays smaller than transcript, session, and provider-mode continuation surfaces
- the lane still does not widen into full newItems, full runContext, or full RunState
Until then, P22 should be described as:
- docs-backed
- issue-backed
- boundary-tight
- pre-proof on the live approval-interruption lane
11. Minimal runtime target¶
Use the smallest harness that can deterministically produce:
- one paused tool-approval interruption
- one serialized resumable state
Preferred first target:
- one top-level agent
- one simple function tool with needsApproval
- one non-realtime run
- no session
- no handoff
- no provider-mode mixing
Hard constraints:
- no transcript-history lane as the center of the sample
- no session integration in the first proof
- no previousResponseId chaining in the first proof
- no tracing or raw response export
- no full newItems capture as canonical contract
12. Concrete repo deliverable¶
If this lane is accepted, the first implementation PR should add:
- examples/openai-agents-js-approval-interruption-evidence/README.md
- examples/openai-agents-js-approval-interruption-evidence/map_to_assay.py
- examples/openai-agents-js-approval-interruption-evidence/fixtures/valid.openai-agents-js.json
- examples/openai-agents-js-approval-interruption-evidence/fixtures/failure.openai-agents-js.json
- examples/openai-agents-js-approval-interruption-evidence/fixtures/malformed.openai-agents-js.json
- examples/openai-agents-js-approval-interruption-evidence/fixtures/valid.assay.ndjson
- examples/openai-agents-js-approval-interruption-evidence/fixtures/failure.assay.ndjson
13. Valid / failure / malformed corpus¶
13.1 Valid¶
One artifact with:
- pause_reason = tool_approval
- one bounded interruptions list
- one bounded resume_state_ref
13.2 Failure¶
One weaker but still valid artifact with:
- pause_reason = tool_approval
- one bounded interruptions list
- one bounded resume_state_ref
- fewer optional reviewer aids present
This should still be a valid paused approval artifact.
It must not imply:
- a stable rejection result shape
- a stable approval outcome shape
- a native confidence or ranking model
v1 failure fixtures remain paused approval artifacts with fewer reviewer aids; they do not claim stable approve/reject outcome semantics.
The lane is about pending interruption evidence first, not about resolved tool call outcome truth.
13.3 Malformed¶
One malformed artifact that fails fast, for example:
- missing interruptions
- empty interruptions
- pause_reason not equal to tool_approval
- raw serialized RunState inlined as canonical evidence
- full history added to the artifact
- full newItems added to the artifact
- lastResponseId / session / history surfaces mixed in as if they defined the lane
- resume_state_ref given as a URL
- interruption items missing tool_name or call_id_ref
For v1, these cardinality and drift violations should be treated as malformed rather than partially imported:
- history, session, and lastResponseId / previousResponseId-style continuation hints mixed together as co-equal lane-defining surfaces
- full transcript history
- full rich run-item arrays
- raw run-context dumps
- raw serialized RunState inline as canonical evidence
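The fail-fast cases above could be checked along the lines of the sketch below. It is an illustration under stated assumptions rather than the planned mapper: validate_artifact, the allow-lists, and the error strings are hypothetical, and treating any resume_state_ref that starts with "{" or "[" as inlined serialized state is a simplifying heuristic.

```python
ALLOWED_TOP_LEVEL = {
    "schema", "framework", "surface", "pause_reason", "interruptions",
    "resume_state_ref", "timestamp",
    "active_agent_ref", "last_agent_ref", "metadata_ref",
}
ALLOWED_ITEM_KEYS = {"tool_name", "call_id_ref", "agent_ref"}


def validate_artifact(artifact):
    """Return a list of problems; an empty list means the artifact passes."""
    errors = []

    # Any extra top-level key (history, newItems, runContext, session,
    # lastResponseId, ...) is drift and is rejected, not partially imported.
    for key in sorted(set(artifact) - ALLOWED_TOP_LEVEL):
        errors.append(f"unexpected top-level field: {key}")

    if artifact.get("pause_reason") != "tool_approval":
        errors.append("pause_reason must equal tool_approval")

    items = artifact.get("interruptions")
    if not isinstance(items, list) or not items:
        errors.append("interruptions must be a non-empty list")
    else:
        for index, item in enumerate(items):
            if not isinstance(item, dict):
                errors.append(f"interruptions[{index}] must be an object")
                continue
            for name in ("tool_name", "call_id_ref"):
                if not item.get(name):
                    errors.append(f"interruptions[{index}] missing {name}")
            for key in sorted(set(item) - ALLOWED_ITEM_KEYS):
                errors.append(f"interruptions[{index}] has unexpected field: {key}")

    ref = artifact.get("resume_state_ref")
    if not isinstance(ref, str) or not ref:
        errors.append("resume_state_ref must be a non-empty string")
    elif "://" in ref:
        errors.append("resume_state_ref must not be a URL")
    elif ref.lstrip().startswith(("{", "[")):
        # Heuristic: a JSON-looking value is treated as inlined serialized state.
        errors.append("resume_state_ref must not inline serialized state")
    return errors
```

Returning every problem at once, instead of raising on the first, is a design choice that makes a malformed fixture easier to review; a caller that wants fail-fast behavior can reject the artifact as soon as the list is non-empty.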
14. Outward strategy¶
The outward move should be issue-first, not show-and-tell-first.
Why:
- the repo uses issues, not GitHub Discussions, as the public seam-pressure channel
- the approval/resume boundary is already alive there
- the right outward move is likely one small sample-backed boundary question, not a broad feature pitch
The likely outward question should stay small and warm:
- we built a tiny external-consumer sample around one paused approval run
- we kept it on interruptions plus one resumable continuation anchor
- is that roughly the right minimal surface for an external evidence consumer
- or is there a thinner official seam you would rather point us at
Do not ask about:
- full session support
- transcript support
- provider-chaining support
- broad RunState export
15. Source anchors¶
Public sources used for this lane:
- OpenAI Agents JS README
- Results guide
- Human-in-the-loop guide
- Running agents guide
- Sessions guide
- GitHub issues #1097 and #1104
16. Final judgment¶
P22 is a strong candidate because it is:
- current
- small
- docs-backed
- issue-backed
- and much less likely to drift into transcript or persistence-truth inflation than a broader sessions/results lane
The core discipline is simple:
Keep P22 about one paused approval run, one bounded interruptions list, and one resumable continuation anchor, not about OpenAI Agents JS as a whole.