OpenAI Codex Responses replay can emit oversized call_id from resumed cross-provider history #2328
Summary
I now have a concrete unit-test repro for the oversized `input[*].call_id` failure.
This is not a same-model replay case.
It is a cross-provider / cross-api resumed-history replay case:
- replay target: `provider=openai-chatgpt`, `api=openai-codex-responses`, `model=gpt-5.4`
- historical resumed assistant/toolResult messages: `provider=screenpipe`, `api=openai-completions`, `model=gemini-3-pro`
In this scenario, `convertResponsesMessages(...)` emits a `function_call.call_id` longer than 64 chars from real persisted session history, which matches the production OpenAI error:
`Invalid 'input[...].call_id': string too long. Expected a string with maximum length 64`
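To make the constraint concrete, here is a minimal sketch of the limit the error enforces. The constant and helper names are illustrative, not from the codebase:

```typescript
// Illustrative helper (not from the codebase): the OpenAI Responses API
// rejects function_call / function_call_output items whose call_id exceeds
// 64 characters, per the error message above.
const OPENAI_RESPONSES_CALL_ID_MAX = 64;

function isValidResponsesCallId(callId: string): boolean {
  return callId.length <= OPENAI_RESPONSES_CALL_ID_MAX;
}

// A typical same-provider id passes; the 930-char id replayed from
// cross-provider history does not.
console.log(isValidResponsesCallId("call_abc123"));   // true
console.log(isValidResponsesCallId("x".repeat(930))); // false
```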
Minimal failing unit repro
Gist:
The gist contains:
- a single failing vitest unit test
- the exact command to run
- the failing assertion
Failing command
`cd packages/ai && npx vitest --run test/openai-chatgpt-resume-call-id-repro.test.ts`
Failing result
`AssertionError: expected 930 to be less than or equal to 64`
What the test proves
Given:
- a real historical tool call id captured from a resumed Screenpipe/pi session
- replay target model `openai-chatgpt`/`openai-codex-responses`/`gpt-5.4`
`convertResponsesMessages(...)` produces a `function_call` / `function_call_output` payload whose `call_id` exceeds the OpenAI Responses limit.
So the bug is reproducible locally at the message-conversion layer without needing Screenpipe runtime or a live provider call.
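For illustration only, here is a sketch of one possible normalization direction, not a claim about the right fix: deterministically shorten oversized ids at the conversion boundary so that a `function_call` and its matching `function_call_output` still share the same id. The helper name and `call_` prefix are hypothetical:

```typescript
import { createHash } from "node:crypto";

// Hypothetical normalization sketch, not the codebase's actual fix:
// ids at or under the limit pass through untouched; longer ids are
// replaced by a stable hash-derived id that fits in 64 characters.
const MAX_CALL_ID_LENGTH = 64;

function normalizeCallId(callId: string): string {
  if (callId.length <= MAX_CALL_ID_LENGTH) return callId;
  const digest = createHash("sha256").update(callId).digest("hex"); // 64 hex chars
  // "call_" (5 chars) + 59 hex chars = 64 chars total.
  return `call_${digest.slice(0, MAX_CALL_ID_LENGTH - 5)}`;
}
```

Because the mapping is deterministic, the same oversized historical id always normalizes to the same short id, which keeps call/result pairs aligned across the replayed history.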
Important nuance
I am not claiming the fix is necessarily the normalization change I proposed earlier.
This issue is only to provide the concrete runnable repro you asked for.
If useful, I can follow up with a PR in whichever fix direction you think is correct, once we agree on the expected behavior.