
fix: complete realtime tool failures #3530

Open
he-yufeng wants to merge 2 commits into openai:main from he-yufeng:fix/realtime-tool-failure-output-3356

@he-yufeng

Summary

Fixes #3356.

Known realtime tool calls could fail before the SDK sent a tool output back to the model. In that case the application still observed the exception locally, but the function call visible to the model was never completed.

This change keeps the local exception behavior and completes the model-facing call first:

  • sends a RealtimeModelSendToolOutput(start_response=True) when a known function tool raises, including timeout_behavior="raise_exception"
  • sends the same model-visible failure output when a realtime handoff raises or returns an invalid result
  • keeps the existing exception propagation / RealtimeError path for application observers
  • adds regression coverage for timeout failures, ordinary tool exceptions, and handoff failures
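
The ordering described above (complete the model-facing call first, then let the local exception continue) can be sketched roughly as follows. This is a minimal illustration, not the SDK's code: `FakeModel`, `ToolOutput`, and `run_tool_and_report` are hypothetical stand-ins, and only the `start_response=True` flag is taken from the description.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ToolOutput:
    """Hypothetical stand-in for a model-facing tool output event."""
    call_id: str
    output: str
    start_response: bool


@dataclass
class FakeModel:
    """Hypothetical model transport that records what was sent."""
    sent: list = field(default_factory=list)

    async def send(self, event: ToolOutput) -> None:
        self.sent.append(event)


async def run_tool_and_report(model: FakeModel, call_id: str, tool) -> str:
    """Run a tool; on failure, complete the model-facing call, then re-raise."""
    try:
        result = await tool()
    except Exception as exc:
        # Complete the call for the model first so it is not left pending.
        await model.send(
            ToolOutput(call_id, f"Tool failed: {exc}", start_response=True)
        )
        # Keep the local exception path for application observers.
        raise
    await model.send(ToolOutput(call_id, result, start_response=True))
    return result


async def failing_tool() -> str:
    raise ValueError("boom")


async def demo() -> tuple:
    model = FakeModel()
    try:
        await run_tool_and_report(model, "call_1", failing_tool)
    except ValueError as exc:
        return model.sent, exc
    return model.sent, None
```

In this sketch the caller still sees the `ValueError`, while the recorded `sent` list shows that the model received one output for `call_1`.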

To verify

  • uv run pytest tests/realtime/test_session.py -q
  • uv run ruff check src/agents/realtime/session.py tests/realtime/test_session.py
  • uv run mypy src/agents/realtime/session.py tests/realtime/test_session.py
  • git diff --check


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 03a3dde34c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +837 to +844
if await self._send_function_tool_failure_output(
    event,
    tool=func_tool,
    tool_context=tool_context,
    agent=agent,
    error=exc,
):
    mark_completed = True

[P2] Preserve the original tool failure if output sending fails

When the model send for this synthesized failure output fails, _send_tool_output_completion raises _PendingToolOutputSendError from inside the except block, replacing the original tool exception. In the async tool-call path _on_tool_call_task_done treats that wrapper as a retryable send failure and returns without storing or surfacing the original tool error; when the same call id is retried it only flushes the cached output and the app never gets the expected RealtimeError/exception for the failed tool. This affects transient send_event failures while reporting a tool exception, so the new failure-output path does not always keep the existing local exception behavior.
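
The replacement described here follows from ordinary Python semantics: an exception raised inside an `except` block becomes the one that propagates, and the original survives only as `__context__`. A minimal sketch, with `SendError` and `report_failure` as hypothetical stand-ins for the SDK's internals:

```python
class SendError(Exception):
    """Hypothetical stand-in for a failed send of the failure output."""


def report_failure(error: Exception) -> None:
    # Simulate a transient transport failure while reporting the tool error.
    raise SendError("send_event failed")


def call_tool() -> None:
    try:
        raise ValueError("tool exploded")
    except ValueError as exc:
        # If this raises, SendError propagates instead of the ValueError.
        report_failure(exc)
        raise


def observe() -> tuple:
    """Return what a caller sees, plus the implicitly chained original."""
    try:
        call_tool()
    except Exception as seen:
        return type(seen).__name__, type(seen.__context__).__name__
    return None, None
```

A caller that only inspects the exception type sees `SendError`; the tool error is reachable only through `__context__` unless the wrapper stores it explicitly.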


@seratch added the duplicate and feature:realtime labels on May 30, 2026
@he-yufeng force-pushed the fix/realtime-tool-failure-output-3356 branch from 03a3dde to e1e6dd4 on May 30, 2026 22:25

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e1e6dd44f3


Comment on lines +881 to 885
await self._send_handoff_failure_output(
    event,
    tool_context=tool_context,
    error=exc,
)

[P2] Preserve handoff failures when output sending fails

For handoff failures, if _send_tool_output_completion raises here (for example, a transient send_event failure while sending this synthesized error output), this await replaces the original handoff exception with _PendingToolOutputSendError; in the async task path _on_tool_call_task_done treats that wrapper as retryable and returns without storing or emitting the handoff failure, and a later duplicate call just flushes the cached output. That means applications can miss the handoff exception even though the model eventually receives the failure output.
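
The retry behavior mentioned above can be pictured as a small cache keyed by call id. The sketch below is only an illustration under assumed names (`FlakyTransport`, `Session`, `handle_call` are hypothetical, not the SDK's classes); it shows a duplicate call flushing the cached output without re-running the handoff, and that nothing in that path reports the handoff failure to the application.

```python
import asyncio


class FlakyTransport:
    """Hypothetical transport whose first send fails, later sends succeed."""

    def __init__(self) -> None:
        self.attempts = 0
        self.delivered = []

    async def send(self, payload: str) -> None:
        self.attempts += 1
        if self.attempts == 1:
            raise ConnectionError("transient send failure")
        self.delivered.append(payload)


class Session:
    """Hypothetical session that caches unsent outputs by call id."""

    def __init__(self, transport: FlakyTransport) -> None:
        self.transport = transport
        self.pending = {}
        self.handoff_runs = 0

    async def handle_call(self, call_id: str) -> None:
        if call_id in self.pending:
            # Retry: flush the cached output without re-running the handoff.
            await self.transport.send(self.pending[call_id])
            del self.pending[call_id]
            return
        self.handoff_runs += 1
        output = "Handoff failed: invalid result"
        self.pending[call_id] = output  # cache before attempting the send
        await self.transport.send(output)
        del self.pending[call_id]


async def demo() -> Session:
    session = Session(FlakyTransport())
    try:
        await session.handle_call("call_7")
    except ConnectionError:
        pass  # first send fails; the output stays cached
    await session.handle_call("call_7")  # duplicate call flushes the cache
    return session
```

After `demo()` the model-facing output has been delivered once and the handoff ran once, but the only exception the caller ever saw was the `ConnectionError`.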


@he-yufeng (Author)

Updated this in 026f75f.

The retry path now keeps the cached model-visible failure output, but _PendingToolOutputSendError also carries the original tool/handoff exception. The async task callback still reports the retryable send failure, and also stores/emits the original failure so applications do not lose the local exception signal.
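
As a rough picture of that shape (not the actual change in 026f75f), a wrapper exception can carry the original error so a task done-callback reports both signals. `PendingSendError`, `failing_tool_call`, and `make_done_callback` are hypothetical names used only for this sketch.

```python
import asyncio
from typing import Optional


class PendingSendError(Exception):
    """Hypothetical wrapper: the send failed, but the original error is kept."""

    def __init__(self, message: str, original_error: Optional[BaseException] = None):
        super().__init__(message)
        self.original_error = original_error


async def failing_tool_call() -> None:
    try:
        raise ValueError("tool exploded")
    except ValueError as exc:
        # Simulate the failure-output send failing, carrying the tool error.
        raise PendingSendError("send failed", original_error=exc) from exc


def make_done_callback(reported: list):
    def on_done(task: asyncio.Task) -> None:
        exc = task.exception()
        if isinstance(exc, PendingSendError):
            reported.append(("retryable_send_failure", str(exc)))
            if exc.original_error is not None:
                # Also surface the original failure to the application.
                reported.append(("tool_error", repr(exc.original_error)))
        elif exc is not None:
            reported.append(("tool_error", repr(exc)))

    return on_done


async def demo() -> list:
    reported: list = []
    task = asyncio.create_task(failing_tool_call())
    task.add_done_callback(make_done_callback(reported))
    await asyncio.gather(task, return_exceptions=True)
    await asyncio.sleep(0)  # give the done callback a chance to run
    return reported
```

Here the callback records the retryable send failure first and the original `ValueError` second, so neither signal is dropped.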

Verified locally:

  • PYTHONPATH=src python -m pytest tests\realtime\test_session.py::TestToolCallExecution::test_async_function_tool_failure_preserved_when_error_output_send_fails tests\realtime\test_session.py::TestToolCallExecution::test_handoff_failure_preserved_when_error_output_send_fails tests\realtime\test_session.py::TestToolCallExecution::test_function_tool_exception_sends_model_visible_output tests\realtime\test_session.py::TestToolCallExecution::test_handoff_tool_exception_sends_model_visible_output tests\realtime\test_session.py::TestToolCallExecution::test_async_function_tool_send_failure_retries_cached_output_without_rerun -q
  • python -m py_compile src\agents\realtime\session.py tests\realtime\test_session.py
  • python -m ruff check src\agents\realtime\session.py tests\realtime\test_session.py
  • python -m ruff format --check src\agents\realtime\session.py tests\realtime\test_session.py
  • git diff --check


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 026f75f0cf


Comment on lines +761 to +763
except _PendingToolOutputSendError as send_error:
    send_error.original_error = error
    raise

[P2] Preserve original errors on synchronous send failures

When async_tool_calls is disabled (or approval calls await _handle_tool_call directly), a transient send_event failure while reporting a tool exception makes this helper re-raise _PendingToolOutputSendError. Because _on_tool_call_task_done is not involved, original_error is never consumed and the app sees the send failure instead of the original tool error. Fresh evidence beyond the existing async comments: original_error is only read in _on_tool_call_task_done, which the synchronous path bypasses.
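
One possible way to close that gap, sketched under assumed names rather than taken from the PR, is to unwrap the stored error at the direct await site. `PendingSendError`, `handle_tool_call`, and `call_directly` are hypothetical; whether the synchronous path should raise the original error, the send failure, or both is a design decision for the maintainers.

```python
import asyncio
from typing import Optional


class PendingSendError(Exception):
    """Hypothetical wrapper carrying the original tool error."""

    def __init__(self, message: str, original_error: Optional[BaseException] = None):
        super().__init__(message)
        self.original_error = original_error


async def handle_tool_call() -> None:
    try:
        raise ValueError("tool exploded")
    except ValueError as exc:
        # Simulate the failure-output send failing while reporting the error.
        raise PendingSendError("send failed", original_error=exc) from exc


async def call_directly() -> None:
    """Synchronous path: no task callback, so unwrap at the await site."""
    try:
        await handle_tool_call()
    except PendingSendError as send_error:
        if send_error.original_error is not None:
            # Surface the tool error; the send failure stays in the chain.
            raise send_error.original_error
        raise


def observed_error_type() -> str:
    try:
        asyncio.run(call_directly())
    except Exception as exc:
        return type(exc).__name__
    return "none"
```

With the unwrap in place, a caller that awaits the call directly sees the `ValueError` rather than the wrapper.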



Labels

duplicate, feature:realtime


Development

Successfully merging this pull request may close these issues.

Realtime known tool failures (exception/timeout) do not send model-visible output

2 participants