fix(llm): preserve fragmented agent tool calls by MatejKosec · Pull Request #10988 · ai-dynamo/dynamo

MatejKosec · 2026-06-26T14:14:11Z

Summary

Add deterministic full-HTTP replay coverage for the Anthropic Messages and OpenAI Responses compatibility endpoints, including fragmented and parallel tool calls and tool-result round trips.
Accumulate fragmented tool-call state until the backend emits an explicit tool_calls or legacy function_call finish reason, then publish completion events immediately; retain end-of-stream completion only for backends that omit the finish reason.
Preserve call identity, argument ordering, contiguous Anthropic block indices, and Responses output ordering when tool-call fragments arrive out of order.
Add gated streaming tests that withhold the usage tail and prove tool completion reaches the client first.

The replay fixtures are minimized reconstructions of recorded event ordering rather than backend-specific captures. The harness uses the production HTTP handlers and converters without a model, GPU, Docker, timing delay, or new dependency.

Validation

Run on computelab-build:

cargo fmt --all -- --check
cargo clippy -p dynamo-llm --tests -- -D warnings
cargo test -p dynamo-llm --lib stream_converter::tests — 29 passed
INSTA_UPDATE=no cargo test -p dynamo-llm --test anthropic_http_replay --test responses_http_replay — 13 passed

Reviewer guidance

The finish-reason chunk is the first explicit indication that all tool-call fragments are complete. Usage-only chunks may follow it, so tool completion is emitted on the finish reason while the usage-bearing terminal response event remains at end of stream.

This supersedes #8284 and #10985.

Summary by CodeRabbit

New Features
- Improved streaming handling for tool/function calls so partial arguments are delivered in the correct order, even when identifiers arrive late.
- Better support for parallel tool calls, with cleaner event sequencing and completion behavior.
Bug Fixes
- Tool and function call blocks now finish reliably at stream end, even when a response omits a terminal finish signal.
- Fixed duplicate or out-of-order completion events in streamed responses.
Documentation
- Added replay fixture notes and expanded test coverage for common streaming scenarios.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>

github-actions · 2026-06-26T14:17:07Z

🌿 Fern Docs Preview: https://nvidia-preview-e8dadc99-810e-4fad-bf50-f742249ebd74.docs.buildwithfern.com/dynamo/dev

coderabbitai · 2026-06-26T14:22:53Z

Walkthrough

Adds buffered tool-call handling in Anthropic and OpenAI stream converters, and adds shared HTTP replay harnesses, scripted engine support, and replay tests/fixtures for Anthropic messages and OpenAI Responses compatibility.

Changes

Tool-call streaming and replay coverage

Layer / File(s)	Summary
Shared test harness and scripted engine `lib/llm/tests/common/scripted_chat_engine.rs`, `lib/llm/tests/common/http_harness.rs`	Adds a scripted chat engine with gated tails, HTTP server startup/shutdown helpers, SSE parsing, and JSON canonicalization for replay tests.
Anthropic buffered tool-call emission `lib/llm/src/protocols/anthropic/stream_converter.rs`	Buffers tool-call ids, names, and argument fragments, and flushes tool-use blocks on finish reasons or EOF.
Anthropic buffering tests and tagged helpers `lib/llm/src/protocols/anthropic/stream_converter.rs`	Mirrors the buffered flush path in tagged helpers and updates assertions for finish-driven flush, EOF fallback, text/tool ordering, fragmented arguments, incomplete identities, and parallel or duplicate tool calls.
OpenAI function-call buffering `lib/llm/src/protocols/openai/responses/stream_converter.rs`	Buffers function-call arguments until identity is complete, emits start and done events on finish reasons or EOF, and keeps completed output construction ordered.
OpenAI tests and parser helper `lib/llm/src/protocols/openai/chat_completions/tool_parser_v2.rs`, `lib/llm/src/protocols/openai/responses/stream_converter.rs`	Updates the stream test helper to include `llm_metrics: None` and revises unit tests for deferred done timing, identity and argument ordering, EOF closeout, completed output ordering, and parallel call sequencing.
Anthropic HTTP replay suite `lib/llm/tests/anthropic_http_replay.rs`, `lib/llm/tests/data/replays/agent-harness/*`	Adds replay fixtures and Tokio HTTP tests for `/v1/messages` and `/v1/messages/count_tokens` covering text, fragmented tool calls, gated finish ordering, parallel tools, and tool-result round-trips.
OpenAI Responses HTTP replay suite `lib/llm/tests/responses_http_replay.rs`	Adds HTTP tests for `/v1/responses` covering unary text, streaming text, fragmented and parallel function calls, gated finish ordering, and function_call_output round-trips.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description covers the main changes and validation, but it omits required template sections and the required issue section.	Add the required Overview, Details, Where should reviewer start, and Related Issues sections, with either a linked issue or the no-issue confirmation.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly matches the main change: preserving fragmented agent tool calls.
Docstring Coverage	✅ Passed	Docstring coverage is 83.52% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

lib/llm/src/protocols/openai/responses/stream_converter.rs (1)
306-420: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Normalize legacy delta.function_call chunks before closing on FunctionCall.

Line 418 handles FinishReason::FunctionCall, but Lines 306-416 only record delta.tool_calls. A backend that streams the legacy function_call delta shape will reach the finish reason with no FunctionCallState, so the Responses stream drops the call instead of emitting response.output_item.added/argument/done events. Please route delta.function_call through the same buffering path, likely as index 0 with a generated call id when the legacy shape has none. Based on PR objectives, legacy function-call finish behavior is part of this change.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/src/protocols/openai/responses/stream_converter.rs` around lines 306
- 420, The stream conversion in ResponseStreamConverter currently buffers only
delta.tool_calls, so legacy delta.function_call chunks are lost before
FinishReason::FunctionCall closes the item. Update the buffering logic in the
tool-call handling path to also normalize delta.function_call into the same
FunctionCallState flow, likely using index 0 and generating a call_id when
missing, so the existing response.output_item.added and argument/done event
emission still occurs for legacy callers.

🧹 Nitpick comments (2)

lib/llm/src/protocols/anthropic/stream_converter.rs (1)
98-124: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

record_tool_call buffering is correct; started is effectively redundant.

started is set whenever id/name/arguments are present, but is_emit_ready() already gates on non-empty id and name, so a state can never be is_emit_ready() without started == true. Harmless, but the flag adds no signal beyond the id/name checks.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/src/protocols/anthropic/stream_converter.rs` around lines 98 - 124,
The `record_tool_call` state machine in `stream_converter.rs` uses `started`
redundantly, since `is_emit_ready()` already depends on the presence of `id` and
`name`. Update `ToolCallState` handling and `record_tool_call` so the readiness
logic relies on the actual fields (`id`, `name`, and `argument_fragments`)
instead of maintaining `started`, and remove any remaining writes/checks tied to
that flag.
lib/llm/tests/data/replays/agent-harness/README.md (1)
3-5: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add the public issue reference for DYN-2764 if one exists.

External readers cannot resolve the Linear ID on its own. Please add the matching GitHub issue reference/link next to DYN-2764 (or replace it if the Linear ID is no longer needed). As per coding guidelines, Markdown docs should "prefer to also include the matching GitHub link when one exists".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/tests/data/replays/agent-harness/README.md` around lines 3 - 5, The
README for the agent-harness replay fixtures references DYN-2764 but does not
include a resolvable public issue link. Update the text in this document to add
the matching GitHub issue reference/link next to the DYN-2764 mention, or
replace the Linear ID if it is no longer needed, so external readers can follow
it; keep the existing mention of PR `#8284` intact.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/llm/tests/anthropic_http_replay.rs`:
- Around line 219-239: The test in `anthropic_http_replay.rs` is asserting the
wrong streaming order by requiring `message_delta` to be absent before
`gate.release()`. Update the `saw_message_delta` expectation in the
`tokio::time::timeout` loop around the `parser.push` handling so it allows a
finish-driven `message_delta` to appear before the gated usage tail, while still
verifying `content_block_stop` is observed before releasing the gate.

---

Outside diff comments:
In `@lib/llm/src/protocols/openai/responses/stream_converter.rs`:
- Around line 306-420: The stream conversion in ResponseStreamConverter
currently buffers only delta.tool_calls, so legacy delta.function_call chunks
are lost before FinishReason::FunctionCall closes the item. Update the buffering
logic in the tool-call handling path to also normalize delta.function_call into
the same FunctionCallState flow, likely using index 0 and generating a call_id
when missing, so the existing response.output_item.added and argument/done event
emission still occurs for legacy callers.

---

Nitpick comments:
In `@lib/llm/src/protocols/anthropic/stream_converter.rs`:
- Around line 98-124: The `record_tool_call` state machine in
`stream_converter.rs` uses `started` redundantly, since `is_emit_ready()`
already depends on the presence of `id` and `name`. Update `ToolCallState`
handling and `record_tool_call` so the readiness logic relies on the actual
fields (`id`, `name`, and `argument_fragments`) instead of maintaining
`started`, and remove any remaining writes/checks tied to that flag.

In `@lib/llm/tests/data/replays/agent-harness/README.md`:
- Around line 3-5: The README for the agent-harness replay fixtures references
DYN-2764 but does not include a resolvable public issue link. Update the text in
this document to add the matching GitHub issue reference/link next to the
DYN-2764 mention, or replace the Linear ID if it is no longer needed, so
external readers can follow it; keep the existing mention of PR `#8284` intact.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6f33e136-09ad-4cbd-95a4-7484d0d2d6e9

📥 Commits

Reviewing files that changed from the base of the PR and between c0bc4a0 and f839a4e.

⛔ Files ignored due to path filters (4)

lib/llm/tests/snapshots/anthropic_http_replay__anthropic_streaming_text.snap is excluded by !**/*.snap
lib/llm/tests/snapshots/anthropic_http_replay__anthropic_unary_text.snap is excluded by !**/*.snap
lib/llm/tests/snapshots/responses_http_replay__responses_streaming_text.snap is excluded by !**/*.snap
lib/llm/tests/snapshots/responses_http_replay__responses_unary_text.snap is excluded by !**/*.snap

📒 Files selected for processing (12)

lib/llm/src/protocols/anthropic/stream_converter.rs
lib/llm/src/protocols/openai/chat_completions/tool_parser_v2.rs
lib/llm/src/protocols/openai/responses/stream_converter.rs
lib/llm/tests/anthropic_http_replay.rs
lib/llm/tests/common/http_harness.rs
lib/llm/tests/common/scripted_chat_engine.rs
lib/llm/tests/data/replays/agent-harness/README.md
lib/llm/tests/data/replays/agent-harness/fragmented-tool.sse
lib/llm/tests/data/replays/agent-harness/parallel-tools.sse
lib/llm/tests/data/replays/agent-harness/text.sse
lib/llm/tests/data/replays/agent-harness/thinking-tool.sse
lib/llm/tests/responses_http_replay.rs

coderabbitai · 2026-06-26T14:22:56Z

+        let mut saw_tool_stop = false;
+        let mut saw_message_delta = false;
+
+        tokio::time::timeout(Duration::from_secs(2), async {
+            while !saw_tool_stop {
+                let bytes = body
+                    .next()
+                    .await
+                    .expect("response ended before tool block completion")
+                    .expect("failed to read response SSE bytes");
+                for event in parser.push(&bytes).expect("failed to parse response SSE") {
+                    saw_tool_stop |= event == "content_block_stop";
+                    saw_message_delta |= event == "message_delta";
+                }
+            }
+        })
+        .await
+        .expect("tool block completion did not arrive before the gated usage tail");
+
+        assert!(!saw_message_delta);
+        gate.release();


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Don't assert that message_delta is delayed behind the usage tail.

Because the gate starts at the first usage chunk, a finish-driven message_delta should already be observable before gate.release(). Line 238 currently locks in the opposite contract and would fail once the stream publishes the stop reason immediately, which is the behavior described in this PR.

Suggested fix

let mut body = response.bytes_stream(); let mut parser = IncrementalSseParser::default(); let mut saw_tool_stop = false; let mut saw_message_delta = false; tokio::time::timeout(Duration::from_secs(2), async { - while !saw_tool_stop { + while !saw_tool_stop || !saw_message_delta { let bytes = body .next() .await .expect("response ended before tool block completion") .expect("failed to read response SSE bytes"); for event in parser.push(&bytes).expect("failed to parse response SSE") { saw_tool_stop |= event == "content_block_stop"; saw_message_delta |= event == "message_delta"; } } }) .await - .expect("tool block completion did not arrive before the gated usage tail"); + .expect("tool completion did not arrive before the gated usage tail"); - assert!(!saw_message_delta); + assert!(saw_message_delta);

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

let mut saw_tool_stop = false;

let mut saw_message_delta = false;

tokio::time::timeout(Duration::from_secs(2), async {

while !saw_tool_stop {

let bytes = body

.next()

.await

.expect("response ended before tool block completion")

.expect("failed to read response SSE bytes");

for event in parser.push(&bytes).expect("failed to parse response SSE") {

saw_tool_stop |= event == "content_block_stop";

saw_message_delta |= event == "message_delta";

}

}

})

.await

.expect("tool block completion did not arrive before the gated usage tail");

assert!(!saw_message_delta);

gate.release();

let mut saw_tool_stop = false;

let mut saw_message_delta = false;

tokio::time::timeout(Duration::from_secs(2), async {

while !saw_tool_stop || !saw_message_delta {

let bytes = body

.next()

.await

.expect("response ended before tool block completion")

.expect("failed to read response SSE bytes");

for event in parser.push(&bytes).expect("failed to parse response SSE") {

saw_tool_stop |= event == "content_block_stop";

saw_message_delta |= event == "message_delta";

}

}

})

.await

.expect("tool completion did not arrive before the gated usage tail");

assert!(saw_message_delta);

gate.release();

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/llm/tests/anthropic_http_replay.rs` around lines 219 - 239, The test in `anthropic_http_replay.rs` is asserting the wrong streaming order by requiring `message_delta` to be absent before `gate.release()`. Update the `saw_message_delta` expectation in the `tokio::time::timeout` loop around the `parser.push` handling so it allows a finish-driven `message_delta` to appear before the gated usage tail, while still verifying `content_block_stop` is observed before releasing the gate.

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

MatejKosec added 2 commits June 26, 2026 11:09

fix(llm): preserve fragmented agent tool calls

22b1a22

Signed-off-by: Matej Kosec <mkosec@nvidia.com>

fix(llm): publish tool completions on finish

f839a4e

Signed-off-by: Matej Kosec <mkosec@nvidia.com>

MatejKosec requested a review from a team as a code owner June 26, 2026 14:14

pull-request-size Bot added the size/XXL label Jun 26, 2026

MatejKosec temporarily deployed to external_collaborator June 26, 2026 14:14 — with GitHub Actions Inactive

github-actions Bot added fix documentation Improvements or additions to documentation frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Jun 26, 2026

MatejKosec mentioned this pull request Jun 26, 2026

fix(llm): preserve fragmented agent tool calls #10985

Closed

coderabbitai Bot reviewed Jun 26, 2026

View reviewed changes

devin-ai-integration Bot reviewed Jun 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(llm): preserve fragmented agent tool calls#10988

fix(llm): preserve fragmented agent tool calls#10988
MatejKosec wants to merge 2 commits into
mainfrom
codex/dyn-2764-replay-harness-v2

MatejKosec commented Jun 26, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

github-actions Bot commented Jun 26, 2026

Uh oh!

coderabbitai Bot commented Jun 26, 2026

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

MatejKosec commented Jun 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Validation

Reviewer guidance

Summary by CodeRabbit

Uh oh!

github-actions Bot commented Jun 26, 2026

Uh oh!

coderabbitai Bot commented Jun 26, 2026

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

✅ Devin Review: No Issues Found

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

MatejKosec commented Jun 26, 2026 •

edited by coderabbitai Bot

Loading