fix(llm): preserve fragmented agent tool calls by MatejKosec · Pull Request #10985 · ai-dynamo/dynamo

MatejKosec · 2026-06-26T09:10:43Z

Summary

Add a deterministic, request-capturing chat engine and real HTTP replay coverage for the Anthropic Messages and OpenAI Responses APIs.
Cover unary and streaming text, fragmented and parallel tool calls, two-request tool-result exchanges, and Anthropic token counting with eleven endpoint tests and four stable snapshots.
Delay Responses function-call completion until the backend stream ends, and buffer Anthropic tool fragments so parallel calls are emitted as sequential, non-overlapping content blocks.
Document the minimized fixture provenance and repair the current-main llm_metrics test initializer needed by the full library suite.

This replaces the stale approach in #8284 and implements DYN-2764 without a model, GPU, Docker, or a new dependency.

Validation

Validation commands run on a Linux build host with CUDA toolchain:

cargo fmt --all -- --check
cargo clippy -p dynamo-llm --tests -- -D warnings
cargo test -p dynamo-llm --lib stream_converter::tests — 20 passed
INSTA_UPDATE=no cargo test -p dynamo-llm --test anthropic_http_replay --test responses_http_replay — 11 passed
cargo test -p dynamo-llm -- --test-threads=1 — all 1,375 active library tests passed, followed by the integration binaries until ten existing preprocessor cases received HTTP 401 while downloading gated meta-llama/Llama-3.1-70B-Instruct assets

Two independent AI reviews examined protocol correctness and harness/test design. Their block-ordering, terminal-SSE, completion-assertion, fixture-provenance, ID-canonicalization, and duplicate-call findings were addressed; both reviewers re-checked the revised diff and reported no remaining actionable findings.

Summary by CodeRabbit

New Features
- Added end-to-end HTTP coverage for AI chat and responses streaming, including text, tool calls, parallel tools, and tool-result round trips.
- Added replay fixtures for common streamed scenarios, including fragmented tool arguments and reasoning + tool-use flows.
Bug Fixes
- Improved tool-call streaming so fragmented arguments are finalized consistently at stream end.
- Fixed completion event timing so tool-call “done” events are emitted after the full stream, matching expected output order.
Tests
- Expanded integration tests and shared harness utilities for more reliable, deterministic protocol validation.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>

github-actions · 2026-06-26T09:10:52Z

🌿 Fern Docs Preview: https://nvidia-preview-d100dd50-3da1-48ba-95b6-bf403c739afb.docs.buildwithfern.com/dynamo/dev

coderabbitai · 2026-06-26T10:42:47Z

Walkthrough

Anthropic and OpenAI Responses stream converters now defer tool-call completion events to stream end. The PR also adds shared HTTP test harness support, replay fixtures, and end-to-end Anthropic and Responses replay tests.

Changes

Tool-call buffering and replay coverage

Layer / File(s)	Summary
Anthropic buffering and tests `lib/llm/src/protocols/anthropic/stream_converter.rs`	Tool-call arguments are buffered as fragments and emitted at stream end, and the unit tests were updated for buffered stop timing, ordering, and duplicate-id behavior.
OpenAI Responses buffering and tests `lib/llm/src/protocols/openai/chat_completions/tool_parser_v2.rs`, `lib/llm/src/protocols/openai/responses/stream_converter.rs`	Function-call completion is deferred to stream end, and the related tests and chunk helper were updated to match the new event timing.
HTTP harness and replay fixtures `lib/llm/tests/common/`, `lib/llm/tests/data/replays/agent-harness/`	The shared harness starts scripted chat services, parses and canonicalizes SSE bodies, and adds the replay README plus the text, tool, parallel-tool, and thinking-tool fixtures.
Anthropic HTTP replay tests `lib/llm/tests/anthropic_http_replay.rs`	Anthropic Messages replay tests cover unary, streaming, fragmented tool arguments, parallel tools, tool-result round trips, and token counting.
Responses HTTP replay tests `lib/llm/tests/responses_http_replay.rs`	Responses replay tests cover unary, streaming, fragmented function calls, parallel tool calls, and function-call output round trips.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description covers summary and validation, but it does not follow the required template and is missing Overview, Details, Where to start, and Related Issues sections.	Rewrite it using the repository template and add the missing sections, including a Related Issues entry such as Closes `#XXXX` or the no-issue confirmation.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly describes the main change: preserving fragmented agent tool calls.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

lib/llm/src/protocols/anthropic/stream_converter.rs (1)
305-318: 🎯 Functional Correctness | 🟡 Minor

Handle id/name-only tool-call deltas

started only flips when func.arguments is present, so a tool call that arrives with id/name but never sends arguments is dropped at end-of-stream. The codebase already has arguments: None tool-call chunks, so consider marking the call as started earlier or otherwise emitting an empty tool_use block instead of omitting it.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/src/protocols/anthropic/stream_converter.rs` around lines 305 - 318,
The tool-call delta handling in the Anthropic stream converter only marks a call
as started when `func.arguments` is present, so id/name-only chunks can be
dropped at end-of-stream. Update `stream_converter.rs` in the tool-call state
update logic to mark the call as started as soon as an `id` or `name` is
received, or otherwise ensure `ToolCallState` emits an empty `tool_use` block
when `arguments` stays `None`, so `id`/`name`-only deltas are preserved.

🧹 Nitpick comments (1)

lib/llm/tests/data/replays/agent-harness/README.md (1)
3-5: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Add the matching GitHub link for DYN-2764.

This doc references the internal Linear ticket DYN-2764 but omits a GitHub link. External contributors can't resolve Linear IDs, so include the matching GitHub issue (e.g. #NNNN) alongside it if one exists, as you already do for PR #8284.

As per coding guidelines: "Markdown documentation files may reference Linear tickets for internal context, but should prefer to also include the matching GitHub link when one exists".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/tests/data/replays/agent-harness/README.md` around lines 3 - 5, The
README currently mentions the internal Linear ticket DYN-2764 without a
corresponding public GitHub reference; update the fixture description to include
the matching GitHub issue link or issue number alongside DYN-2764, similar to
how PR `#8284` is referenced. Use the existing README text near the DYN-2764
mention and keep the wording concise so external contributors can identify the
related GitHub item without needing Linear access.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/llm/tests/responses_http_replay.rs`:
- Around line 326-347: The round-trip test in responses_http_replay.rs hardcodes
the function call output `call_id` instead of reusing the one returned in the
first response. Update the test around `first_body["output"]` and the
`post_responses` payload so the `function_call_output` uses the captured
`function_call`’s `call_id`, and adjust the later assertions to compare against
that same dynamic value rather than `call_list_directory`.

---

Outside diff comments:
In `@lib/llm/src/protocols/anthropic/stream_converter.rs`:
- Around line 305-318: The tool-call delta handling in the Anthropic stream
converter only marks a call as started when `func.arguments` is present, so
id/name-only chunks can be dropped at end-of-stream. Update
`stream_converter.rs` in the tool-call state update logic to mark the call as
started as soon as an `id` or `name` is received, or otherwise ensure
`ToolCallState` emits an empty `tool_use` block when `arguments` stays `None`,
so `id`/`name`-only deltas are preserved.

---

Nitpick comments:
In `@lib/llm/tests/data/replays/agent-harness/README.md`:
- Around line 3-5: The README currently mentions the internal Linear ticket
DYN-2764 without a corresponding public GitHub reference; update the fixture
description to include the matching GitHub issue link or issue number alongside
DYN-2764, similar to how PR `#8284` is referenced. Use the existing README text
near the DYN-2764 mention and keep the wording concise so external contributors
can identify the related GitHub item without needing Linear access.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5a7ff35b-eda4-4eaa-924c-2d80c1054e93

📥 Commits

Reviewing files that changed from the base of the PR and between 15bdb11 and 22b1a22.

⛔ Files ignored due to path filters (4)

lib/llm/tests/snapshots/anthropic_http_replay__anthropic_streaming_text.snap is excluded by !**/*.snap
lib/llm/tests/snapshots/anthropic_http_replay__anthropic_unary_text.snap is excluded by !**/*.snap
lib/llm/tests/snapshots/responses_http_replay__responses_streaming_text.snap is excluded by !**/*.snap
lib/llm/tests/snapshots/responses_http_replay__responses_unary_text.snap is excluded by !**/*.snap

📒 Files selected for processing (12)

lib/llm/src/protocols/anthropic/stream_converter.rs
lib/llm/src/protocols/openai/chat_completions/tool_parser_v2.rs
lib/llm/src/protocols/openai/responses/stream_converter.rs
lib/llm/tests/anthropic_http_replay.rs
lib/llm/tests/common/http_harness.rs
lib/llm/tests/common/scripted_chat_engine.rs
lib/llm/tests/data/replays/agent-harness/README.md
lib/llm/tests/data/replays/agent-harness/fragmented-tool.sse
lib/llm/tests/data/replays/agent-harness/parallel-tools.sse
lib/llm/tests/data/replays/agent-harness/text.sse
lib/llm/tests/data/replays/agent-harness/thinking-tool.sse
lib/llm/tests/responses_http_replay.rs

coderabbitai · 2026-06-26T10:42:50Z

+        let function_call = first_body["output"]
+            .as_array()
+            .unwrap()
+            .iter()
+            .find(|item| item["type"] == "function_call")
+            .expect("first response did not contain a function call")
+            .clone();
+
+        let second_response = post_responses(
+            &svc,
+            &json!({
+                "model": MODEL,
+                "stream": false,
+                "max_output_tokens": 64,
+                "tools": [tool("list_directory")],
+                "input": [
+                    {"role": "user", "content": "List /tmp"},
+                    function_call,
+                    {
+                        "type": "function_call_output",
+                        "call_id": "call_list_directory",
+                        "output": "[\"a.txt\"]"


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Use the first response’s call_id instead of hardcoding it.

This test is meant to validate round-trip propagation, but the second request and final assertions bake in call_list_directory rather than reusing whatever call_id the first response actually returned. That makes the test fixture-dependent and can fail even when the implementation preserves IDs correctly.

Suggested change

let function_call = first_body["output"] .as_array() .unwrap() .iter() .find(|item| item["type"] == "function_call") .expect("first response did not contain a function call") .clone(); + let call_id = function_call["call_id"] + .as_str() + .expect("function call missing call_id") + .to_owned(); let second_response = post_responses( &svc, &json!({ @@ function_call, { "type": "function_call_output", - "call_id": "call_list_directory", + "call_id": call_id.clone(), "output": "[\"a.txt\"]" } ] }), @@ let calls = assistant.tool_calls.as_deref().expect("tool calls missing"); assert_eq!(calls.len(), 1); - assert_eq!(calls[0].id, "call_list_directory"); + assert_eq!(calls[0].id, call_id); assert_eq!(calls[0].function.name, "list_directory"); assert_eq!(calls[0].function.arguments, r#"{"path":"/tmp"}"#); - assert_eq!(tool_result.tool_call_id, "call_list_directory"); + assert_eq!(tool_result.tool_call_id, call_id);

Also applies to: 373-376

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/llm/tests/responses_http_replay.rs` around lines 326 - 347, The round-trip test in responses_http_replay.rs hardcodes the function call output `call_id` instead of reusing the one returned in the first response. Update the test around `first_body["output"]` and the `post_responses` payload so the `function_call_output` uses the captured `function_call`’s `call_id`, and adjust the later assertions to compare against that same dynamic value rather than `call_list_directory`.

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

MatejKosec · 2026-06-26T14:14:25Z

Superseded by #10988, which includes the finish-signal completion timing and subsequent ordering corrections.

fix(llm): preserve fragmented agent tool calls

22b1a22

Signed-off-by: Matej Kosec <mkosec@nvidia.com>

MatejKosec temporarily deployed to external_collaborator June 26, 2026 09:10 — with GitHub Actions Inactive

pull-request-size Bot added the size/XXL label Jun 26, 2026

copy-pr-bot Bot temporarily deployed to GITLAB June 26, 2026 09:10 Inactive

github-actions Bot added external-contribution Pull request is from an external contributor fix documentation Improvements or additions to documentation frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Jun 26, 2026

copy-pr-bot Bot temporarily deployed to GITLAB June 26, 2026 09:11 Inactive

This comment has been minimized.

Sign in to view

MatejKosec marked this pull request as ready for review June 26, 2026 10:35

MatejKosec requested a review from a team as a code owner June 26, 2026 10:35

coderabbitai Bot reviewed Jun 26, 2026

View reviewed changes

devin-ai-integration Bot reviewed Jun 26, 2026

View reviewed changes

MatejKosec mentioned this pull request Jun 26, 2026

fix(llm): preserve fragmented agent tool calls #10988

Open

MatejKosec closed this Jun 26, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(llm): preserve fragmented agent tool calls#10985

fix(llm): preserve fragmented agent tool calls#10985
MatejKosec wants to merge 1 commit into
ai-dynamo:mainfrom
MatejKosec:codex/dyn-2764-replay-harness

MatejKosec commented Jun 26, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 26, 2026 •

edited

Loading

Uh oh!

This comment has been minimized.

coderabbitai Bot commented Jun 26, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

MatejKosec commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

MatejKosec commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Validation

Summary by CodeRabbit

Uh oh!

github-actions Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment has been minimized.

coderabbitai Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

✅ Devin Review: No Issues Found

Uh oh!

MatejKosec commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

MatejKosec commented Jun 26, 2026 •

edited

Loading

github-actions Bot commented Jun 26, 2026 •

edited

Loading

coderabbitai Bot commented Jun 26, 2026 •

edited

Loading