
fix(llm): propagate chat completion token details#11027

Open
aishwaryabandapelly-ai wants to merge 2 commits into
ai-dynamo:mainfrom
aishwaryabandapelly-ai:phase2-reasoning-token-plan

@aishwaryabandapelly-ai aishwaryabandapelly-ai commented Jun 28, 2026

Overview:

Closes #2941.

Chat Completions responses were not surfacing completion_tokens_details, which can carry reasoning_tokens, from backend usage metadata into the final response usage field.

This PR propagates completion_tokens_details in the Chat Completions delta generator, following the existing pattern already used for prompt_tokens_details.

Details:

  • Updated lib/llm/src/protocols/openai/chat_completions/delta.rs
  • Propagated completion_usage.completion_tokens_details into self.usage.completion_tokens_details
  • Added a targeted unit test for completion token details propagation
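
The propagation described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from delta.rs: the struct definitions are simplified stand-ins whose field names follow the OpenAI usage schema, and `absorb_backend_usage` is a hypothetical helper name (the PR makes the change inside the existing generator logic). Only `get_usage()`, `prompt_tokens_details`, `completion_tokens_details`, and `reasoning_tokens` are names taken from this PR.

```rust
// Simplified, assumed stand-ins for the real usage types.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct PromptTokensDetails {
    pub cached_tokens: Option<u32>,
}

#[derive(Clone, Debug, Default, PartialEq)]
pub struct CompletionTokensDetails {
    pub reasoning_tokens: Option<u32>,
}

#[derive(Clone, Debug, Default, PartialEq)]
pub struct CompletionUsage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
    pub total_tokens: u32,
    pub prompt_tokens_details: Option<PromptTokensDetails>,
    pub completion_tokens_details: Option<CompletionTokensDetails>,
}

/// Hypothetical stand-in for the delta generator; only the usage field is modeled.
#[derive(Default)]
pub struct DeltaGenerator {
    usage: CompletionUsage,
}

impl DeltaGenerator {
    /// Hypothetical helper: copies the optional detail blocks from backend
    /// usage metadata into the usage accumulated by the generator.
    pub fn absorb_backend_usage(&mut self, backend: Option<&CompletionUsage>) {
        if let Some(completion_usage) = backend {
            // Pattern the PR describes as already existing: prompt-side details.
            if let Some(details) = completion_usage.prompt_tokens_details.as_ref() {
                self.usage.prompt_tokens_details = Some(details.clone());
            }
            // Mirrored step for completion-side details, which can carry
            // reasoning_tokens.
            if let Some(details) = completion_usage.completion_tokens_details.as_ref() {
                self.usage.completion_tokens_details = Some(details.clone());
            }
        }
    }

    /// Returns a copy of the usage that would go into the final response.
    pub fn get_usage(&self) -> CompletionUsage {
        self.usage.clone()
    }
}
```

In this sketch, a backend update that carries no details leaves previously stored details untouched, because each field is overwritten only when the backend supplies a value. Whether the real generator behaves the same way in that case is an assumption here.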

Where should the reviewer start?

Please start with:

lib/llm/src/protocols/openai/chat_completions/delta.rs

The production change is small and mirrors the existing prompt_tokens_details propagation pattern.

Related Issues

🔗 This PR is linked to an issue:

Testing:

Passed locally:

cargo fmt
git diff --check

Added unit test:

test_completion_token_details_are_propagated_from_backend_usage

Attempted locally:

cargo check -p dynamo-llm
cargo test -p dynamo-llm test_completion_token_details_are_propagated_from_backend_usage --lib

Both commands progressed into the dynamo-llm crate but could not complete on macOS because of Linux-specific APIs related to NUMA, fallocate, DiskStorage, and O_DIRECT. This appears to be a local platform limitation rather than an issue caused by this change.

Acceptance Criteria

  • Tests added for changed behavior
  • Relevant Rust tests passed on Linux CI
  • Rust clippy passed on Linux CI
  • Follows existing code style and mirrors the existing prompt_tokens_details propagation pattern
  • No unrelated files changed
  • No breaking changes introduced
  • Documentation update not applicable because this is a small backend usage-field fix

Signed-off-by: aishwaryabandapelly-ai <aishwaryabandapelly@gmail.com>
Signed-off-by: aishwaryabandapelly-ai <aishwaryabandapelly@gmail.com>
@aishwaryabandapelly-ai aishwaryabandapelly-ai requested a review from a team as a code owner June 28, 2026 07:44
@copy-pr-bot

copy-pr-bot Bot commented Jun 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

Contributor

👋 Hi aishwaryabandapelly-ai! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions Bot added the external-contribution (Pull request is from an external contributor) and frontend (`python -m dynamo.frontend` and `dynamo-run in=http|text|grpc`) labels Jun 28, 2026
@datadog-official

datadog-official Bot commented Jun 28, 2026

Pipelines

⚠️ Warnings

🚦 2 Pipeline jobs failed

Docs link check | lychee   View in Datadog   GitHub Actions

Lint PR | Validate PR title and add label   View in Datadog   GitHub Actions

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 1c787bd | Docs | Give us feedback!

@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Contributor

Walkthrough

In choice_from_postprocessor, when backend completion_usage contains completion_tokens_details, those details are now cloned into the generator's usage struct, mirroring the existing handling of prompt_tokens_details. A new unit test verifies that reasoning_tokens from backend usage is reflected in get_usage().

Changes

completion_tokens_details propagation

| Layer / File(s) | Summary |
| --- | --- |
| Propagate and test completion_tokens_details: lib/llm/src/protocols/openai/chat_completions/delta.rs | choice_from_postprocessor clones backend completion_tokens_details into self.usage.completion_tokens_details; test imports CompletionTokensDetails and a new test asserts reasoning_tokens round-trips through get_usage(). |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | The code propagates backend completion token details into response usage and adds a test, matching #2941's reasoning token goal. |
| Out of Scope Changes check | ✅ Passed | The diff is narrowly scoped to the requested propagation logic and a targeted test, with no unrelated changes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00%, which is sufficient. The required threshold is 80.00%. |
| Title check | ✅ Passed | The title clearly matches the main change: propagating chat completion token details. |
| Description check | ✅ Passed | The PR description matches the required template with overview, details, review start, and a linked issue section. |

Comment @coderabbitai help to get the list of available commands.

@aishwaryabandapelly-ai aishwaryabandapelly-ai changed the title from "Phase2 reasoning token plan" to "fix(llm): propagate chat completion token details" Jun 28, 2026
@github-actions github-actions Bot added the fix label Jun 28, 2026
Labels

external-contribution (Pull request is from an external contributor)
fix
frontend (`python -m dynamo.frontend` and `dynamo-run in=http|text|grpc`)
size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE]: Support reasoning tokens in response usage field

1 participant