fix: export thoughts_token_count to OpenTelemetry trace spans#4835

Closed
brucearctor wants to merge 4 commits into google:main from brucearctor:fix/thinking-tokens-in-traces

Conversation

@brucearctor

Contributor

Description

Fixes #4829

ADK's OpenTelemetry tracing does not export thoughts_token_count to span attributes. When using Gemini models with ThinkingConfig, the usage_metadata in LlmResponse correctly contains thoughts_token_count, but this field is never written to spans by trace_generate_content_result() or trace_inference_result().

Interestingly, trace_call_llm() already exports this field (as gen_ai.usage.experimental.reasoning_tokens). This PR adds the same export to the two remaining functions that were missing it.

Changes

src/google/adk/telemetry/tracing.py

  • Added thoughts_token_count → gen_ai.usage.experimental.reasoning_tokens span attribute export in trace_generate_content_result() (~line 746)
  • Added the same export in trace_inference_result() (~line 789)
  • Uses the same try/except AttributeError guard pattern as trace_call_llm() for backward compatibility with older SDK versions
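The guard pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from tracing.py: `export_reasoning_tokens` and `RecordingSpan` are hypothetical names, and the fake span stands in for an OpenTelemetry span that exposes `set_attribute`.

```python
from types import SimpleNamespace

# Attribute key named in this PR description.
REASONING_TOKENS_KEY = "gen_ai.usage.experimental.reasoning_tokens"


class RecordingSpan:
    """Hypothetical stand-in for a span; records attributes in a dict."""

    def __init__(self):
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


def export_reasoning_tokens(span, usage_metadata):
    """Hypothetical helper: copy thoughts_token_count onto the span if present."""
    if usage_metadata is None:
        return
    try:
        # Older SDK versions may not define this field at all.
        thoughts = usage_metadata.thoughts_token_count
    except AttributeError:
        return
    if thoughts is not None:
        span.set_attribute(REASONING_TOKENS_KEY, thoughts)


span = RecordingSpan()
export_reasoning_tokens(span, SimpleNamespace(thoughts_token_count=50))
print(span.attributes)
```

With this shape, a usage object that lacks the field and a usage object whose field is None both leave the span untouched.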

tests/unittests/telemetry/test_spans.py

  • Added test_trace_inference_result_with_thinking_tokens — verifies the attribute is exported when thoughts_token_count is non-None
  • Added test_trace_inference_result_without_thinking_tokens — verifies no attribute is set when thoughts_token_count is None

Testing Plan

Unit Tests

All 23 telemetry tests pass:

$ pytest tests/unittests/telemetry/test_spans.py -v
23 passed in 1.08s

New tests specifically verify:

  1. thoughts_token_count=50 → span attribute gen_ai.usage.experimental.reasoning_tokens=50 is set
  2. thoughts_token_count=None → no gen_ai.usage.experimental.reasoning_tokens attribute on span
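The two cases above could be checked with tests shaped roughly like the sketch below. It does not reproduce the tests in test_spans.py: `export_reasoning_tokens` is a hypothetical stand-in for the function under test, and the usage metadata is faked with `SimpleNamespace` because a bare `Mock` would auto-create a non-None `thoughts_token_count`.

```python
from types import SimpleNamespace
from unittest import mock

KEY = "gen_ai.usage.experimental.reasoning_tokens"


def export_reasoning_tokens(span, usage_metadata):
    """Hypothetical stand-in for the export logic under test."""
    thoughts = getattr(usage_metadata, "thoughts_token_count", None)
    if thoughts is not None:
        span.set_attribute(KEY, thoughts)


def test_with_thinking_tokens():
    # Case 1: a non-None count is written to the span.
    span = mock.Mock()
    export_reasoning_tokens(span, SimpleNamespace(thoughts_token_count=50))
    span.set_attribute.assert_called_once_with(KEY, 50)


def test_without_thinking_tokens():
    # Case 2: a None count leaves the span untouched.
    span = mock.Mock()
    export_reasoning_tokens(span, SimpleNamespace(thoughts_token_count=None))
    span.set_attribute.assert_not_called()
```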

Verification

Before fix — Event.usage_metadata.thoughts_token_count is non-zero but Cloud Trace spans only show gen_ai.usage.input_tokens and gen_ai.usage.output_tokens.

After fix — gen_ai.usage.experimental.reasoning_tokens appears alongside the existing token attributes in all three tracing functions.

@gemini-code-assist

Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@adk-bot adk-bot added the tracing [Component] This issue is related to OpenTelemetry tracing label Mar 14, 2026
@rohityan rohityan self-assigned this Mar 17, 2026
@rohityan rohityan added the request clarification [Status] The maintainer need clarification or more information from the author label Mar 17, 2026
@rohityan

Collaborator

Hi @brucearctor, Thank you for your contribution! We appreciate you taking the time to submit this pull request. Please fix the mypy-diff errors.

@brucearctor

Contributor Author

@rohityan -- will do

@brucearctor

Contributor Author

@rohityan going to let it run, but I think addressed the diff/new error, and solved a couple others :-)

Let me know if other concerns. Cheers -

@rohityan

Collaborator

Hi @brucearctor , can you please resolve branch conflicts.

Add thoughts_token_count as gen_ai.usage.experimental.reasoning_tokens
span attribute in trace_generate_content_result() and
trace_inference_result(), matching the existing pattern in
trace_call_llm().

Fixes google#4829
- Refactor trace_inference_result() to use otel_span local variable,
  eliminating all 4 union-attr mypy errors (1 new + 3 pre-existing)
- Add tests for trace_generate_content_result() thinking tokens
- Import trace_generate_content_result in test_spans.py
@brucearctor brucearctor force-pushed the fix/thinking-tokens-in-traces branch from bfb9457 to 3a2cef7 Compare May 12, 2026 03:04
@brucearctor

Contributor Author

@rohityan Rebased onto latest main and resolved the conflict in tests/unittests/telemetry/test_spans.py (upstream added _safe_json_serialize circular dict tests at the same location as the thinking token tests — kept both). Ready for CI. 👍

@wojcikm

wojcikm commented Jun 2, 2026


Does the scope of this issue (and PR #4835) also cover the BigQueryAgentAnalyticsPlugin? The BQ plugin has a separate code path from tracing.py and currently does not write thoughts_token_count to the BigQuery table either.

We have LookML dashboards consuming agent_events from BQ and our cost formulas are ready to include thinking tokens (via COALESCE on usage_metadata.thoughts_token_count), but the column stays NULL because the BQ plugin never extracts it.

If BQ plugin is out of scope for current one, happy to open a separate issue.
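The COALESCE-style handling mentioned above can be sketched in Python as follows. Everything here is an assumption for illustration: `estimate_cost` is a hypothetical helper, the per-token rates are placeholders, and pricing thinking tokens at the output rate is an assumed policy rather than something stated in this thread.

```python
from typing import Optional


def coalesce(value: Optional[int], default: int = 0) -> int:
    """Mirror SQL COALESCE(value, default) for a single nullable integer."""
    return default if value is None else value


def estimate_cost(
    prompt_tokens: Optional[int],
    candidates_tokens: Optional[int],
    thoughts_tokens: Optional[int],
    input_rate: float,
    output_rate: float,
) -> float:
    """Hypothetical cost formula; rates are per token and purely illustrative."""
    billed_input = coalesce(prompt_tokens)
    # Assumption: thinking tokens are charged like output tokens.
    billed_output = coalesce(candidates_tokens) + coalesce(thoughts_tokens)
    return billed_input * input_rate + billed_output * output_rate
```

While the column stays NULL, the thinking-token term contributes zero, so a formula of this shape under-reports cost without failing.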

@brucearctor

Contributor Author

I went ahead and got conflicts resolved again.

@wojcikm : I do not mind doing it, but this has also been open for quite a while, and it would be good to close based on my original understanding. Happy to address [ or for someone else to ], if we want to include BigQueryAgentAnalyticsPlugin as in-scope.

But, looks like we need @rohityan , @jawoszek as assigned reviewer or other to take a look. Not sure who is in charge of determining scope.

@brucearctor

Contributor Author

ah ... actually pushing -->

Resolve conflict in tests/unittests/telemetry/test_spans.py by keeping
both our thinking token tests and upstream's new tests for error detection,
error_type parameter, and extra generate content attributes.
@adk-bot

adk-bot commented Jun 8, 2026

Collaborator

🔍 ADK Pull Request Analysis: PR #4835

Title: fix: export thoughts_token_count to OpenTelemetry trace spans
Author: @brucearctor
Status: open
Impact: 128 additions, 5 deletions across 2 changed files


Executive Summary

  1. Core Objective: Add OpenTelemetry span attribute export for reasoning/thinking token counts (thoughts_token_count) under 'gen_ai.usage.experimental.reasoning_tokens' in trace_generate_content_result() and trace_inference_result().
  2. Justification & Value: Justified Fix - Fills a critical observability gap where thinking token usage is correctly captured in usage_metadata for Gemini 2.0+ models but left unexported in two of the major model telemetry spans.
  3. Alignment with Principles: Pass - Implementation is highly decoupled, maintains clean typing, avoids breaking modifications, and handles backwards safety via targeted AttributeError exception catching.
  4. Recommendation: Approve - The changes are clean, address the stated issues perfectly, clear compilation errors, and provide exhaustive test coverage.

Detailed Findings & Analysis

1. Objectives & Impact ("What does it do?")

  • Context & Background: Tracing configurations for Gemini models utilizing reasoning/thinking configurations (e.g. ThinkingConfig) output thoughts_token_count metrics within LlmResponse.usage_metadata. Initially, only trace_call_llm exported this field correctly. Linked Issue #4829 highlighted that telemetry spans generated via trace_generate_content_result() or trace_inference_result() lacked the corresponding gen_ai.usage.experimental.reasoning_tokens attribute.
  • Implementation Mechanism:
    • Reads thoughts_token_count from usage_metadata and writes it to OpenTelemetry trace spans as the 'gen_ai.usage.experimental.reasoning_tokens' attribute.
    • Wraps the read in a try/except AttributeError guard so that older GenAI SDK versions, where thoughts_token_count may not exist, are handled gracefully.
    • Avoids mypy union-attr errors in trace_inference_result by binding the span to a separate local variable, otel_span, instead of rebinding the span parameter in place.
  • Affected Surface: Telemetry traces generated via the GeneratorContentSpan flow. There is no public API breaking change.
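One way a union-typed parameter can produce mypy union-attr errors, and how a separately typed local such as otel_span sidesteps them, is sketched below. The real types in tracing.py are not shown in this thread, so `FakeSpan`, `SpanWrapper`, and `record_input_tokens` are hypothetical stand-ins chosen only to illustrate the pattern.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeSpan:
    """Hypothetical span type with the one method this sketch needs."""

    attributes: dict[str, object] = field(default_factory=dict)

    def set_attribute(self, key: str, value: object) -> None:
        self.attributes[key] = value


@dataclass
class SpanWrapper:
    """Hypothetical wrapper that may or may not hold a span."""

    span: FakeSpan | None = None


def record_input_tokens(span: FakeSpan | SpanWrapper, tokens: int) -> None:
    # Resolve the union once into a new local with its own precise type,
    # rather than reassigning the `span` parameter.
    otel_span: FakeSpan | None
    if isinstance(span, SpanWrapper):
        otel_span = span.span
    else:
        otel_span = span
    if otel_span is None:
        return
    # From here on the type checker sees a plain FakeSpan.
    otel_span.set_attribute("gen_ai.usage.input_tokens", tokens)
```

Every later attribute access goes through the narrowed local, so no access is made on the original union.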

2. Justification & Value ("Is it a valid and useful change?")

  • Workspace Verification:
    • Investigated tracing.py: verified that while trace_call_llm indeed has the logic to retrieve and record experimental reasoning tokens, trace_generate_content_result and trace_inference_result were completely omitting it.
    • Verification confirms that the issue reported in Issue #4829 represents a genuine bug that limits cost calculation, trace dashboards, and token monitoring pipelines.
  • Value Assessment: Highly valuable. Tracking thinking token consumption is essential for developers using reasoning models (like gemini-2.5-flash) to trace operational costs and analyze token usage spikes within cloud aggregators like Cloud Trace.
  • Alternative Approaches: No cleaner alternative structure exists. The PR follows the same pattern used for the other token attributes. Using a string literal for the experimental reasoning key is consistent with standard practice, since OpenTelemetry's incubating attributes do not yet define a stable reasoning-token convention.
  • Scope & Depth: Symptom / Systematic Fix
    • This is a systematic fix for trace-based spans.
    • Recommendation Note: As pointed out by community contributors, other telemetry subsystems (specifically bigquery_agent_analytics_plugin.py) do not extract or record thoughts_token_count values into structured tables. While the current PR fully addresses the trace span scope, a subsequent task or issue should be raised to update analytics plugins.

3. Principle & Style Alignment Checklist ("Does it follow rules?")

  • Public API & Visibility Boundaries:
    • Status: Pass
    • Analysis: No changes made to public method signatures or namespaces. Standard structures and parameters are fully preserved with backward compatibility.
  • Code Quality, Typing & Conventions:
    • Status: Pass
    • Analysis: Complies with from __future__ import annotations styling. The type-hint error introduced by in-place variable rebinding was addressed by binding the span parameter to a separate local variable, otel_span.
  • Robustness & Edge Cases:
    • Status: Pass
    • Analysis: Robust boundary/null checks prevent crashing when usage_metadata or individual token counters are absent.
  • Test Integrity & Quality:
    • Status: Pass
    • Analysis: Four new comprehensive unit test functions are added within test_spans.py focusing on both active thinking tokens versus cases where token values are None. Tests conform strictly to standard mock assertions and follow the structured AAA pattern.

Phase Summary & Suggested Action

I recommend merging this patch to resolve OpenTelemetry reasoning-token exporting gaps. To address community concerns raised in peer reviews, we should additionally track the BigQueryAgentAnalyticsPlugin integration in a separate enhancement issue.

@boyangsvl boyangsvl assigned boyangsvl and unassigned rohityan Jun 16, 2026
@boyangsvl

Collaborator

Thought token counts are supported in the newer version of ADK: src/google/adk/telemetry/_token_usage.py
Closing this PR because it uses the old experimental key gen_ai.usage.experimental.reasoning_tokens instead of the current standard, gen_ai.usage.reasoning.output_tokens.

@boyangsvl boyangsvl closed this Jun 16, 2026
@brucearctor

brucearctor commented Jun 16, 2026

Contributor Author

So #4829 is closed [ or should be ]?

What's the PR that closed it? should that get linked to the issue?

@boyangsvl

Collaborator

It's addressed internally so there's no PR associated with it. The code is here: https://github.com/google/adk-python/blob/main/src/google/adk/telemetry/_token_usage.py#L34

@brucearctor

Contributor Author

Looks like this PR: #6022 ?

@boyangsvl

Collaborator

Thanks! I've added it to the original issue.

Development

Successfully merging this pull request may close these issues.

Thinking tokens not in traces

6 participants