Skip to content

feat(cost): per-modality token breakdown in Python SDK (F3 Track C)#596

Open
john-weiler wants to merge 2 commits into
mainfrom
feat/f3-multimodal-cost-sdk-python
Open

feat(cost): per-modality token breakdown in Python SDK (F3 Track C)#596
john-weiler wants to merge 2 commits into
mainfrom
feat/f3-multimodal-cost-sdk-python

Conversation

@john-weiler

@john-weiler john-weiler commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

User description

Summary

  • Extends LlmMetrics with num_image_input_tokens, num_audio_input_tokens, num_audio_output_tokens (optional, forward-compat safe)
  • Threads through logger.add_llm_span()base_handlerspan_params
  • LangChain handler: _extract_gemini_modality_breakdown() checks three surfaces (priority order):
    1. message.usage_metadata.input_token_details / output_token_details (langchain-google-genai ≥ 2.x with LangChain Core UsageMetadata)
    2. message.response_metadata.prompt_tokens_details / candidates_tokens_details (raw Gemini API list shape)
    3. message.response_metadata["usage_metadata"] nested dict — defensive fallback for providers that nest usage under response_metadata (added in response to review comment; covers forward-compat if upstream langchain-google-genai adds modality info there)
  • Sync + async handlers both updated; also handles a ChatGeneration or raw AIMessage interchangeably
  • galileo-adk: _extract_usage_metadata() walks native SDK prompt_tokens_details/candidates_tokens_details (handles .modality as enum or string); span_manager.end_llm() carries 3 new fields through to end_node()
  • crewai, openai_agents, otel: untouched — OpenAI-compat path drops breakdown (per RFC)

Known limitation — langchain-google-genai 4.2.4

Discovered during local E2E testing: langchain-google-genai 4.2.4 (latest) does not surface Gemini's per-modality breakdown on the AIMessage. The wrapper at chat_models.py:1263 hardcodes:

input_token_details={"cache_read": cache_read_tokens}

…reading prompt_token_count and cached_content_token_count from the Gemini response but dropping prompt_tokens_details (which contains the modality breakdown). Verified end-to-end:

Path prompt_tokens_details returned to consumer
google-genai SDK directly [(TEXT, 10), (AUDIO, 8)]
langchain-google-genaiAIMessage.usage_metadata.input_token_details {"cache_read": 0} (modality stripped)
langchain-google-genaiAIMessage.response_metadata only finish_reason, model_name, safety_ratings, model_provider

Until langchain-google-genai is patched upstream, the LangChain Gemini path will not produce per-modality cost data. The extractor in this PR is correct and will start producing data automatically once the upstream surfaces it. Per-modality cost continues to work for:

  • galileo-adk (uses google-genai SDK directly — verified breakdown reaches the extractor)
  • Direct logger.add_llm_span(audio_input_tokens=…, …) calls
  • Vertex AI judge spans (covered in runners PR #2480)

Path forward: file an upstream PR against langchain-google-genai to thread prompt_tokens_details through input_token_details. Estimated ~10 lines.

Test plan

  • 4 existing TestParseLlmResult cases (Surface 1, Surface 2, text-only, precedence)
  • 2 new TestParseLlmResult cases (Surface 3 nested-usage, Surface 1 wins over Surface 3)
  • 6 new TestExtractUsageMetadata ADK cases covering audio+image, text-only, empty candidates, enum-vs-string modality
  • All existing tests pass
  • Local E2E validated end-to-end via google-genai direct + logger.add_llm_span(audio_input_tokens=…)

Depends on runners feat/f3-multimodal-cost-foundation (orbit) + runners PR #2480.

🤖 Generated with Claude Code


Generated description

Below is a concise technical summary of the changes proposed in this PR:
Enable per-modality token accounting across Gemini-native paths by extending LLMEndResult, LlmMetrics, and the span/logging stack so every GalileoLogger span can persist image/audio usage from both LangChain handlers and the ADK observer. Ensure the new modality inputs flow through SpanManager/GalileoBaseHandler logging and validate the extraction logic via LangChain and ADK observer tests while also rendering OpenAI multimodal responses as content blocks.

TopicDetails
Multimodal ingest Convert OpenAI content parts into TextContentBlock/DataContentBlock payloads so the UI shows imported images or audio, returning a LoggedMessage whenever the parts are understood.
Modified files (1)
  • src/galileo/openai/extractors.py
Latest Contributors(2)
UserCommitDate
jweiler@galileo.aifixesJune 08, 2026
fernando.correia@galil...chore: update ruff tar...April 07, 2026
Modality metrics Aggregate Gemini-modality counts from GalileoCallback, GalileoAsyncCallback, and ADK observer metadata, thread them through LLMEndResult, LlmMetrics, GalileoLogger, SpanManager, and GalileoBaseHandler, and cover the extraction surfaces with the new LangChain and ADK observer tests.
Modified files (10)
  • galileo-adk/src/galileo_adk/observer.py
  • galileo-adk/src/galileo_adk/span_manager.py
  • galileo-adk/tests/test_observer.py
  • src/galileo/handlers/base_handler.py
  • src/galileo/handlers/langchain/async_handler.py
  • src/galileo/handlers/langchain/handler.py
  • src/galileo/handlers/langchain/utils.py
  • src/galileo/logger/logger.py
  • src/galileo/resources/models/llm_metrics.py
  • tests/test_langchain.py
Latest Contributors(2)
UserCommitDate
jweiler@galileo.aifixesJune 08, 2026
sankar@galileo.aifeat: galileo adk to n...February 22, 2026
Review this PR on Baz | Customize your next review

Extends LlmMetrics with optional num_image_input_tokens / num_audio_input_tokens /
num_audio_output_tokens fields and threads them through the full ingestion path:
LlmMetrics model → logger.add_llm_span() → base_handler → span_params.

Extraction is implemented in two integrations:

LangChain (primary path):
- _extract_gemini_modality_breakdown() checks two surfaces:
  1. message.usage_metadata.input_token_details / output_token_details
     (langchain-google-genai >= 2.x with LangChain Core UsageMetadata)
  2. message.response_metadata.prompt_tokens_details / candidates_tokens_details
     (raw Gemini API ModalityTokenCount list forwarded by the adapter)
- Both sync and async handlers updated.

galileo-adk (native Gemini SDK path):
- _extract_usage_metadata() extended to walk prompt_tokens_details /
  candidates_tokens_details (ModalityTokenCount list with .modality enum
  or string) and bucket into image/audio counts.
- span_manager.end_llm() carries the 3 new fields through to end_node().

All other integrations (crewai, openai_agents, otel) unchanged — they go
through the OpenAI-compat path which drops modality breakdown (per RFC).

Tests: 4 new TestParseLlmResult cases + 6 new TestExtractUsageMetadata
ADK cases; all existing tests pass.

Part of F3 multimodal token/cost support.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Jun 1, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 67.25664% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.13%. Comparing base (f9feed6) to head (bee7b89).

Files with missing lines Patch % Lines
src/galileo/openai/extractors.py 15.38% 33 Missing ⚠️
src/galileo/handlers/langchain/utils.py 94.59% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #596      +/-   ##
==========================================
- Coverage   83.31%   83.13%   -0.19%     
==========================================
  Files         125      125              
  Lines       10659    10771     +112     
==========================================
+ Hits         8881     8954      +73     
- Misses       1778     1817      +39     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment on lines 522 to +533

# Per-modality breakdown — only present on native Gemini SDK responses.
prompt_details = getattr(usage, "prompt_tokens_details", None)
candidates_details = getattr(usage, "candidates_tokens_details", None)
if prompt_details or candidates_details:
image_in = 0
audio_in = 0
audio_out = 0
has_prompt = bool(prompt_details)
has_candidates = bool(candidates_details)
for entry in prompt_details or []:
modality_attr = getattr(entry, "modality", None)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_extract_usage_metadata duplicates the Gemini modality breakdown in ::_extract_gemini_modality_breakdown, should we extract a shared helper so both paths use the same logic?

Severity

Want Baz to fix this for you? Activate Fixer

Comment thread tests/test_langchain.py
Comment on lines +1300 to +1304
def test_gemini_modality_from_usage_metadata_input_token_details(self) -> None:
"""Gemini native path: per-modality breakdown from usage_metadata.input/output_token_details."""
ai_message = AIMessage(content="hello")
ai_message.usage_metadata = {
"input_tokens": 110,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new Gemini modality tests don't follow the AGENTS.md # Given/# When/# Then pytest style, should we add those comments to each case so the new coverage matches the repo's canonical pattern?

Severity

Want Baz to fix this for you? Activate Fixer You can also update your AI coding guidelines based on this comment by apply pr to [branch name]

Other fix methods

Fix in Cursor

Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
tests/test_langchain.py around lines 1300-1304, update the new Gemini modality test(s)
that were added in this hunk to follow the repository’s canonical pytest style from
AGENTS.md. For each new test method in that range (e.g.,
`test_gemini_modality_from_usage_metadata_input_token_details`,
`test_gemini_modality_from_response_metadata_prompt_tokens_details`,
`test_no_gemini_modality_for_text_only_response`, and
`test_gemini_modality_input_token_details_takes_precedence`), add sentence-case `#
Given: ...`, `# When: ...`, and `# Then: ...` comments describing the setup, the call to
`parse_llm_result`, and the assertions. Keep the existing logic the same—only
add/adjust the comment structure so it matches the earlier token-usage tests in the same
file/class.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit bee7b89 addressed this comment by adding # Given, # When, and # Then comments to the new Gemini modality tests in tests/test_langchain.py. The new cases now match the repo’s canonical pytest style without changing the test logic.

Comment on lines +94 to +97
response_meta = getattr(msg, "response_metadata", None)
if isinstance(response_meta, dict):
prompt_details = response_meta.get("prompt_tokens_details") or []
candidates_details = response_meta.get("candidates_tokens_details") or []

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AIMessage.response_metadata only reads top-level token detail keys; can we also read response_metadata['usage_metadata'] so prompt_details / candidates_details (and image_input_tokens / audio_*) aren’t left empty, with a top-level fallback if needed?

Severity web_search

Want Baz to fix this for you? Activate Fixer

Other fix methods

Fix in Cursor

Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
src/galileo/handlers/langchain/utils.py around lines 58-117, in the helper
_extract_gemini_modality_breakdown, fix the Gemini response_metadata parsing to support
the adapter’s nested structure. Right now it only reads
response_metadata['prompt_tokens_details'] and ['candidates_tokens_details'] at the top
level, which leaves prompt_details/candidates_details empty when the real adapter nests
them under response_metadata['usage_metadata']. Refactor by first checking
response_metadata for usage_metadata and extracting
prompt_tokens_details/candidates_tokens_details from there, falling back to the current
top-level keys if the nested block is missing, then keep the rest of the modality token
assignment logic unchanged.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit bee7b89 addressed this comment by teaching _extract_gemini_modality_breakdown to read nested response_metadata['usage_metadata'] in addition to the existing top-level token detail lists. It now falls back to the top-level keys when nested details are absent, so Gemini modality counts are no longer left empty when the adapter uses the nested structure.

Comment thread src/galileo/handlers/langchain/utils.py Outdated
Comment on lines +115 to +117
if image_in is None and audio_in is None and audio_out is None:
return None, None, None
return image_in, audio_in, audio_out

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_extract_gemini_modality_breakdown() currently returns (None, None, None) for detail lists that only contain TEXT, which makes LlmMetrics.to_dict() serialize null where ADK emits 0; should we return zeroes when the detail lists are present but no counted modalities are found?

Severity

Want Baz to fix this for you? Activate Fixer

Other fix methods

Fix in Cursor

Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
src/galileo/handlers/langchain/utils.py around lines 58-117, update the
_extract_gemini_modality_breakdown() logic so it doesn’t return (None, None, None)
solely because AUDIO/IMAGE counts weren’t found. Track whether the modality “detail
lists” were actually present (prompt_tokens_details and/or candidates_tokens_details
from message.response_metadata, and usage_metadata
input_token_details/output_token_details if applicable); if those lists are present but
contain no AUDIO/IMAGE entries (only TEXT or other modalities), return (0, 0, 0) with
the corresponding fields set to 0 instead of None. Keep (None, None, None) only when
neither surface provides any modality breakdown data at all. Ensure this matches the ADK
behavior and preserves null-vs-zero meaning for downstream traces.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit bee7b89 addressed this comment by changing _extract_gemini_modality_breakdown() to track whether modality detail lists exist and return 0 values when they do but contain no AUDIO/IMAGE entries. It now reserves (None, None, None) for the case where no breakdown data is present at all, preserving the null-vs-zero distinction the comment asked for.

Comment on lines +30 to +34
# Per-modality breakdown — None means "counted as text in the flat totals".
# Only populated for providers that return modality-level token counts (e.g. Gemini native).
num_image_input_tokens: None | Unset | int = UNSET
num_audio_input_tokens: None | Unset | int = UNSET
num_audio_output_tokens: None | Unset | int = UNSET

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/galileo/resources/ is regenerated from openapi.yaml, but LlmMetrics in the spec doesn't include the new modality fields, so the next regen will drop them and break src/galileo/logger/logger.py:1247-1255; should we move this into the OpenAPI source/generator input instead of patching the generated file?

Severity

Want Baz to fix this for you? Activate Fixer You can also update your AI coding guidelines based on this comment by apply pr to [branch name]

Other fix methods

Fix in Cursor

Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
src/galileo/resources/models/llm_metrics.py around lines 30-34 and in the additions to
`LlmMetrics.to_dict()` and `LlmMetrics.from_dict()` (new fields
`num_image_input_tokens`, `num_audio_input_tokens`, `num_audio_output_tokens`), don’t
keep these changes in the generated client boundary. Instead, update the OpenAPI source
schema that generates `LlmMetrics` (likely openapi.yaml) to add these three properties
(type int, nullable/optional using the same conventions as the other token fields) so
regeneration won’t delete them. After updating the OpenAPI schema, remove the manual
edits from the generated llm_metrics.py and regenerate via
scripts/auto-generate-api-client.sh, then run the failing logger tests
(src/galileo/logger/logger.py around 1247-1255) to ensure there are no unexpected-kwarg
errors.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will regenerate once API changes are in ...

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One clarification on the premise: regeneration won't actually break logger.py:1247-1255. That code constructs galileo_core.schemas.logging.span.LlmMetrics (a pydantic model with model_config extra='allow'), not the edited attrs class in src/galileo/resources/models/llm_metrics.py. I verified the modality kwargs are accepted and serialized via model_dump() purely through extra='allow' — so the write/ingest path does not depend on this generated-file edit at all. The edit only affects the read-side models (LlmSpan, extended_llm_span_record, partial_extended_llm_span_record) that call LlmMetrics.from_dict(). So the regen concern is real for the read path, but the generated-file change is functionally unnecessary for emitting the new fields.

Comment on lines +290 to +294
def test_returns_empty_for_no_response(self, observer: GalileoObserver) -> None:
assert observer._extract_usage_metadata(None) == {}

def test_returns_empty_for_no_usage_metadata(self, observer: GalileoObserver) -> None:
response = MagicMock()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add # Given: .../# When: .../# Then: ... comments to each TestExtractUsageMetadata case so it matches AGENTS.md and the earlier tests in this file?

Severity

Want Baz to fix this for you? Activate Fixer You can also update your AI coding guidelines based on this comment by apply pr to [branch name]

Other fix methods

Fix in Cursor

Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
galileo-adk/tests/test_observer.py around lines 290-372 within the
`TestExtractUsageMetadata` test class, update each test method (e.g.,
`test_returns_empty_for_no_response`, `test_returns_empty_for_no_usage_metadata`,
`test_flat_tokens_no_modality`, `test_audio_and_image_modality_breakdown`,
`test_no_audio_in_candidates_returns_zero_audio_out`,
`test_modality_as_string_instead_of_enum`) to follow AGENTS.md’s canonical pytest
style. Add non-empty sentence-case `# Given: ...`, `# When: ...`, and `# Then: ...`
comments for the setup, the call to `observer._extract_usage_metadata(...)`, and the
assertions. Make the comments consistent with the existing earlier tests in this file,
and ensure every test in this new block includes them.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit bee7b89 addressed this comment by adding # Given:, # When:, and # Then: comments to each TestExtractUsageMetadata test in the new block. The updated tests now match the canonical pytest style used earlier in the file.

Comment on lines +229 to +233
if part_type == "text":
text = part.get("text") if isinstance(part, dict) else getattr(part, "text", "")
blocks.append(TextContentBlock(text=text or "", index=idx))
elif part_type == "image_url":
image_url = part.get("image_url") if isinstance(part, dict) else getattr(part, "image_url", {})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch re-implements the type→content block conversion already in galileo/utils/serialization._convert_langchain_content_block, so should we reuse that helper and only materialize TextContentBlock/DataContentBlock here?

Severity

Want Baz to fix this for you? Activate Fixer

Comment on lines +327 to +332
# When content is a list of OpenAI content parts (multimodal), convert to
# DataContentBlock/TextContentBlock so the UI renders the actual media.
if isinstance(content, list):
blocks = _openai_content_parts_to_blocks(content)
if blocks is not None:
return LoggedMessage(content=blocks, role=MessageRole(role))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

content is list returns LoggedMessage(content=blocks, role=...) without tool_calls/tool_call_id, so mixed array-content tool messages lose linkage compared to the Message(...) path — should we forward those fields here too?

Severity web_search

Want Baz to fix this for you? Activate Fixer

Other fix methods

Fix in Cursor

Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
src/galileo/openai/extractors.py around lines 327-332 inside convert_to_galileo_message,
the branch handling multimodal content lists (`if isinstance(content, list)`) returns
`LoggedMessage(content=blocks, role=MessageRole(role))` but drops `galileo_tool_calls`
and `tool_call_id` that were computed just above. Refactor this return so the
LoggedMessage includes the same tool linkage as the normal Message path by forwarding
`tool_calls=galileo_tool_calls` and `tool_call_id=tool_call_id` (or construct a Message
with those fields and then wrap it as a LoggedMessage if LoggedMessage doesn’t accept
them directly). Ensure the role value remains consistent with the existing
`MessageRole(role)` conversion.

@fercor-cisco

Copy link
Copy Markdown
Contributor

/astra review

@galileo-astra galileo-astra Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ This review was generated by an AI agent (Astra) and may contain mistakes. Please verify all suggestions independently.

Verdict: request_changes — The core per-modality token threading is sound, but the bundled OpenAI multimodal-message path introduces a tool-linkage regression and ships with effectively no test coverage.

General Comments

  • 🟡 minor (design): The new multimodal path makes convert_to_galileo_message return a LoggedMessage whose content is a list of TextContentBlock/DataContentBlock. These messages are then placed into LlmSpan.input/output, whose declared type (galileo_core Message.content) is str | list[ContentPart]. When the span is serialized via model_dump(), pydantic emits a PydanticSerializationUnexpectedValue warning ("Expected str") for every multimodal message. I verified the block data does survive serialization, so this is not data loss, but it means a UserWarning is logged on every multimodal OpenAI call — noisy for a telemetry path that is meant to observe without interfering. Consider whether galileo_core needs a content-block-aware span type, or suppress/handle the warning at the boundary.

Follow-ups

Suggested follow-up work that could be tracked as Shortcut stories:

  • galileo-adk/src/galileo_adk/observer.py:503-557: _extract_usage_metadata (ADK) and _extract_gemini_modality_breakdown (LangChain utils) implement the same Gemini per-modality walk with subtly different null-vs-zero semantics: ADK only sets image_input_tokens/audio_input_tokens when prompt_tokens_details is present and audio_output_tokens when candidates_tokens_details is present (otherwise the key is absent → None downstream), while the LangChain path always returns a 0-filled 3-tuple once any detail data exists. Consider extracting a shared helper so both paths agree on the null/zero contract (matches reviewer comment r3337251851).
  • src/galileo/openai/extractors.py:221-267: _openai_content_parts_to_blocks re-implements type→content-block conversion that already exists in galileo/utils/serialization._convert_langchain_content_block. Consider reusing/sharing that helper to avoid drift (matches reviewer comment r3373853343).

Comment on lines +329 to +332
if isinstance(content, list):
blocks = _openai_content_parts_to_blocks(content)
if blocks is not None:
return LoggedMessage(content=blocks, role=MessageRole(role))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 major (bug): This branch returns a LoggedMessage built only from content and role, discarding the galileo_tool_calls and tool_call_id that were just computed above (lines 302-325). For a tool-role message with array (multimodal) content, or an assistant message that carries both array content and tool calls, the tool-call linkage is silently dropped — whereas the non-list Message(...) path below preserves it. I confirmed this: convert_to_galileo_message({'role':'tool','tool_call_id':'call_123','content':[{'type':'text','text':'result'}]}) returns a LoggedMessage with tool_call_id=None. LoggedMessage inherits these fields from Message, so forward them.

Suggested change
if isinstance(content, list):
blocks = _openai_content_parts_to_blocks(content)
if blocks is not None:
return LoggedMessage(content=blocks, role=MessageRole(role))
if isinstance(content, list):
blocks = _openai_content_parts_to_blocks(content)
if blocks is not None:
return LoggedMessage(
content=blocks,
role=MessageRole(role),
tool_calls=galileo_tool_calls,
tool_call_id=tool_call_id,
)

Comment on lines +105 to +119
# Surface 1: usage_metadata.input_token_details / output_token_details
usage_meta = getattr(msg, "usage_metadata", None)
if isinstance(usage_meta, dict):
input_details = usage_meta.get("input_token_details") or {}
output_details = usage_meta.get("output_token_details") or {}
if isinstance(input_details, dict) and input_details:
has_detail_data = True
if "audio" in input_details:
audio_in = input_details["audio"]
if "image" in input_details:
image_in = input_details["image"]
if isinstance(output_details, dict) and output_details:
has_detail_data = True
if "audio" in output_details:
audio_out = output_details["audio"]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 minor (question): Surface 1 fires for any provider whose AIMessage.usage_metadata populates input_token_details/output_token_details — not just Gemini. Anthropic (cache_read/cache_creation) and OpenAI (cache_read/audio/reasoning) via LangChain both populate these. Because a non-empty input_details sets has_detail_data=True even when it contains no audio/image key, the final return coerces missing modalities to 0. Net effect: every Anthropic/OpenAI text call through the LangChain handler now emits image_input_tokens=0 and audio_input_tokens=0 (and, when only output_token_details is present, input fields are still forced to 0). Is that intended? It contradicts the PR's stated "OpenAI-compat path drops breakdown per RFC", and turns the previously-absent fields into 0 for all providers. If only Gemini should be in scope, gate on the provider; otherwise the gemini-specific naming/docstring is misleading.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 major (testing): The new _openai_content_parts_to_blocks helper and the LoggedMessage multimodal branch in convert_to_galileo_message have no unit tests (codecov reports ~15% patch coverage on this file). This is new, non-trivial logic with several easy-to-get-wrong paths: data-URI parsing with a fallback that base64-encodes the entire URI on parse failure, dict-vs-object access for each part type, the None-on-unrecognized-part fallback to str(content), and the tool-linkage issue flagged separately. Please add tests covering: text/image_url(data URI + plain URL)/input_audio parts, an unrecognized part type (falls back to Message), and a tool/assistant message with array content (tool linkage preserved).

@fercor-cisco fercor-cisco left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the Astra feedback and test failures.

status_code=status_code,
image_input_tokens=usage.get("image_input_tokens"),
audio_input_tokens=usage.get("audio_input_tokens"),
audio_output_tokens=usage.get("audio_output_tokens"),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image_output_tokens missing here and many other spots

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants