
chore(weave): Add db and server support for cached tokens#6507

Open
andrewtruong wants to merge 13 commits into `master` from `andrew/cached-token-support`

Conversation

Collaborator

@andrewtruong andrewtruong commented Mar 30, 2026

https://coreweave.atlassian.net/browse/WB-32599

Relevant wiring to add cached token support to the backend




wandbot-3000 bot commented Mar 30, 2026


codecov bot commented Mar 30, 2026

Codecov Report

❌ Patch coverage is 77.77778% with 10 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...ve/trace_server/clickhouse_trace_server_batched.py | 0.00% | 8 Missing ⚠️ |
| ..._server/calls_query_builder/usage_query_builder.py | 0.00% | 1 Missing and 1 partial ⚠️ |



w-b-hivemind bot commented Mar 31, 2026

HiveMind Sessions

4 sessions · 25h 44m · $29

| Session | Agent | Duration | Tokens | Cost | Lines |
| --- | --- | --- | --- | --- | --- |
| Extract a concise 3-8 word title that captures (097bd183-ceb4-45dc-be84-03a2ce5ace65) | claude | 10h 11m | 34.3K | $8.35 | +58 −15 |
| Conductor Workspace Setup (9cb7633e-d313-4a41-b28d-858f2b7b1150) | claude | 1m | 3.4K | $0.47 | +11 −1 |
| Conductor Workspace Setup (ab32679e-58dc-4907-bb2a-3b6c3ec24788) | claude | 15h 31m | 108.4K | $20 | +762 −245 |
| Conductor Workspace Setup (d568acd1-86fe-4588-92f4-4b7ea9c1c10c) | claude | 26s | 184 | $0.12 | +0 −0 |
| **Total** | | 25h 44m | 146.3K | $29 | +831 −261 |

View all sessions in HiveMind →

Run `claude --resume 097bd183-ceb4-45dc-be84-03a2ce5ace65` to pick up where you left off.

@andrewtruong andrewtruong changed the title chore(weave): Update schema to support cached tokens chore(weave): Add db and server support for cached tokens Mar 31, 2026
@andrewtruong andrewtruong marked this pull request as ready for review April 1, 2026 16:58
@andrewtruong andrewtruong requested review from a team, gtarpenning and tssweeney as code owners April 1, 2026 16:58

This comment was marked as resolved.


@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 2 new potential issues.

View 8 additional findings in Devin Review.



🔴 filter_out_current_costs ignores cache cost fields, causing updated cache pricing to never be seeded

The filter_out_current_costs function at weave/trace_server/costs/insert_costs.py:116-150 determines whether a cost entry already exists in the DB by comparing only prompt_token_cost (mapped from cost["input"]), completion_token_cost (mapped from cost["output"]), and effective_date. It does not compare the new cache_read_input or cache_creation_input fields. Similarly, get_current_costs at weave/trace_server/costs/insert_costs.py:22-39 only queries llm_id, prompt_token_cost, completion_token_cost, effective_date — it doesn't fetch cache cost columns at all.

This means if cost_checkpoint.json is updated to add cache pricing for a model that already has matching prompt/completion costs and effective_date in the database, the entry will be incorrectly filtered out as a duplicate, and the new cache costs will never be inserted.

(Refers to lines 130-143)

Prompt for agents
In weave/trace_server/costs/insert_costs.py, update get_current_costs (lines 22-39) to also SELECT cache_read_input_token_cost and cache_creation_input_token_cost from llm_token_prices. Then update filter_out_current_costs (lines 116-150) to unpack those additional columns from the current_costs tuples and include them in the comparison at lines 132-135. Add two additional math.isclose checks: one comparing cache_read_input_token_cost with cost.get('cache_read_input', 0) and another comparing cache_creation_input_token_cost with cost.get('cache_creation_input', 0).
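The comparison the prompt asks for can be sketched as follows. This is an illustrative standalone function, not the actual `filter_out_current_costs` implementation; the row layout and the `cache_read_input` / `cache_creation_input` key names are assumptions taken from the finding's description.

```python
import math

def is_duplicate_cost(existing_row: tuple, new_cost: dict) -> bool:
    """Return True if new_cost matches an existing llm_token_prices row.

    existing_row is assumed to be
    (prompt_token_cost, completion_token_cost,
     cache_read_input_token_cost, cache_creation_input_token_cost).
    """
    prompt, completion, cache_read, cache_creation = existing_row
    return (
        math.isclose(prompt, new_cost["input"])
        and math.isclose(completion, new_cost["output"])
        # The two extra checks the fix calls for: a row only counts as a
        # duplicate if its cache prices also match (defaulting to 0 when
        # the checkpoint entry has no cache pricing).
        and math.isclose(cache_read, new_cost.get("cache_read_input", 0))
        and math.isclose(cache_creation, new_cost.get("cache_creation_input", 0))
    )
```

With this check, a checkpoint entry that adds cache pricing to a model whose prompt/completion costs already match the DB is no longer filtered out as a duplicate.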

Comment on lines 3067 to 3068

```
Cost metrics are computed post-query by multiplying token counts by prices from llm_token_prices:
- input_cost: input_tokens * prompt_token_cost
```

🟡 total_cost docstring not updated to reflect inclusion of cache costs

The UsageMetric docstring at weave/trace_server/trace_server_interface.py:3063-3071 states total_cost: input_cost + output_cost, but the actual implementation in _compute_costs_for_buckets (weave/trace_server/clickhouse_trace_server_batched.py:1137-1146) now computes total_cost = input_cost + output_cost + cache_read_total + cache_creation_total. Users relying on the documented formula will have incorrect expectations about what total_cost includes.

(Refers to lines 3067-3071)
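The discrepancy fits in one line. This sketch uses illustrative names: `total_cost` below is not the actual `_compute_costs_for_buckets` signature, only a restatement of the formula the reviewer says the implementation now uses.

```python
def total_cost(input_cost: float, output_cost: float,
               cache_read_total: float, cache_creation_total: float) -> float:
    # Docstring claims: input_cost + output_cost.
    # Implementation (per the finding) also adds both cache components:
    return input_cost + output_cost + cache_read_total + cache_creation_total
```

The fix is to update the `UsageMetric` docstring so the documented formula matches this sum.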



@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 2 new potential issues.

View 10 additional findings in Devin Review.


```
'"prompt_tokens_total_cost":', toString(prompt_tokens * prompt_token_cost), ',',
'"cache_read_input_token_cost":', toString(cache_read_input_token_cost), ',',
'"cache_creation_input_token_cost":', toString(cache_creation_input_token_cost), ',',
'"prompt_tokens_total_cost":', toString((prompt_tokens - cache_read_input_tokens) * prompt_token_cost), ',',
```

🔴 prompt_tokens_total_cost double-charges cache_creation_input_tokens in ClickHouse SQL

The prompt_tokens_total_cost formula subtracts cache_read_input_tokens from prompt_tokens but does not subtract cache_creation_input_tokens. Since providers like Anthropic include both cache-read and cache-creation tokens in the total input_tokens count, cache_creation_input_tokens are double-charged: once at the regular prompt rate (included in prompt_tokens_total_cost) and again at the cache-creation rate (in cache_creation_input_tokens_total_cost). The comment in the SQLite path (sqlite_trace_server.py:875-876) confirms the intent: "Subtract cached tokens: they are billed at the cache rate, not the regular input rate" — but only one of the two cache token types is subtracted.

Suggested change

```diff
-'"prompt_tokens_total_cost":', toString((prompt_tokens - cache_read_input_tokens) * prompt_token_cost), ',',
+'"prompt_tokens_total_cost":', toString((prompt_tokens - cache_read_input_tokens - cache_creation_input_tokens) * prompt_token_cost), ',',
```
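A minimal numeric sketch of the double-charge. Token counts and rates are illustrative, not real pricing; the only assumption carried over from the finding is that `prompt_tokens` already includes both cache token types.

```python
prompt_tokens = 1000
cache_read_input_tokens = 600
cache_creation_input_tokens = 200
prompt_token_cost = 3e-6       # regular input rate, $/token (illustrative)

# Buggy formula: cache-creation tokens are still inside the count billed
# at the regular prompt rate (and are billed again at the cache rate).
buggy_prompt_total = (prompt_tokens - cache_read_input_tokens) * prompt_token_cost

# Fixed formula: both cache token types are subtracted, so only truly
# uncached tokens pay the regular rate.
fixed_prompt_total = (
    prompt_tokens - cache_read_input_tokens - cache_creation_input_tokens
) * prompt_token_cost

overcharge = buggy_prompt_total - fixed_prompt_total
```

The overcharge equals exactly the cache-creation token count times the regular prompt rate, on top of their separately billed cache-creation cost.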

Comment on lines +877 to +880

```python
"prompt_tokens_total_cost": (
    prompt_tokens - cache_read_input_tokens
)
* prompt_cost,
```

🔴 prompt_tokens_total_cost double-charges cache_creation_input_tokens in SQLite path

Same issue as the ClickHouse SQL path: the SQLite cost calculation at sqlite_trace_server.py:877-880 computes prompt_tokens_total_cost as (prompt_tokens - cache_read_input_tokens) * prompt_cost, but fails to also subtract cache_creation_input_tokens. This causes cache-creation tokens to be billed at both the regular prompt rate and the cache-creation rate.

Suggested change

```diff
-"prompt_tokens_total_cost": (
-    prompt_tokens - cache_read_input_tokens
-)
-* prompt_cost,
+"prompt_tokens_total_cost": (
+    prompt_tokens
+    - cache_read_input_tokens
+    - cache_creation_input_tokens
+)
+* prompt_cost,
```

