Summary
set_usage(answer_tokens=N) is documented as "override auto-counted token values with exact LLM-reported counts" (context.py:594), but subsequent write_text / write_reasoning calls continue to add to the field, producing a final value that is neither the override nor the auto-counted total.
Repro
import asyncio
from ai_sdk_stream_python import StreamContext

async def work(ctx):
    await ctx.write_text("hello")          # auto-counts to 5
    await ctx.set_usage(answer_tokens=3)   # override to 3
    await ctx.write_text(" world")         # adds 6 → final 9
    await ctx.finish()

async def main():
    ctx = StreamContext(collect=True)
    task = asyncio.create_task(work(ctx))  # keep a reference so the task is not garbage-collected
    async for _ in ctx.stream():
        pass
    await task  # surface any exception raised in work()
    print(ctx.record.answer_tokens)  # 9: neither the override (3) nor the full total (11)

asyncio.run(main())
User intent could be:
- "Final count is exactly 3" (treat set_usage as authoritative final)
- "Final count is 11" (treat set_usage as initial seed before continued counting)
The current behavior (9) matches neither — it's a silent miscount.
Proposed fix
Two reasonable options:
Option 1 — set_usage becomes authoritative: after set_usage(answer_tokens=N), stop auto-counting answer tokens for the rest of the stream (track a per-field "locked" flag). Document that set_usage should be called once at the end, matching the typical pattern of reading usage from the LLM's final chunk.
Option 2 — set_usage is advisory: require callers to call it only once at the end (on_finish or just before finish()); raise if called when there are still writes pending. More restrictive but matches the documented "exact value" semantics.
I lean Option 1 — it keeps the convenient mid-stream availability of set_usage while making the semantics predictable.
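To make Option 1 concrete, here is a minimal sketch of the per-field "locked" guard. UsageCounter and its methods are hypothetical stand-ins, not the library's internals, and counting one token per character is an assumption taken from the repro comments ("hello" auto-counts to 5).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UsageCounter:
    """Hypothetical stand-in for the context's token bookkeeping."""

    answer_tokens: int = 0
    reasoning_tokens: int = 0
    # Fields that set_usage has overridden and that must no longer be auto-counted.
    _locked: set[str] = field(default_factory=set, repr=False)

    def _auto_add(self, name: str, delta: str) -> None:
        # Guard around the per-delta increment: locked fields are left alone.
        if name in self._locked:
            return
        # Assumption: one token per character, as the repro comments suggest.
        setattr(self, name, getattr(self, name) + len(delta))

    def write_text(self, delta: str) -> None:
        self._auto_add("answer_tokens", delta)

    def write_reasoning(self, delta: str) -> None:
        self._auto_add("reasoning_tokens", delta)

    def set_usage(
        self,
        *,
        answer_tokens: int | None = None,
        reasoning_tokens: int | None = None,
    ) -> None:
        overrides = {
            "answer_tokens": answer_tokens,
            "reasoning_tokens": reasoning_tokens,
        }
        for name, value in overrides.items():
            if value is not None:
                setattr(self, name, value)  # take the exact reported value
                self._locked.add(name)      # and stop auto-counting this field


counter = UsageCounter()
counter.write_text("hello")           # auto-counted: 5
counter.set_usage(answer_tokens=3)    # override to 3 and lock the field
counter.write_text(" world")          # no longer counted
counter.write_reasoning("thinking")   # unlocked field still auto-counts: 8
print(counter.answer_tokens, counter.reasoning_tokens)  # prints "3 8"
```

Locking per field means overriding answer_tokens does not disturb auto-counting of fields the caller never set, which seems to be the intent behind "a per-field locked flag".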
Workflow (required for fix)
Per repo convention:
- Add failing tests that cover the chosen semantics (mid-stream call + later writes, idempotent re-calls).
- Apply the fix (Option 1 = three "locked" flags + guard around the per-delta increment).
- Verify the new tests pass and existing token-counting tests (test_set_usage_*) still pass.
- Update the docstring on set_usage to spell out the contract.
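The tests in the first step could take roughly the following shape, assuming Option 1 is chosen. FakeContext is a hypothetical synchronous stand-in included only so the sketch runs on its own; the real tests would drive StreamContext and read ctx.record.answer_tokens, where the first two tests are expected to fail until the fix lands. The test names are illustrative.

```python
class FakeContext:
    """Hypothetical stand-in with Option 1 semantics (not the library class)."""

    def __init__(self) -> None:
        self.answer_tokens = 0
        self._answer_locked = False

    def write_text(self, delta: str) -> None:
        # Assumption: one token per character, as in the repro comments.
        if not self._answer_locked:
            self.answer_tokens += len(delta)

    def set_usage(self, answer_tokens: int) -> None:
        self.answer_tokens = answer_tokens
        self._answer_locked = True


def test_set_usage_mid_stream_then_writes_keeps_override() -> None:
    ctx = FakeContext()
    ctx.write_text("hello")
    ctx.set_usage(answer_tokens=3)
    ctx.write_text(" world")  # a later write must not change the override
    assert ctx.answer_tokens == 3


def test_set_usage_repeated_call_is_idempotent() -> None:
    ctx = FakeContext()
    ctx.set_usage(answer_tokens=3)
    ctx.set_usage(answer_tokens=3)  # same value again: no drift
    ctx.write_text("ignored")
    assert ctx.answer_tokens == 3


def test_auto_count_unchanged_without_set_usage() -> None:
    ctx = FakeContext()
    ctx.write_text("hello")
    ctx.write_text(" world")
    assert ctx.answer_tokens == 11  # 5 + 6, plain auto-counting
```

The third test pins the existing auto-count behavior so the new guard cannot regress streams that never call set_usage.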
Acceptance
- Docstring on set_usage clearly states the contract.
- uv run pytest tests/ -q and uv run pyright src stay green.