
bug: set_usage is silently additive when called mid-stream #52

@shloimy-wiesel


Summary

set_usage(answer_tokens=N) is documented as "override auto-counted token values with exact LLM-reported counts" (context.py:594), but subsequent write_text / write_reasoning calls continue to add to the field, producing a final value that is neither the override nor the auto-counted total.

Repro

import asyncio
from ai_sdk_stream_python import StreamContext

async def work(ctx):
    await ctx.write_text("hello")          # auto-counts to 5
    await ctx.set_usage(answer_tokens=3)   # override to 3
    await ctx.write_text(" world")         # adds 6 → final 9
    await ctx.finish()

async def main():
    ctx = StreamContext(collect=True)
    # Keep a reference so the producer task is not garbage-collected mid-run.
    task = asyncio.create_task(work(ctx))
    async for _ in ctx.stream():
        pass                               # drain the stream; chunks are not needed here
    await task                             # surface any exception raised in work()
    print(ctx.record.answer_tokens)        # 9: neither the override (3) nor the full total (11)

asyncio.run(main())

User intent could be:

  • "Final count is exactly 3" (treat set_usage as authoritative final)
  • "Final count is 11" (ignore the mid-stream override and report the full auto-counted total)

The current behavior (9) matches neither — it's a silent miscount.
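
The arithmetic can be reproduced without the library. What follows is a minimal stand-in model, not the code in context.py: AdditiveCounter is a hypothetical class, and counting one token per character is an assumption taken from the comments in the repro above.

```python
class AdditiveCounter:
    """Hypothetical stand-in for the reported behavior (not the library's code)."""

    def __init__(self) -> None:
        self.answer_tokens = 0

    def write_text(self, delta: str) -> None:
        # Assumption: one token per character, as the repro's comments suggest.
        self.answer_tokens += len(delta)

    def set_usage(self, answer_tokens: int) -> None:
        # Overwrites the running total but does not stop later increments.
        self.answer_tokens = answer_tokens


counter = AdditiveCounter()
counter.write_text("hello")
counter.set_usage(answer_tokens=3)
counter.write_text(" world")

observed = counter.answer_tokens            # 3 + 6
authoritative = 3                           # what "override wins" would report
auto_total = len("hello") + len(" world")   # 5 + 6
print(observed, authoritative, auto_total)  # prints: 9 3 11
```

In this model the observed value equals the override plus whatever is written afterwards, which is why it lands between the two values a caller might plausibly expect.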

Proposed fix

Two reasonable options:

Option 1 — set_usage becomes authoritative: after set_usage(answer_tokens=N), stop auto-counting answer tokens for the rest of the stream (track a per-field "locked" flag). Document that set_usage should be called once at the end, matching the typical pattern of reading usage from the LLM's final chunk.
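
A rough sketch of the per-field "locked" idea, under stated assumptions: LockingCounter, its add method, and the reasoning_tokens field name are all hypothetical and chosen for illustration, and tokens are counted one per character as in the repro's comments. It is not the library's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class LockingCounter:
    """Hypothetical model of Option 1; names are illustrative only."""

    counts: dict[str, int] = field(default_factory=dict)
    locked: set[str] = field(default_factory=set)

    def add(self, name: str, delta: str) -> None:
        # Guard around the per-delta increment: locked fields are left alone.
        if name in self.locked:
            return
        self.counts[name] = self.counts.get(name, 0) + len(delta)

    def set_usage(self, **overrides: int) -> None:
        # Each overridden field becomes authoritative and stops auto-counting.
        for name, value in overrides.items():
            self.counts[name] = value
            self.locked.add(name)


usage = LockingCounter()
usage.add("answer_tokens", "hello")
usage.set_usage(answer_tokens=3)
usage.add("answer_tokens", " world")   # ignored: the field is locked
usage.add("reasoning_tokens", "hmm")   # unlocked fields still auto-count
usage.set_usage(answer_tokens=3)       # repeated call leaves the value at 3
print(usage.counts)                    # prints: {'answer_tokens': 3, 'reasoning_tokens': 3}
```

Locking per field rather than globally means a caller who only knows the exact answer count would still get auto-counted values for everything else.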

Option 2 — set_usage is advisory: require callers to call it only once at the end (on_finish or just before finish()); raise if called when there are still writes pending. More restrictive but matches the documented "exact value" semantics.
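
One possible reading of this option, sketched with hypothetical names (StrictCounter, SealedUsageError): since pending writes cannot be known when set_usage is called, the sketch instead seals the counter and rejects any write or second set_usage that arrives afterwards. That interpretation is an assumption, not something the issue specifies.

```python
class SealedUsageError(RuntimeError):
    """Hypothetical error type for this sketch."""


class StrictCounter:
    """Hypothetical model of Option 2: set_usage must be the last call."""

    def __init__(self) -> None:
        self.answer_tokens = 0
        self._sealed = False

    def write_text(self, delta: str) -> None:
        if self._sealed:
            raise SealedUsageError("write_text called after set_usage")
        # Assumption: one token per character, as in the repro's comments.
        self.answer_tokens += len(delta)

    def set_usage(self, answer_tokens: int) -> None:
        if self._sealed:
            raise SealedUsageError("set_usage may only be called once")
        self.answer_tokens = answer_tokens
        self._sealed = True


strict = StrictCounter()
strict.write_text("hello world")
strict.set_usage(answer_tokens=3)

try:
    strict.write_text("!")
    late_write_rejected = False
except SealedUsageError:
    late_write_rejected = True

print(strict.answer_tokens, late_write_rejected)  # prints: 3 True
```

The trade-off is visible here: the final value is always exact, but a caller that reports usage mid-stream gets an exception instead of a silently adjusted count.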

I lean Option 1 — it keeps the convenient mid-stream availability of set_usage while making the semantics predictable.

Workflow (required for fix)

Per repo convention:

  1. Add failing tests that cover the chosen semantics (mid-stream call + later writes, idempotent re-calls).
  2. Apply the fix (Option 1 = three "locked" flags + guard around the per-delta increment).
  3. Verify the new tests pass and existing token-counting tests (test_set_usage_*) still pass.
  4. Update the docstring on set_usage to spell out the contract.

Acceptance

  • Decision recorded in this issue on Option 1 vs Option 2.
  • Failing repro test added.
  • Docstring on set_usage clearly states the contract.
  • uv run pytest tests/ -q and uv run pyright src stay green.
