bug: Turn/Session cost is exactly 2× the actual per-call cost — mathematical doubling in cost accumulation #292

@kenotron

Description

Repo: amplifier-module-provider-anthropic + amplifier-module-hooks-streaming-ui

Evidence (~/Downloads/image (1).png)

Simple session — amplifier-dev bundle, claude-opus-4-7, NO tools, NO delegation:

📊 Token Usage (anthropic/claude-opus-4-7) [3.2s]
 Input: 92,650 (caching...) | Output: 15 | Total: 92,665 | Cost: $0.58
💰 Turn: $1.16 | Session: $1.16

📊 Token Usage (anthropic/claude-opus-4-7) [3.7s]
 Input: 92,683 (94% cached) | Output: 110 | Total: 92,793 | Cost: $0.08
💰 Turn: $0.15 | Session: $1.31
| Turn | Per-call Cost: | 💰 Turn: | Ratio |
|---|---|---|---|
| "hi" | $0.58 | $1.16 | exactly 2.00× |
| "tip of the day" | $0.08 | $0.15 | ≈ 2× (actual ~$0.076) |

This is mathematical doubling, not visual confusion

  • Cost: $0.58 comes from usage.cost_usd in the content_block:end event — stamped once by compute_cost() inside _convert_to_chat_response()
  • Turn: $1.16 comes from collect_contributions("session.cost") which reads _totals["cost_usd"]
  • For _totals["cost_usd"] to be $1.16 when one API call cost $0.58, _add_cost($0.58) must have been called twice
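
The arithmetic behind the bullets above can be checked with a small standalone sketch. It copies the accumulation logic of _add_cost from this issue, but the module-level _totals dict, the shown() helper, and the two-decimal display rounding are assumptions made for illustration. The per-call amounts are the displayed $0.58 and the approximate $0.076 from the table, not exact billing figures.

```python
from decimal import Decimal

# Hypothetical stand-in for the provider's running totals.
_totals = {"cost_usd": None, "has_data": False}


def _add_cost(cost):
    # Same accumulation logic as the snippet in this issue, minus logging.
    if cost is not None:
        _totals["cost_usd"] = (_totals["cost_usd"] or Decimal("0")) + cost
        _totals["has_data"] = True


def shown(amount):
    # Assumed display format: dollars rounded to two decimal places.
    return f"${amount.quantize(Decimal('0.01'))}"


# Turn 1 ("hi"): one API call, but the cost is added twice.
first_call = Decimal("0.58")
_add_cost(first_call)
_add_cost(first_call)
turn_1 = _totals["cost_usd"]

# Turn 2 ("tip of the day"): again added twice.
second_call = Decimal("0.076")
_add_cost(second_call)
_add_cost(second_call)
session = _totals["cost_usd"]
turn_2 = session - turn_1

print(shown(turn_1), shown(turn_2), shown(session))
```

Under these assumptions, two additions per call reproduce all three displayed figures: Turn $1.16, Turn $0.15, and Session $1.31.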

What we know from code inspection

_add_cost is called exactly once in _convert_to_chat_response() (line 3395 in amplifier_module_provider_anthropic/__init__.py). _convert_to_chat_response is called once in complete() (line 2669). So either:

  1. complete() is being called twice by the orchestrator — possible if the _fallback_on_overload path retries after a partial failure, or if the streaming orchestrator makes two calls (stream-to-display then parse-for-tools). This is the most likely cause.

  2. A second session.cost contributor is registered with the same value — e.g., if mount() is called twice and both contributors somehow reflect the same _totals. The Rust coordinator APPENDS contributors (confirmed: coordinator.rs: .push(entry)), so two registrations would both be collected. No second session.cost contributor was found in code inspection.
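
The two hypotheses leave different fingerprints, which a toy model can illustrate. The sketch below covers hypothesis 2 only. ToyCoordinator, register_contributor, mount, and add_cost are hypothetical names, and summing the collected contributions is an assumption about how the display aggregates them. The only behavior taken from this issue is that registration appends.

```python
from decimal import Decimal


class ToyCoordinator:
    """Hypothetical stand-in; only append-on-register comes from the issue."""

    def __init__(self):
        self._contributors = {}

    def register_contributor(self, channel, fn):
        # Mirrors the reported .push(entry): a repeat registration is kept.
        self._contributors.setdefault(channel, []).append(fn)

    def collect_contributions(self, channel):
        return [fn() for fn in self._contributors.get(channel, [])]


totals = {"cost_usd": Decimal("0")}
add_cost_calls = 0


def add_cost(cost):
    global add_cost_calls
    add_cost_calls += 1
    totals["cost_usd"] += cost


def mount(coordinator):
    # Each mount registers a contributor that reads the same shared totals.
    coordinator.register_contributor("session.cost", lambda: totals["cost_usd"])


coordinator = ToyCoordinator()
mount(coordinator)
mount(coordinator)  # accidental second mount

add_cost(Decimal("0.58"))  # one API call, one accumulation

contributions = coordinator.collect_contributions("session.cost")
displayed = sum(contributions, Decimal("0"))
print(add_cost_calls, displayed)
```

In this model the accumulator is called once, yet the displayed value is doubled. So if the instrumentation logs a single _add_cost call per turn, a duplicate registration is the more plausible cause. Two logged calls would point to hypothesis 1.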

Why Cost: shows half the Turn

Cost: in the per-call line shows chat_response.usage.cost_usd, which is the cost of the final call only. Turn: shows _totals["cost_usd"] accumulated across all calls in the turn. If the orchestrator makes two calls per user turn, _totals accumulates both, while the per-call line shows only the last one.

Investigation needed

Add instrumentation to confirm:

def _add_cost(cost) -> None:
    # Temporary instrumentation: log every call together with its call stack.
    import traceback
    logger.warning("_add_cost called: cost=%s, stack=\n%s", cost, ''.join(traceback.format_stack()))
    # Unchanged accumulation logic.
    if cost is not None:
        _totals["cost_usd"] = (_totals["cost_usd"] or Decimal("0")) + cost
        _totals["has_data"] = True

This will reveal whether _add_cost is called once or twice per user turn, and from which code path.

Impact

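Every Turn and Session cost shown to users is about 2× the actual API cost. Because the Session total sums the doubled Turn values, it is also about 2× the true cost (here $1.31 shown against roughly $0.66 actual), so the absolute overstatement grows with session length.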

Note on previous display fixes

PR/fix on feat/m0-cost-management removed Cost: from the per-call token line (issue #291). That hides the symptom but does NOT fix the Turn: doubling — users will still see doubled Turn/Session costs.
