feat(copilot): add token usage tracking to Copilot CLI parser #577

Open
MCBoarder289 wants to merge 5 commits into kenn-io:main from MCBoarder289:copilot-token-usage

@MCBoarder289
Contributor

The Copilot CLI parser now extracts and surfaces token usage data, allowing Copilot sessions to appear correctly on the usage dashboard alongside other agents.

Per-message output tokens are read from the outputTokens field on each assistant.message event and wired into ParsedMessage.OutputTokens/HasOutputTokens. accumulateMessageTokenUsage is called at parse time so session-level TotalOutputTokens is populated.
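
A minimal sketch of the per-message extraction described above, not the PR's actual code. The event envelope (`type` plus a nested `data` object) and the `sumOutputTokens` helper are assumptions made for illustration; only the `outputTokens` field and the `OutputTokens`/`HasOutputTokens` pair come from the description. A pointer field is used so that a missing `outputTokens` can be told apart from an explicit zero.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// copilotEvent is a hypothetical envelope; the real events.jsonl schema may differ.
type copilotEvent struct {
	Type string `json:"type"`
	Data struct {
		OutputTokens *int `json:"outputTokens"` // nil when the field is absent
	} `json:"data"`
}

// ParsedMessage mirrors only the two fields named in the PR description.
type ParsedMessage struct {
	OutputTokens    int
	HasOutputTokens bool
}

// parseMessages returns one ParsedMessage per assistant.message line.
// Note: bufio.Scanner's default buffer caps lines at 64 KiB; a real parser
// would likely need a larger buffer.
func parseMessages(jsonl string) ([]ParsedMessage, error) {
	var msgs []ParsedMessage
	sc := bufio.NewScanner(strings.NewReader(jsonl))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var ev copilotEvent
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return nil, fmt.Errorf("decode event: %w", err)
		}
		if ev.Type != "assistant.message" {
			continue
		}
		var m ParsedMessage
		if ev.Data.OutputTokens != nil {
			m.OutputTokens = *ev.Data.OutputTokens
			m.HasOutputTokens = true
		}
		msgs = append(msgs, m)
	}
	return msgs, sc.Err()
}

// sumOutputTokens is a simplified, hypothetical stand-in for the
// session-level accumulation step.
func sumOutputTokens(msgs []ParsedMessage) (total int, has bool) {
	for _, m := range msgs {
		if m.HasOutputTokens {
			total += m.OutputTokens
			has = true
		}
	}
	return total, has
}

func main() {
	input := `{"type":"user.message","data":{}}
{"type":"assistant.message","data":{"outputTokens":120}}
{"type":"assistant.message","data":{}}
{"type":"assistant.message","data":{"outputTokens":30}}`
	msgs, err := parseMessages(input)
	if err != nil {
		panic(err)
	}
	total, has := sumOutputTokens(msgs)
	fmt.Println(len(msgs), total, has) // prints "3 150 true"
}
```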

Session-level input and cache token totals are derived from session.shutdown events, which carry a modelMetrics map with per-model accounting (inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens). Each model entry is emitted as a ParsedUsageEvent; fresh input tokens are calculated as inputTokens − cacheReadTokens − cacheWriteTokens. Sessions with context compaction emit multiple shutdown events (one per segment), each with a positional dedup key to prevent collisions on re-sync.
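
The fresh-input arithmetic can be sketched as follows. This is an illustration under stated assumptions rather than the PR's `handleShutdown`: the struct shapes and the function name `usageEventsFromShutdown` are hypothetical, model names are sorted only to make the output deterministic, and the clamp to zero is a defensive choice the description does not mention.

```go
package main

import (
	"fmt"
	"sort"
)

// modelMetrics holds the four counters the description lists for each
// entry of a session.shutdown event's modelMetrics map.
type modelMetrics struct {
	InputTokens      int
	OutputTokens     int
	CacheReadTokens  int
	CacheWriteTokens int
}

// ParsedUsageEvent is a reduced, hypothetical version of the real struct.
type ParsedUsageEvent struct {
	Model            string
	InputTokens      int // fresh (uncached) input only
	OutputTokens     int
	CacheReadTokens  int
	CacheWriteTokens int
}

// usageEventsFromShutdown emits one event per model with non-zero usage.
func usageEventsFromShutdown(metrics map[string]modelMetrics) []ParsedUsageEvent {
	names := make([]string, 0, len(metrics))
	for name := range metrics {
		names = append(names, name)
	}
	sort.Strings(names) // Go map iteration order is random; sort for stable output

	var events []ParsedUsageEvent
	for _, name := range names {
		m := metrics[name]
		if m == (modelMetrics{}) {
			continue // skip fully-zero entries
		}
		// The raw input total includes cached tokens, so subtract both cache counters.
		fresh := m.InputTokens - m.CacheReadTokens - m.CacheWriteTokens
		if fresh < 0 {
			fresh = 0 // assumption: clamp rather than report negative usage
		}
		events = append(events, ParsedUsageEvent{
			Model:            name,
			InputTokens:      fresh,
			OutputTokens:     m.OutputTokens,
			CacheReadTokens:  m.CacheReadTokens,
			CacheWriteTokens: m.CacheWriteTokens,
		})
	}
	return events
}

func main() {
	events := usageEventsFromShutdown(map[string]modelMetrics{
		"claude-sonnet-4-6": {InputTokens: 1000, OutputTokens: 200, CacheReadTokens: 600, CacheWriteTokens: 100},
		"gpt-5.4":           {InputTokens: 500, OutputTokens: 50},
		"unused-model":      {},
	})
	for _, e := range events {
		fmt.Printf("%s fresh=%d out=%d\n", e.Model, e.InputTokens, e.OutputTokens)
	}
}
```

With these sample numbers, the first model's fresh input is 1000 − 600 − 100 = 300, and the all-zero entry produces no event.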

Model name normalization converts dot-form version numbers from Copilot events (e.g. claude-sonnet-4.6) to hyphen-form (claude-sonnet-4-6) to match the pricing catalog's key format, ensuring cost calculations resolve correctly.

dataVersion is bumped to 31 to trigger a full re-sync and replace any previously persisted collision-based rows with correct per-segment records.

MCBoarder289 and others added 4 commits June 1, 2026 11:37
Each assistant.message event in the Copilot JSONL format contains an
outputTokens field. Wire it up to ParsedMessage.OutputTokens /
HasOutputTokens and call accumulateMessageTokenUsage so the session-level
TotalOutputTokens and HasTotalOutputTokens are populated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The session.shutdown event in Copilot's events.jsonl contains a
modelMetrics field with full per-model token accounting:
inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens,
and reasoningTokens.

- Add copilotEventSessionShutdown constant and handleShutdown()
  method on the builder that emits one ParsedUsageEvent per model
- Derive fresh input tokens as inputTokens - cacheReadTokens -
  cacheWriteTokens (the raw total includes cached tokens)
- Skip fully-zero model entries
- Change ParseCopilotSession to return []ParsedUsageEvent as a
  fourth return value; stamp SessionID and DedupKey after the
  qualified session ID is known
- Wire UsageEvents into the sync engine ParseResult
- Add four new tests covering the happy path, multi-model,
  zero-usage skipping, and the no-shutdown case

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot events use dots in model version numbers (e.g. claude-sonnet-4.6)
while the pricing catalog uses hyphens (claude-sonnet-4-6). The pricing
lookup is an exact map match, so dot-form names produced no cost results.

Add normalizeCopilotModel() helper that replaces dots with hyphens and
apply it in both handleModelChange (per-message Model field) and
handleShutdown (ParsedUsageEvent.Model). Update tests to assert
hyphen-form model names.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Sessions with context compaction emit multiple session.shutdown events
(one per segment). The previous key format 'shutdown:<session>:<model>'
caused all segments for the same model to collide on the unique index,
keeping only the first segment and silently dropping the rest.

Fix: include the event's ordinal position in b.usageEvents as a
discriminator: 'shutdown:<session>:<model>:<N>'. Position is stable
across re-syncs because events are always appended in file order.

Also add TestParseCopilotSession_MultiShutdown_SameModel as a
regression test confirming both segments are captured with distinct
keys.

Bump dataVersion to 31 to trigger a full re-sync so existing
collision-based rows are replaced with the correct per-segment rows.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
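
The positional key from the commit above can be illustrated with a short sketch. The helper name `stampUsageEvents`, the sample session ID, and the zero-based index are assumptions; the commit message only specifies the `shutdown:<session>:<model>:<N>` shape and that `N` is the event's position in the usage-event slice.

```go
package main

import "fmt"

// ParsedUsageEvent carries only the fields needed for this illustration.
type ParsedUsageEvent struct {
	SessionID string
	DedupKey  string
	Model     string
}

// stampUsageEvents fills in SessionID and a positional DedupKey once the
// qualified session ID is known. The index i is the event's position in
// the slice, which stays stable as long as events are appended in file order.
func stampUsageEvents(sessionID string, events []ParsedUsageEvent) {
	for i := range events {
		events[i].SessionID = sessionID
		events[i].DedupKey = fmt.Sprintf("shutdown:%s:%s:%d", sessionID, events[i].Model, i)
	}
}

func main() {
	// Two compaction segments that used the same model.
	events := []ParsedUsageEvent{
		{Model: "claude-sonnet-4-6"},
		{Model: "claude-sonnet-4-6"},
	}
	stampUsageEvents("copilot:example-session", events)
	for _, e := range events {
		fmt.Println(e.DedupKey)
	}
}
```

Under the earlier `shutdown:<session>:<model>` format both events would have produced the same key; with the positional suffix the keys end in `:0` and `:1`.
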
@roborev-ci

roborev-ci Bot commented Jun 1, 2026

roborev: Combined Review (a7d08c9)

Medium risk: one pricing regression found in Copilot model normalization.

Medium

  • internal/parser/copilot.go:299
    normalizeCopilotModel replaces every dot in every Copilot model name, so catalog-backed OpenAI-style models like gpt-5.4 / gpt-5.5 would be stored as gpt-5-4 / gpt-5-5, causing exact pricing lookup misses. The comment says only digit-surrounded dots are replaced, but the implementation is broader.
    Fix: Restrict normalization to Copilot/Claude IDs that actually need it, or preserve model names already matching the pricing catalog. Add a regression test for a Copilot gpt-5.4 model.

Synthesized from 2 reviews (agents: codex | types: default, security)

GPT model IDs use dots in the pricing catalog (e.g. gpt-5.4) so
applying strings.ReplaceAll universally would convert gpt-5.4 to
gpt-5-4 and cause pricing lookup misses.

Restrict the dot-to-hyphen substitution in normalizeCopilotModel to
names that begin with 'claude-', which are the only Copilot-emitted
IDs that need normalization. All other model names pass through
unchanged.

Add TestNormalizeCopilotModel table test covering claude variants,
gpt-5.4/gpt-5.5, gpt-4o, o3-mini, and the empty string.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
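
A sketch of the restricted normalization, written from the commit message rather than copied from the PR's code, so details may differ. It only rewrites names with the `claude-` prefix and passes everything else through.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeCopilotModel converts dots to hyphens only for Claude model IDs,
// leaving IDs such as gpt-5.4 untouched so they still match dot-form
// catalog keys.
func normalizeCopilotModel(model string) string {
	if strings.HasPrefix(model, "claude-") {
		return strings.ReplaceAll(model, ".", "-")
	}
	return model
}

func main() {
	// Cases taken from the table test described in the commit message.
	cases := []string{"claude-sonnet-4.6", "gpt-5.4", "gpt-5.5", "gpt-4o", "o3-mini", ""}
	for _, in := range cases {
		fmt.Printf("%q -> %q\n", in, normalizeCopilotModel(in))
	}
}
```

A prefix check is the simplest way to express "only Claude IDs need this"; it would need revisiting if another model family with dot-form versions and hyphen-form catalog keys appears.
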
@roborev-ci

roborev-ci Bot commented Jun 1, 2026

roborev: Fail

Review findings

No verified Medium, High, or Critical findings.

Both review outputs reported no issues, and I found no findings to deduplicate or include. Focused validation could not be run because go is not installed in this environment (go: command not found).


Review type: | Agent: codex | Job: 19192

@MCBoarder289
Contributor Author

Looks like this should be good to go! Not sure why roborev reports a "Fail" when it has no findings. Does it need to be re-run?
