feat(copilot): add token usage tracking to Copilot CLI parser #577
Open
MCBoarder289 wants to merge 5 commits into
Each assistant.message event in the Copilot JSONL format contains an outputTokens field. Wire it up to ParsedMessage.OutputTokens / HasOutputTokens and call accumulateMessageTokenUsage so the session-level TotalOutputTokens and HasTotalOutputTokens are populated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
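A minimal Go sketch of the per-message wiring described above. The JSON layout (an outputTokens field under a data object), the trimmed-down ParsedMessage and SessionTotals structs, and the helper names parseOutputTokens and accumulateOutputTokens are all assumptions for illustration; accumulateOutputTokens only stands in for accumulateMessageTokenUsage, whose real signature is not shown in this PR.

```go
package main

import "encoding/json"

// copilotEvent is a hypothetical minimal view of one events.jsonl line.
// Assumption: outputTokens sits under a "data" object; the real layout may differ.
type copilotEvent struct {
	Type string `json:"type"`
	Data struct {
		OutputTokens *int `json:"outputTokens"` // pointer: nil means "field absent"
	} `json:"data"`
}

// ParsedMessage and SessionTotals are trimmed-down stand-ins for the
// parser's real types, keeping only the fields named in the commit.
type ParsedMessage struct {
	OutputTokens    int
	HasOutputTokens bool
}

type SessionTotals struct {
	TotalOutputTokens    int
	HasTotalOutputTokens bool
}

// parseOutputTokens copies outputTokens onto the message only when the event
// is an assistant.message and the field is actually present.
func parseOutputTokens(line []byte) (ParsedMessage, error) {
	var ev copilotEvent
	if err := json.Unmarshal(line, &ev); err != nil {
		return ParsedMessage{}, err
	}
	var msg ParsedMessage
	if ev.Type == "assistant.message" && ev.Data.OutputTokens != nil {
		msg.OutputTokens = *ev.Data.OutputTokens
		msg.HasOutputTokens = true
	}
	return msg, nil
}

// accumulateOutputTokens is a hypothetical stand-in for
// accumulateMessageTokenUsage: it folds one message into the session totals.
func accumulateOutputTokens(totals *SessionTotals, msg ParsedMessage) {
	if !msg.HasOutputTokens {
		return
	}
	totals.TotalOutputTokens += msg.OutputTokens
	totals.HasTotalOutputTokens = true
}
```

Keeping a separate Has… flag lets the dashboard tell "no token data" apart from a genuine zero.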
The session.shutdown event in Copilot's events.jsonl contains a modelMetrics field with full per-model token accounting: inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens, and reasoningTokens.

- Add a copilotEventSessionShutdown constant and a handleShutdown() method on the builder that emits one ParsedUsageEvent per model
- Derive fresh input tokens as inputTokens − cacheReadTokens − cacheWriteTokens (the raw total includes cached tokens)
- Skip fully-zero model entries
- Change ParseCopilotSession to return []ParsedUsageEvent as a fourth return value; stamp SessionID and DedupKey after the qualified session ID is known
- Wire UsageEvents into the sync engine ParseResult
- Add four new tests covering the happy path, multi-model, zero-usage skipping, and the no-shutdown case

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
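The shutdown handling can be sketched in Go as follows. It assumes the modelMetrics object decodes as a map keyed by model name with the five counters listed in the commit message; usageEvent is a hypothetical stand-in for ParsedUsageEvent, and shutdownUsage is an illustration, not the PR's handleShutdown(). Model-name normalization from the later commits is left out.

```go
package main

import (
	"encoding/json"
	"sort"
)

// modelMetric mirrors the per-model counters named in the commit message.
// Assumption: this is how one modelMetrics entry is laid out in JSON.
type modelMetric struct {
	InputTokens      int `json:"inputTokens"`
	OutputTokens     int `json:"outputTokens"`
	CacheReadTokens  int `json:"cacheReadTokens"`
	CacheWriteTokens int `json:"cacheWriteTokens"`
	ReasoningTokens  int `json:"reasoningTokens"`
}

type shutdownData struct {
	ModelMetrics map[string]modelMetric `json:"modelMetrics"`
}

// usageEvent is a hypothetical stand-in for ParsedUsageEvent.
type usageEvent struct {
	Model            string
	InputTokens      int // fresh (non-cached) input only
	OutputTokens     int
	CacheReadTokens  int
	CacheWriteTokens int
	ReasoningTokens  int
}

// shutdownUsage emits one usageEvent per model with non-zero usage.
func shutdownUsage(raw []byte) ([]usageEvent, error) {
	var d shutdownData
	if err := json.Unmarshal(raw, &d); err != nil {
		return nil, err
	}
	// Go map iteration order is random, so sort names for a deterministic order.
	models := make([]string, 0, len(d.ModelMetrics))
	for name := range d.ModelMetrics {
		models = append(models, name)
	}
	sort.Strings(models)

	var events []usageEvent
	for _, name := range models {
		m := d.ModelMetrics[name]
		if m == (modelMetric{}) {
			continue // skip fully-zero model entries
		}
		// The raw inputTokens total includes cached tokens, so subtract them.
		fresh := m.InputTokens - m.CacheReadTokens - m.CacheWriteTokens
		if fresh < 0 {
			fresh = 0 // defensive clamp; an assumption, not stated in the PR
		}
		events = append(events, usageEvent{
			Model:            name,
			InputTokens:      fresh,
			OutputTokens:     m.OutputTokens,
			CacheReadTokens:  m.CacheReadTokens,
			CacheWriteTokens: m.CacheWriteTokens,
			ReasoningTokens:  m.ReasoningTokens,
		})
	}
	return events, nil
}
```

Sorting the model names matters whenever an event's position later feeds into a dedup key, since an unsorted map walk could assign different positions on each re-sync.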
Copilot events use dots in model version numbers (e.g. claude-sonnet-4.6) while the pricing catalog uses hyphens (claude-sonnet-4-6). The pricing lookup is an exact map match, so dot-form names produced no cost results. Add a normalizeCopilotModel() helper that replaces dots with hyphens and apply it in both handleModelChange (per-message Model field) and handleShutdown (ParsedUsageEvent.Model). Update tests to assert hyphen-form model names.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sessions with context compaction emit multiple session.shutdown events (one per segment). The previous key format 'shutdown:<session>:<model>' caused all segments for the same model to collide on the unique index, keeping only the first segment and silently dropping the rest.

Fix: include the event's ordinal position in b.usageEvents as a discriminator: 'shutdown:<session>:<model>:<N>'. Position is stable across re-syncs because events are always appended in file order.

Also add TestParseCopilotSession_MultiShutdown_SameModel as a regression test confirming that both segments are captured with distinct keys. Bump dataVersion to 31 to trigger a full re-sync so existing collision-based rows are replaced with the correct per-segment rows.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
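A hedged sketch of the positional key scheme, assuming usage events are held in a slice in file order. usageEvent and stampKeys are hypothetical names; only the 'shutdown:<session>:<model>:<N>' format comes from the commit message.

```go
package main

import "fmt"

// usageEvent is a hypothetical, trimmed-down stand-in for ParsedUsageEvent.
type usageEvent struct {
	SessionID string
	Model     string
	DedupKey  string
}

// stampKeys fills SessionID and DedupKey once the qualified session ID is
// known. N is the event's index in the slice, i.e. its position in file order,
// so two segments for the same model get distinct keys.
func stampKeys(sessionID string, events []usageEvent) {
	for i := range events {
		events[i].SessionID = sessionID
		events[i].DedupKey = fmt.Sprintf("shutdown:%s:%s:%d", sessionID, events[i].Model, i)
	}
}
```

Under the old format both events below would have received the same key; with the ordinal appended they differ only in N, which is what keeps them from colliding on the unique index.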
roborev: Combined Review (
GPT model IDs use dots in the pricing catalog (e.g. gpt-5.4), so applying strings.ReplaceAll universally would convert gpt-5.4 to gpt-5-4 and cause pricing lookup misses. Restrict the dot-to-hyphen substitution in normalizeCopilotModel to names that begin with 'claude-', which are the only Copilot-emitted IDs that need normalization. All other model names pass through unchanged. Add a TestNormalizeCopilotModel table test covering claude variants, gpt-5.4/gpt-5.5, gpt-4o, o3-mini, and the empty string.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
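A sketch of the prefix-restricted normalization described above. The function name comes from the commit message, but this body is an illustration written from that description, not the merged code.

```go
package main

import "strings"

// normalizeCopilotModel converts dot-form version numbers to hyphen form
// (claude-sonnet-4.6 -> claude-sonnet-4-6) only for claude-* IDs.
// Every other name, including dotted GPT IDs like gpt-5.4, passes through.
func normalizeCopilotModel(model string) string {
	if strings.HasPrefix(model, "claude-") {
		return strings.ReplaceAll(model, ".", "-")
	}
	return model
}
```

Because the pricing lookup is an exact map match, the raw claude-sonnet-4.6 misses a catalog keyed by claude-sonnet-4-6, while gpt-5.4 must keep its dot to hit.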
roborev: Fail
Review findings: No verified Medium, High, or Critical findings. Both review outputs reported no issues, and I found no findings to deduplicate or include. Focused validation could not be run because
Review type: | Agent: codex | Job: 19192
MCBoarder289 (contributor, PR author) commented:
Looks like this should be good to go! I'm not sure why roborev's "no findings" result is reported as a "fail"; does it need to be re-run?
The Copilot CLI parser now extracts and surfaces token usage data, allowing Copilot sessions to appear correctly on the usage dashboard alongside other agents.
Per-message output tokens are read from the outputTokens field on each assistant.message event and wired into ParsedMessage.OutputTokens/HasOutputTokens. accumulateMessageTokenUsage is called at parse time so session-level TotalOutputTokens is populated.
Session-level input and cache token totals are derived from session.shutdown events, which carry a modelMetrics map with per-model accounting (inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens, reasoningTokens). Each non-zero model entry is emitted as a ParsedUsageEvent; fresh input tokens are calculated as inputTokens − cacheReadTokens − cacheWriteTokens, because the raw inputTokens total includes cached tokens. Sessions with context compaction emit multiple shutdown events (one per segment); each resulting usage event gets a positional dedup key (shutdown:<session>:<model>:<N>), so segments for the same model no longer collide on the unique index and the keys stay stable across re-syncs.
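Model name normalization converts dot-form version numbers in Claude model IDs from Copilot events (e.g. claude-sonnet-4.6) to hyphen-form (claude-sonnet-4-6) to match the pricing catalog's key format, so cost calculations resolve correctly. Only names beginning with claude- are rewritten; GPT IDs such as gpt-5.4 already use dots in the catalog and pass through unchanged.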
dataVersion is bumped to 31 to trigger a full re-sync and replace any previously persisted collision-based rows with correct per-segment records.