2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
 {
   "name": "opencode-windsurf-auth",
-  "version": "0.3.3",
+  "version": "0.3.4",
   "description": "OpenCode plugin for Windsurf/Codeium authentication - use Windsurf models in OpenCode",
   "repository": {
     "type": "git",
36 changes: 22 additions & 14 deletions src/cloud-direct/chat.ts
@@ -36,14 +36,21 @@ import { getCachedUserJwt } from './auth.js';
 import { getCachedCatalog, ModelNotAvailableError } from './catalog.js';
 
 /**
- * Connect-RPC streaming inactivity timeout. If the cloud sends zero bytes
- * for this long after the last chunk, we abort the fetch. The cloud's own
- * idle limit is around 90s on most models; we set ours a little above so
- * we only trigger when the server has genuinely stopped responding.
+ * Connect-RPC streaming inactivity timeout. Opus can spend multiple minutes
+ * before its first body chunk on large hardware-debugging contexts, so keep
+ * this above ordinary model thinking latency and let users override it.
  */
-const CLOUD_STREAM_IDLE_MS = 120_000;
+const CLOUD_STREAM_IDLE_MS = readPositiveIntEnv('OPENCODE_WINDSURF_CLOUD_STREAM_IDLE_MS', 300_000);
 /** Time-to-first-byte timeout. */
-const CLOUD_STREAM_TTFB_MS = 60_000;
+const CLOUD_STREAM_TTFB_MS = readPositiveIntEnv('OPENCODE_WINDSURF_CLOUD_STREAM_TTFB_MS', 120_000);
+const DEFAULT_MAX_INPUT_TOKENS = readPositiveIntEnv('OPENCODE_WINDSURF_MAX_INPUT_TOKENS', 256_000);
+
+function readPositiveIntEnv(name: string, fallback: number): number {
+  const raw = process.env[name];
+  if (!raw) return fallback;
+  const parsed = Number(raw);
+  return Number.isFinite(parsed) && parsed > 0 ? Math.trunc(parsed) : fallback;
+}
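A note on the override helper above: a fractional value such as `0.5` passes the `parsed > 0` check and is then truncated to `0`, so the timeout or token limit could end up as zero. The sketch below is one possible hardening, not the plugin's code. `readPositiveIntFrom` and the injected `env` record are hypothetical names, used only so the example runs without reading `process.env`; the variable names and defaults are the ones in the diff.

```typescript
// Hypothetical variant of readPositiveIntEnv. The env record is passed in
// explicitly (an assumption made for this sketch) instead of using process.env.
type Env = Record<string, string | undefined>;

function readPositiveIntFrom(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (!raw) return fallback; // unset or empty string
  // Truncate BEFORE the positivity check so "0.5" falls back instead of becoming 0.
  const parsed = Math.trunc(Number(raw));
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Sample values chosen for illustration only.
const sampleEnv: Env = {
  OPENCODE_WINDSURF_CLOUD_STREAM_IDLE_MS: '600000', // valid override
  OPENCODE_WINDSURF_CLOUD_STREAM_TTFB_MS: 'soon',   // not a number
  OPENCODE_WINDSURF_MAX_INPUT_TOKENS: '0.5',        // truncates to 0
};

const idleMs = readPositiveIntFrom(sampleEnv, 'OPENCODE_WINDSURF_CLOUD_STREAM_IDLE_MS', 300_000);
const ttfbMs = readPositiveIntFrom(sampleEnv, 'OPENCODE_WINDSURF_CLOUD_STREAM_TTFB_MS', 120_000);
const maxInputTokens = readPositiveIntFrom(sampleEnv, 'OPENCODE_WINDSURF_MAX_INPUT_TOKENS', 256_000);

console.log({ idleMs, ttfbMs, maxInputTokens });
```

With these sample values only the idle timeout is overridden; the other two fall back to the defaults from the diff.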

 /**
  * Compose multiple AbortSignals into a single signal that aborts when ANY
@@ -274,7 +281,7 @@ function encodeCompletionConfiguration(opts: {
   };
   return Buffer.concat([
     encodeVarintField(1, 1),
-    encodeVarintField(2, opts.maxInputTokens ?? 64000),
+    encodeVarintField(2, opts.maxInputTokens ?? DEFAULT_MAX_INPUT_TOKENS),
     // Default to the catalog's most permissive `maxOutputTokens` (128K).
     // The cloud clamps to the per-model limit anyway. The old 4096 default
     // would silently truncate any callers (tests, CLI users of
@@ -717,15 +724,16 @@ function decodeUsageBlock(buf: Buffer): CloudChatEvent | null {
     }
   }
   if (promptTokens === undefined && completionTokens === undefined) return null;
-  // totalTokens reflects what OpenAI's API counts as billable: input +
-  // output. Cached / cache-creation / reasoning subtotals are surfaced as
-  // additional fields so callers that want a fuller picture (e.g. cost
-  // breakdown for reasoning models) can read them, but they're NOT
-  // double-counted into total.
-  const total = (promptTokens ?? 0) + (completionTokens ?? 0);
+  // Cognition reports cache reads/writes separately from fresh input tokens.
+  // OpenAI-compatible callers expect `prompt_tokens` to represent the full
+  // effective prompt size (including cached prompt), and opencode uses it for
+  // context-window display. Preserve the cache subtotals too for callers that
+  // want cost details.
+  const fullPromptTokens = (promptTokens ?? 0) + (cachedInputTokens ?? 0) + (cacheCreationInputTokens ?? 0);
+  const total = fullPromptTokens + (completionTokens ?? 0);
   return {
     kind: 'usage',
-    promptTokens,
+    promptTokens: fullPromptTokens > 0 ? fullPromptTokens : undefined,
     completionTokens,
Comment on lines 726 to 737

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Broaden the usage-presence guard to include cache-derived metrics.

Line 726 still returns null unless input_tokens or output_tokens is present. That drops usage events when only cache metrics are reported, even though Line 732 now treats cache tokens as prompt-equivalent.

Suggested fix
-  if (promptTokens === undefined && completionTokens === undefined) return null;
+  if (
+    promptTokens === undefined &&
+    completionTokens === undefined &&
+    cachedInputTokens === undefined &&
+    cacheCreationInputTokens === undefined &&
+    reasoningTokens === undefined
+  ) return null;
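To make the effect of the broadened guard concrete, here is a minimal standalone sketch. It is not the plugin's `decodeUsageBlock`: `RawUsage`, `UsageEvent`, and `toUsageEvent` are hypothetical names that only mirror the fields visible in the diff, and the protobuf decoding step is assumed to have already produced the counters.

```typescript
// Hypothetical shapes mirroring the counters named in the diff.
type Count = number | undefined;

interface RawUsage {
  promptTokens?: Count;
  completionTokens?: Count;
  cachedInputTokens?: Count;
  cacheCreationInputTokens?: Count;
  reasoningTokens?: Count;
}

interface UsageEvent extends RawUsage {
  kind: 'usage';
  totalTokens?: Count;
}

function toUsageEvent(raw: RawUsage): UsageEvent | null {
  // Broadened guard: return null only when no counter at all was reported.
  const anyReported = [
    raw.promptTokens,
    raw.completionTokens,
    raw.cachedInputTokens,
    raw.cacheCreationInputTokens,
    raw.reasoningTokens,
  ].some((value) => value !== undefined);
  if (!anyReported) return null;

  // Same arithmetic as the PR: cache reads/writes count toward the prompt size.
  const fullPromptTokens =
    (raw.promptTokens ?? 0) + (raw.cachedInputTokens ?? 0) + (raw.cacheCreationInputTokens ?? 0);
  const total = fullPromptTokens + (raw.completionTokens ?? 0);

  return {
    kind: 'usage',
    promptTokens: fullPromptTokens > 0 ? fullPromptTokens : undefined,
    completionTokens: raw.completionTokens,
    totalTokens: total > 0 ? total : undefined,
    cachedInputTokens: raw.cachedInputTokens,
    cacheCreationInputTokens: raw.cacheCreationInputTokens,
    reasoningTokens: raw.reasoningTokens,
  };
}

// A block that reports only a cache read (sample number, for illustration).
const cacheOnly = toUsageEvent({ cachedInputTokens: 1200 });
// A block that reports nothing.
const empty = toUsageEvent({});

console.log(cacheOnly?.promptTokens, cacheOnly?.totalTokens, empty);
```

Under the original two-field guard the cache-only block would have returned `null`; with the broadened guard it yields a usage event whose `promptTokens` and `totalTokens` are both 1200, while the empty block still returns `null`.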
Suggested change
-  if (promptTokens === undefined && completionTokens === undefined) return null;
+  if (
+    promptTokens === undefined &&
+    completionTokens === undefined &&
+    cachedInputTokens === undefined &&
+    cacheCreationInputTokens === undefined &&
+    reasoningTokens === undefined
+  ) return null;
   // Cognition reports cache reads/writes separately from fresh input tokens.
   // OpenAI-compatible callers expect `prompt_tokens` to represent the full
   // effective prompt size (including cached prompt), and opencode uses it for
   // context-window display. Preserve the cache subtotals too for callers that
   // want cost details.
   const fullPromptTokens = (promptTokens ?? 0) + (cachedInputTokens ?? 0) + (cacheCreationInputTokens ?? 0);
   const total = fullPromptTokens + (completionTokens ?? 0);
   return {
     kind: 'usage',
     promptTokens: fullPromptTokens > 0 ? fullPromptTokens : undefined,
     completionTokens,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloud-direct/chat.ts` around lines 726 - 737, the usage guard currently
returns null whenever promptTokens and completionTokens are both undefined,
even if cache metrics were reported. Update the condition so it also considers
cachedInputTokens and cacheCreationInputTokens (i.e., return null only when
promptTokens, completionTokens, cachedInputTokens, and
cacheCreationInputTokens are all undefined), so that cache-only metrics still
produce a usage object. Adjust the existing check
if (promptTokens === undefined && completionTokens === undefined) return null;
and leave the fullPromptTokens logic unchanged.

     totalTokens: total > 0 ? total : undefined,
     cachedInputTokens,