Claude models on windsurf seems to be text only now (no tool or error). Fix #14#15
Claude models on windsurf seems to be text only now (no tool or error). Fix #14#15R-N wants to merge 7 commits into
Conversation
📝 WalkthroughWalkthroughAdds a ChangesText-only Model Tool Fallback Routing
Sequence DiagramsequenceDiagram
participant Client
participant WindsurfPlugin
participant Resolver
participant OpusPlanner
participant TranslatorModel
participant ToolsRuntime
Client->>WindsurfPlugin: send chat request (with tools)
WindsurfPlugin->>Resolver: resolveModel(requestedModel)
Resolver-->>WindsurfPlugin: ResolvedModel (textOnly = true)
WindsurfPlugin->>OpusPlanner: run planning pass on original model
OpusPlanner-->>WindsurfPlugin: planning output (indicates tool intent)
WindsurfPlugin->>TranslatorModel: streamChatEvents using translator UID (tools preserved)
TranslatorModel-->>WindsurfPlugin: emits tool_call events
WindsurfPlugin->>ToolsRuntime: forward/synthesize tool_call events
ToolsRuntime-->>WindsurfPlugin: tool results
WindsurfPlugin-->>Client: final assembled response (combined usage)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (5)
package.json (1)
3-3: 💤 Low valueConsider a minor version bump for feature additions.
The PR adds new features (text-only model detection, automatic routing, API capability fields) rather than just bug fixes. Semantic versioning typically uses minor version increments (e.g., 0.4.0) for backward-compatible feature additions and patch increments for bug fixes. That said, 0.x versioning conventions are more flexible, so 0.3.4 is acceptable if that matches the project's versioning philosophy.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@package.json` at line 3, The package.json "version" field is still "0.3.4" but this PR introduces new, backward-compatible features, so update the "version" key to a minor bump (e.g., "0.4.0") to reflect feature additions; also update any associated lockfiles or changelog entries and ensure CI/release config that reads the version (e.g., any scripts referencing package.json version) are consistent with the new value.src/plugin.ts (3)
123-150: ⚡ Quick winConsider handling unknown fallback models gracefully.
If
fallbackOverrideor the env var contains an invalid model name,resolveModel(fallbackName)at line 136 will throw "Unknown Windsurf model" before the code can checkfallback.textOnly. The user would see a confusing error about an unknown model rather than "fallback is invalid" guidance.Consider wrapping line 136 in a try-catch and re-throwing with context, or validating the fallback name before calling
resolveModel.🛡️ Suggested improvement
const fallbackName = fallbackOverride?.trim() || process.env.OPENCODE_WINDSURF_TEXT_ONLY_TOOL_FALLBACK_MODEL?.trim() || DEFAULT_TEXT_ONLY_TOOL_FALLBACK_MODEL; - const fallback = resolveModel(fallbackName); + let fallback; + try { + fallback = resolveModel(fallbackName); + } catch (err) { + throw new Error( + `Model "${resolved.modelId}" cannot use tools and the specified fallback ` + + `"${fallbackName}" is not a valid model. ${err instanceof Error ? err.message : String(err)}` + ); + } if (fallback.textOnly) { throw new Error(🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/plugin.ts` around lines 123 - 150, The code calls resolveModel(fallbackName) inside routeToolsForModel and will throw an opaque "Unknown Windsurf model" if the fallback name is invalid; wrap the resolveModel(fallbackName) call in a try-catch (or otherwise validate fallbackName) and rethrow a clearer error that includes context (e.g., mention fallbackOverride/env var and that the fallback model is invalid) before checking fallback.textOnly; reference the symbols routeToolsForModel, fallbackName, resolveModel, and fallback when implementing the catch/re-throw so the error message points to the failing fallback selection rather than leaking the low-level unknown-model message.
285-285: ⚡ Quick winConsider surfacing the fallback routing to the user.
When a text-only model is routed to a fallback, the response chunks (lines 329, 376, 405) still report
model: requestedModel, hiding the fact that a different model actually processed the request. Users inspecting token usage or model-specific behavior might be confused.The debug log at line 285 records the routing, but is only visible when
WINDSURF_PLUGIN_DEBUG=1.Consider one of:
- Emit a warning chunk before the first assistant chunk noting the routing
- Set
response.modeltorouted.fallbackModelIdwhen routing occurred- Add a custom
x-routed-fromfield to the responseThis improves transparency without breaking the streaming flow.
Also applies to: 309-309, 311-311
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/plugin.ts` at line 285, The streaming responses currently report the requested model (resolved.modelUid) even when routing to a fallback occurred; update the response creation in streamChatEvents so that if routed.modelUid !== resolved.modelUid you set response.model = routed.fallbackModelId (or routed.modelUid if that is the actual processor id) instead of the requested model, and also add a small meta field (e.g., response["x-routed-from"] = resolved.modelUid) so callers can see the original request; adjust the three places where assistant chunks are assembled (the code paths that currently use resolved.modelUid/requestedModel for response.model) and keep the existing debugLog.log entry.
891-892: 💤 Low valueClarify whether both
capabilities.toolsandtext_onlyare needed.Lines 898-899 expose both
capabilities.tools(true when tools supported) andtext_only(true when tools NOT supported). These are inverses of each other.If only one field is needed for OpenCode's consumption, consider removing the redundant one. If both serve distinct purposes (e.g., different client conventions), a comment explaining why would help future maintainers.
Also applies to: 898-899
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/plugin.ts` around lines 891 - 892, The code exposes both capabilities.tools and text_only as inverse booleans (set via resolveModel(id) -> resolved and supportsTools = !resolved.textOnly); decide and implement one of two fixes: either remove the redundant field (delete either the capabilities.tools assignment or the text_only exposure and update callers to use the remaining field), or keep both but add a concise comment next to the capabilities.tools and text_only assignments explaining that they intentionally mirror different client conventions (one used by OpenCode, one used elsewhere) so maintainers understand why both exist; locate symbols resolveModel, resolved, supportsTools, capabilities.tools and text_only when making the change.tests/unit/variant.test.ts (1)
50-54: ⚡ Quick winConsider expanding test coverage for
textOnlyflag.The test validates the flag for 2 Claude models (opus-4.7, sonnet-4.6) and 1 non-Claude model (swe-1.6), but the catalog marks ~10 Claude models as text-only across both Cognition-era and legacy sections (claude-4.5, claude-4.6, claude-3.7-sonnet, etc.).
Recommend testing at least one model from each Claude family (4.5, 4.6, 4.7, 3.7, legacy proto-enum models) to ensure the flag propagates correctly through all catalog paths. This would catch drift if future catalog regenerations accidentally drop the flag.
📝 Suggested test expansion
test('marks Claude cloud models as text-only', () => { + // Cognition-era models expect(resolveModel('claude-opus-4.7').textOnly).toBe(true); + expect(resolveModel('claude-opus-4.5').textOnly).toBe(true); expect(resolveModel('claude-sonnet-4.6').textOnly).toBe(true); + expect(resolveModel('claude-sonnet-4.5').textOnly).toBe(true); + // Legacy proto-enum models + expect(resolveModel('claude-3.7-sonnet').textOnly).toBe(true); + expect(resolveModel('claude-4-opus').textOnly).toBe(true); + // Non-Claude models should not have the flag expect(resolveModel('swe-1.6').textOnly).toBeUndefined(); + expect(resolveModel('gpt-5.5').textOnly).toBeUndefined(); + expect(resolveModel('gemini-3.0-flash').textOnly).toBeUndefined(); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/variant.test.ts` around lines 50 - 54, The existing test "marks Claude cloud models as text-only" only checks two Claude variants; expand it to assert that resolveModel(...).textOnly is true for at least one representative from each Claude family (e.g., 'claude-4.5', 'claude-4.6', 'claude-4.7', 'claude-3.7-sonnet' and one legacy/enum-style Claude identifier) so the flag is validated across both Cognition-era and legacy catalog paths, and keep an assertion that a non-Claude model (e.g., 'swe-1.6') remains undefined/false to prevent regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@package.json`:
- Line 3: The package.json "version" field is still "0.3.4" but this PR
introduces new, backward-compatible features, so update the "version" key to a
minor bump (e.g., "0.4.0") to reflect feature additions; also update any
associated lockfiles or changelog entries and ensure CI/release config that
reads the version (e.g., any scripts referencing package.json version) are
consistent with the new value.
In `@src/plugin.ts`:
- Around line 123-150: The code calls resolveModel(fallbackName) inside
routeToolsForModel and will throw an opaque "Unknown Windsurf model" if the
fallback name is invalid; wrap the resolveModel(fallbackName) call in a
try-catch (or otherwise validate fallbackName) and rethrow a clearer error that
includes context (e.g., mention fallbackOverride/env var and that the fallback
model is invalid) before checking fallback.textOnly; reference the symbols
routeToolsForModel, fallbackName, resolveModel, and fallback when implementing
the catch/re-throw so the error message points to the failing fallback selection
rather than leaking the low-level unknown-model message.
- Line 285: The streaming responses currently report the requested model
(resolved.modelUid) even when routing to a fallback occurred; update the
response creation in streamChatEvents so that if routed.modelUid !==
resolved.modelUid you set response.model = routed.fallbackModelId (or
routed.modelUid if that is the actual processor id) instead of the requested
model, and also add a small meta field (e.g., response["x-routed-from"] =
resolved.modelUid) so callers can see the original request; adjust the three
places where assistant chunks are assembled (the code paths that currently use
resolved.modelUid/requestedModel for response.model) and keep the existing
debugLog.log entry.
- Around line 891-892: The code exposes both capabilities.tools and text_only as
inverse booleans (set via resolveModel(id) -> resolved and supportsTools =
!resolved.textOnly); decide and implement one of two fixes: either remove the
redundant field (delete either the capabilities.tools assignment or the
text_only exposure and update callers to use the remaining field), or keep both
but add a concise comment next to the capabilities.tools and text_only
assignments explaining that they intentionally mirror different client
conventions (one used by OpenCode, one used elsewhere) so maintainers understand
why both exist; locate symbols resolveModel, resolved, supportsTools,
capabilities.tools and text_only when making the change.
In `@tests/unit/variant.test.ts`:
- Around line 50-54: The existing test "marks Claude cloud models as text-only"
only checks two Claude variants; expand it to assert that
resolveModel(...).textOnly is true for at least one representative from each
Claude family (e.g., 'claude-4.5', 'claude-4.6', 'claude-4.7',
'claude-3.7-sonnet' and one legacy/enum-style Claude identifier) so the flag is
validated across both Cognition-era and legacy catalog paths, and keep an
assertion that a non-Claude model (e.g., 'swe-1.6') remains undefined/false to
prevent regressions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: adebb187-561c-45fc-aac2-edf1588e0ef1
📒 Files selected for processing (4)
package.jsonsrc/plugin.tssrc/plugin/models.tstests/unit/variant.test.ts
|
Hmmm... it seems the performance is below Anthropic's claude. Like it is noticeably dumber. both opus 4.7 medium |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloud-direct/chat.ts`:
- Around line 726-737: The usage guard currently returns null if only
promptTokens and completionTokens are undefined; update the condition so it also
considers cachedInputTokens and cacheCreationInputTokens (i.e., return null only
when all four of promptTokens, completionTokens, cachedInputTokens, and
cacheCreationInputTokens are undefined or zero), so that cache-only metrics
still produce a usage object—adjust the check around the existing if
(promptTokens === undefined && completionTokens === undefined) return null; to
include cachedInputTokens and cacheCreationInputTokens and ensure
fullPromptTokens logic remains unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fe78aa71-bd5b-473c-936d-3fac796438d5
📒 Files selected for processing (2)
src/cloud-direct/chat.tssrc/plugin.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/plugin.ts
| if (promptTokens === undefined && completionTokens === undefined) return null; | ||
| // totalTokens reflects what OpenAI's API counts as billable: input + | ||
| // output. Cached / cache-creation / reasoning subtotals are surfaced as | ||
| // additional fields so callers that want a fuller picture (e.g. cost | ||
| // breakdown for reasoning models) can read them, but they're NOT | ||
| // double-counted into total. | ||
| const total = (promptTokens ?? 0) + (completionTokens ?? 0); | ||
| // Cognition reports cache reads/writes separately from fresh input tokens. | ||
| // OpenAI-compatible callers expect `prompt_tokens` to represent the full | ||
| // effective prompt size (including cached prompt), and opencode uses it for | ||
| // context-window display. Preserve the cache subtotals too for callers that | ||
| // want cost details. | ||
| const fullPromptTokens = (promptTokens ?? 0) + (cachedInputTokens ?? 0) + (cacheCreationInputTokens ?? 0); | ||
| const total = fullPromptTokens + (completionTokens ?? 0); | ||
| return { | ||
| kind: 'usage', | ||
| promptTokens, | ||
| promptTokens: fullPromptTokens > 0 ? fullPromptTokens : undefined, | ||
| completionTokens, |
There was a problem hiding this comment.
Broaden the usage-presence guard to include cache-derived metrics.
Line 726 still returns null unless input_tokens or output_tokens is present. That drops usage events when only cache metrics are reported, even though Line 732 now treats cache tokens as prompt-equivalent.
Suggested fix
- if (promptTokens === undefined && completionTokens === undefined) return null;
+ if (
+ promptTokens === undefined &&
+ completionTokens === undefined &&
+ cachedInputTokens === undefined &&
+ cacheCreationInputTokens === undefined &&
+ reasoningTokens === undefined
+ ) return null;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (promptTokens === undefined && completionTokens === undefined) return null; | |
| // totalTokens reflects what OpenAI's API counts as billable: input + | |
| // output. Cached / cache-creation / reasoning subtotals are surfaced as | |
| // additional fields so callers that want a fuller picture (e.g. cost | |
| // breakdown for reasoning models) can read them, but they're NOT | |
| // double-counted into total. | |
| const total = (promptTokens ?? 0) + (completionTokens ?? 0); | |
| // Cognition reports cache reads/writes separately from fresh input tokens. | |
| // OpenAI-compatible callers expect `prompt_tokens` to represent the full | |
| // effective prompt size (including cached prompt), and opencode uses it for | |
| // context-window display. Preserve the cache subtotals too for callers that | |
| // want cost details. | |
| const fullPromptTokens = (promptTokens ?? 0) + (cachedInputTokens ?? 0) + (cacheCreationInputTokens ?? 0); | |
| const total = fullPromptTokens + (completionTokens ?? 0); | |
| return { | |
| kind: 'usage', | |
| promptTokens, | |
| promptTokens: fullPromptTokens > 0 ? fullPromptTokens : undefined, | |
| completionTokens, | |
| if ( | |
| promptTokens === undefined && | |
| completionTokens === undefined && | |
| cachedInputTokens === undefined && | |
| cacheCreationInputTokens === undefined && | |
| reasoningTokens === undefined | |
| ) return null; | |
| // Cognition reports cache reads/writes separately from fresh input tokens. | |
| // OpenAI-compatible callers expect `prompt_tokens` to represent the full | |
| // effective prompt size (including cached prompt), and opencode uses it for | |
| // context-window display. Preserve the cache subtotals too for callers that | |
| // want cost details. | |
| const fullPromptTokens = (promptTokens ?? 0) + (cachedInputTokens ?? 0) + (cacheCreationInputTokens ?? 0); | |
| const total = fullPromptTokens + (completionTokens ?? 0); | |
| return { | |
| kind: 'usage', | |
| promptTokens: fullPromptTokens > 0 ? fullPromptTokens : undefined, | |
| completionTokens, |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/cloud-direct/chat.ts` around lines 726 - 737, The usage guard currently
returns null if only promptTokens and completionTokens are undefined; update the
condition so it also considers cachedInputTokens and cacheCreationInputTokens
(i.e., return null only when all four of promptTokens, completionTokens,
cachedInputTokens, and cacheCreationInputTokens are undefined or zero), so that
cache-only metrics still produce a usage object—adjust the check around the
existing if (promptTokens === undefined && completionTokens === undefined)
return null; to include cachedInputTokens and cacheCreationInputTokens and
ensure fullPromptTokens logic remains unchanged.
Tool calls fallback to SWE
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Tests