Skip to content

Claude models on windsurf seems to be text only now (no tool or error). Fix #14#15

Open
R-N wants to merge 7 commits into
rsvedant:masterfrom
R-N:master
Open

Claude models on windsurf seems to be text only now (no tool or error). Fix #14#15
R-N wants to merge 7 commits into
rsvedant:masterfrom
R-N:master

Conversation

@R-N

@R-N R-N commented May 27, 2026

Copy link
Copy Markdown

Tool calls fallback to SWE

Summary by CodeRabbit

  • New Features

    • Added a tool-call translator pipeline that automatically bridges requests to a tool-capable model when needed, including planning and streaming support.
    • /v1 models now expose tool capability and text-only status.
    • Made cloud streaming timeouts and default max input tokens configurable via environment variables.
  • Bug Fixes

    • Improved token usage accounting to include cached input tokens in prompt/total token reporting.
  • Chores

    • Version bumped to 0.3.4.
  • Tests

    • Added unit test coverage for text-only model detection.

Review Change Stack

@R-N R-N changed the title Claude models on windsurf is now text only. Fix #14 Claude models on windsurf seems to be text only now (no tool or error). Fix #14 May 27, 2026
@coderabbitai

coderabbitai Bot commented May 27, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

Adds a textOnly model flag and propagates it through resolution; implements a planning+translator bridge that routes tool-bearing requests from text-only cloud models to a tool-capable translator in both streaming and non-streaming flows; exposes tool capability in /v1/models; makes cloud-direct timeouts/defaults configurable and adjusts usage accounting; adds tests and bumps version.

Changes

Text-only Model Tool Fallback Routing

Layer / File(s) Summary
Model metadata and resolution with textOnly flag
src/plugin/models.ts
Adds optional textOnly flag to model/variant metadata, marks specific Claude models as textOnly: true, extends ResolvedModel, and returns textOnly from resolveModel().
Plugin: tool-translation helpers and config
src/plugin.ts
Adds tool-translation schema, providerOptions parsing for a translator model name, translator validation (not textOnly), and helper functions for planning, result-context construction, flattening tool history, and usage combination.
Streaming path: planning + translator bridge
src/plugin.ts
Runs an opus planning pass, optionally reduces tool-result context, retries planning on socket-close with compacting, conditionally invokes a translator model UID to stream translator-emitted tool_call events, and synthesizes fallbacks when needed while combining usage.
Non-streaming path: planning + translator bridge
src/plugin.ts
Mirrors streaming behavior for non-streaming completions: planning pass, translator invocation when indicated, retry-on-socket-close behavior, forwarding translator tool_call events, and fallback handling.
Models API capability exposure
src/plugin.ts
Updates /v1/models to resolve each model ID and include capabilities.tools and text_only derived from resolveModel(id).
Cloud-direct: timeouts, defaults, and usage accounting
src/cloud-direct/chat.ts
Introduce readPositiveIntEnv, make stream idle/TTFB timeouts and DEFAULT_MAX_INPUT_TOKENS env-configurable, use the default for completion config fallback, and update usage synthesis so promptTokens includes cached and cache-creation inputs and totalTokens reflects the expanded prompt.
TextOnly flag validation test
tests/unit/variant.test.ts
Adds unit test asserting Claude cloud models return textOnly: true and a family-less model returns textOnly: undefined.
Version bump
package.json
Bumps package version from 0.3.3 to 0.3.4.

Sequence Diagram

sequenceDiagram
  participant Client
  participant WindsurfPlugin
  participant Resolver
  participant OpusPlanner
  participant TranslatorModel
  participant ToolsRuntime
  Client->>WindsurfPlugin: send chat request (with tools)
  WindsurfPlugin->>Resolver: resolveModel(requestedModel)
  Resolver-->>WindsurfPlugin: ResolvedModel (textOnly = true)
  WindsurfPlugin->>OpusPlanner: run planning pass on original model
  OpusPlanner-->>WindsurfPlugin: planning output (indicates tool intent)
  WindsurfPlugin->>TranslatorModel: streamChatEvents using translator UID (tools preserved)
  TranslatorModel-->>WindsurfPlugin: emits tool_call events
  WindsurfPlugin->>ToolsRuntime: forward/synthesize tool_call events
  ToolsRuntime-->>WindsurfPlugin: tool results
  WindsurfPlugin-->>Client: final assembled response (combined usage)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I saw a text-only model sigh,
"I cannot run the tools!"—oh my.
We plan, translate, bridge the call,
Forward events and save the fall.
Version bumped, small rabbit cheers—🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: marking Claude models as text-only and fixing tool call handling via a translator/fallback mechanism to address issue #14.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (5)
package.json (1)

3-3: 💤 Low value

Consider a minor version bump for feature additions.

The PR adds new features (text-only model detection, automatic routing, API capability fields) rather than just bug fixes. Semantic versioning typically uses minor version increments (e.g., 0.4.0) for backward-compatible feature additions and patch increments for bug fixes. That said, 0.x versioning conventions are more flexible, so 0.3.4 is acceptable if that matches the project's versioning philosophy.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@package.json` at line 3, The package.json "version" field is still "0.3.4"
but this PR introduces new, backward-compatible features, so update the
"version" key to a minor bump (e.g., "0.4.0") to reflect feature additions; also
update any associated lockfiles or changelog entries and ensure CI/release
config that reads the version (e.g., any scripts referencing package.json
version) are consistent with the new value.
src/plugin.ts (3)

123-150: ⚡ Quick win

Consider handling unknown fallback models gracefully.

If fallbackOverride or the env var contains an invalid model name, resolveModel(fallbackName) at line 136 will throw "Unknown Windsurf model" before the code can check fallback.textOnly. The user would see a confusing error about an unknown model rather than "fallback is invalid" guidance.

Consider wrapping line 136 in a try-catch and re-throwing with context, or validating the fallback name before calling resolveModel.

🛡️ Suggested improvement
  const fallbackName =
    fallbackOverride?.trim() ||
    process.env.OPENCODE_WINDSURF_TEXT_ONLY_TOOL_FALLBACK_MODEL?.trim() ||
    DEFAULT_TEXT_ONLY_TOOL_FALLBACK_MODEL;
- const fallback = resolveModel(fallbackName);
+ let fallback;
+ try {
+   fallback = resolveModel(fallbackName);
+ } catch (err) {
+   throw new Error(
+     `Model "${resolved.modelId}" cannot use tools and the specified fallback ` +
+     `"${fallbackName}" is not a valid model. ${err instanceof Error ? err.message : String(err)}`
+   );
+ }
  if (fallback.textOnly) {
    throw new Error(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/plugin.ts` around lines 123 - 150, The code calls
resolveModel(fallbackName) inside routeToolsForModel and will throw an opaque
"Unknown Windsurf model" if the fallback name is invalid; wrap the
resolveModel(fallbackName) call in a try-catch (or otherwise validate
fallbackName) and rethrow a clearer error that includes context (e.g., mention
fallbackOverride/env var and that the fallback model is invalid) before checking
fallback.textOnly; reference the symbols routeToolsForModel, fallbackName,
resolveModel, and fallback when implementing the catch/re-throw so the error
message points to the failing fallback selection rather than leaking the
low-level unknown-model message.

285-285: ⚡ Quick win

Consider surfacing the fallback routing to the user.

When a text-only model is routed to a fallback, the response chunks (lines 329, 376, 405) still report model: requestedModel, hiding the fact that a different model actually processed the request. Users inspecting token usage or model-specific behavior might be confused.

The debug log at line 285 records the routing, but is only visible when WINDSURF_PLUGIN_DEBUG=1.

Consider one of:

  1. Emit a warning chunk before the first assistant chunk noting the routing
  2. Set response.model to routed.fallbackModelId when routing occurred
  3. Add a custom x-routed-from field to the response

This improves transparency without breaking the streaming flow.

Also applies to: 309-309, 311-311

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/plugin.ts` at line 285, The streaming responses currently report the
requested model (resolved.modelUid) even when routing to a fallback occurred;
update the response creation in streamChatEvents so that if routed.modelUid !==
resolved.modelUid you set response.model = routed.fallbackModelId (or
routed.modelUid if that is the actual processor id) instead of the requested
model, and also add a small meta field (e.g., response["x-routed-from"] =
resolved.modelUid) so callers can see the original request; adjust the three
places where assistant chunks are assembled (the code paths that currently use
resolved.modelUid/requestedModel for response.model) and keep the existing
debugLog.log entry.

891-892: 💤 Low value

Clarify whether both capabilities.tools and text_only are needed.

Lines 898-899 expose both capabilities.tools (true when tools supported) and text_only (true when tools NOT supported). These are inverses of each other.

If only one field is needed for OpenCode's consumption, consider removing the redundant one. If both serve distinct purposes (e.g., different client conventions), a comment explaining why would help future maintainers.

Also applies to: 898-899

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/plugin.ts` around lines 891 - 892, The code exposes both
capabilities.tools and text_only as inverse booleans (set via resolveModel(id)
-> resolved and supportsTools = !resolved.textOnly); decide and implement one of
two fixes: either remove the redundant field (delete either the
capabilities.tools assignment or the text_only exposure and update callers to
use the remaining field), or keep both but add a concise comment next to the
capabilities.tools and text_only assignments explaining that they intentionally
mirror different client conventions (one used by OpenCode, one used elsewhere)
so maintainers understand why both exist; locate symbols resolveModel, resolved,
supportsTools, capabilities.tools and text_only when making the change.
tests/unit/variant.test.ts (1)

50-54: ⚡ Quick win

Consider expanding test coverage for textOnly flag.

The test validates the flag for 2 Claude models (opus-4.7, sonnet-4.6) and 1 non-Claude model (swe-1.6), but the catalog marks ~10 Claude models as text-only across both Cognition-era and legacy sections (claude-4.5, claude-4.6, claude-3.7-sonnet, etc.).

Recommend testing at least one model from each Claude family (4.5, 4.6, 4.7, 3.7, legacy proto-enum models) to ensure the flag propagates correctly through all catalog paths. This would catch drift if future catalog regenerations accidentally drop the flag.

📝 Suggested test expansion
  test('marks Claude cloud models as text-only', () => {
+   // Cognition-era models
    expect(resolveModel('claude-opus-4.7').textOnly).toBe(true);
+   expect(resolveModel('claude-opus-4.5').textOnly).toBe(true);
    expect(resolveModel('claude-sonnet-4.6').textOnly).toBe(true);
+   expect(resolveModel('claude-sonnet-4.5').textOnly).toBe(true);
+   // Legacy proto-enum models
+   expect(resolveModel('claude-3.7-sonnet').textOnly).toBe(true);
+   expect(resolveModel('claude-4-opus').textOnly).toBe(true);
+   // Non-Claude models should not have the flag
    expect(resolveModel('swe-1.6').textOnly).toBeUndefined();
+   expect(resolveModel('gpt-5.5').textOnly).toBeUndefined();
+   expect(resolveModel('gemini-3.0-flash').textOnly).toBeUndefined();
  });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/variant.test.ts` around lines 50 - 54, The existing test "marks
Claude cloud models as text-only" only checks two Claude variants; expand it to
assert that resolveModel(...).textOnly is true for at least one representative
from each Claude family (e.g., 'claude-4.5', 'claude-4.6', 'claude-4.7',
'claude-3.7-sonnet' and one legacy/enum-style Claude identifier) so the flag is
validated across both Cognition-era and legacy catalog paths, and keep an
assertion that a non-Claude model (e.g., 'swe-1.6') remains undefined/false to
prevent regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@package.json`:
- Line 3: The package.json "version" field is still "0.3.4" but this PR
introduces new, backward-compatible features, so update the "version" key to a
minor bump (e.g., "0.4.0") to reflect feature additions; also update any
associated lockfiles or changelog entries and ensure CI/release config that
reads the version (e.g., any scripts referencing package.json version) are
consistent with the new value.

In `@src/plugin.ts`:
- Around line 123-150: The code calls resolveModel(fallbackName) inside
routeToolsForModel and will throw an opaque "Unknown Windsurf model" if the
fallback name is invalid; wrap the resolveModel(fallbackName) call in a
try-catch (or otherwise validate fallbackName) and rethrow a clearer error that
includes context (e.g., mention fallbackOverride/env var and that the fallback
model is invalid) before checking fallback.textOnly; reference the symbols
routeToolsForModel, fallbackName, resolveModel, and fallback when implementing
the catch/re-throw so the error message points to the failing fallback selection
rather than leaking the low-level unknown-model message.
- Line 285: The streaming responses currently report the requested model
(resolved.modelUid) even when routing to a fallback occurred; update the
response creation in streamChatEvents so that if routed.modelUid !==
resolved.modelUid you set response.model = routed.fallbackModelId (or
routed.modelUid if that is the actual processor id) instead of the requested
model, and also add a small meta field (e.g., response["x-routed-from"] =
resolved.modelUid) so callers can see the original request; adjust the three
places where assistant chunks are assembled (the code paths that currently use
resolved.modelUid/requestedModel for response.model) and keep the existing
debugLog.log entry.
- Around line 891-892: The code exposes both capabilities.tools and text_only as
inverse booleans (set via resolveModel(id) -> resolved and supportsTools =
!resolved.textOnly); decide and implement one of two fixes: either remove the
redundant field (delete either the capabilities.tools assignment or the
text_only exposure and update callers to use the remaining field), or keep both
but add a concise comment next to the capabilities.tools and text_only
assignments explaining that they intentionally mirror different client
conventions (one used by OpenCode, one used elsewhere) so maintainers understand
why both exist; locate symbols resolveModel, resolved, supportsTools,
capabilities.tools and text_only when making the change.

In `@tests/unit/variant.test.ts`:
- Around line 50-54: The existing test "marks Claude cloud models as text-only"
only checks two Claude variants; expand it to assert that
resolveModel(...).textOnly is true for at least one representative from each
Claude family (e.g., 'claude-4.5', 'claude-4.6', 'claude-4.7',
'claude-3.7-sonnet' and one legacy/enum-style Claude identifier) so the flag is
validated across both Cognition-era and legacy catalog paths, and keep an
assertion that a non-Claude model (e.g., 'swe-1.6') remains undefined/false to
prevent regressions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: adebb187-561c-45fc-aac2-edf1588e0ef1

📥 Commits

Reviewing files that changed from the base of the PR and between c331611 and ff4b08f.

📒 Files selected for processing (4)
  • package.json
  • src/plugin.ts
  • src/plugin/models.ts
  • tests/unit/variant.test.ts

@R-N R-N marked this pull request as draft May 27, 2026 08:42
@R-N

R-N commented May 27, 2026

Copy link
Copy Markdown
Author

Hmmm... it seems the performance is below Anthropic's claude. Like it is noticeably dumber. both opus 4.7 medium

@R-N R-N marked this pull request as ready for review May 28, 2026 12:10

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloud-direct/chat.ts`:
- Around line 726-737: The usage guard currently returns null if only
promptTokens and completionTokens are undefined; update the condition so it also
considers cachedInputTokens and cacheCreationInputTokens (i.e., return null only
when all four of promptTokens, completionTokens, cachedInputTokens, and
cacheCreationInputTokens are undefined or zero), so that cache-only metrics
still produce a usage object—adjust the check around the existing if
(promptTokens === undefined && completionTokens === undefined) return null; to
include cachedInputTokens and cacheCreationInputTokens and ensure
fullPromptTokens logic remains unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fe78aa71-bd5b-473c-936d-3fac796438d5

📥 Commits

Reviewing files that changed from the base of the PR and between ff4b08f and 9755fda.

📒 Files selected for processing (2)
  • src/cloud-direct/chat.ts
  • src/plugin.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/plugin.ts

Comment thread src/cloud-direct/chat.ts
Comment on lines 726 to 737
if (promptTokens === undefined && completionTokens === undefined) return null;
// totalTokens reflects what OpenAI's API counts as billable: input +
// output. Cached / cache-creation / reasoning subtotals are surfaced as
// additional fields so callers that want a fuller picture (e.g. cost
// breakdown for reasoning models) can read them, but they're NOT
// double-counted into total.
const total = (promptTokens ?? 0) + (completionTokens ?? 0);
// Cognition reports cache reads/writes separately from fresh input tokens.
// OpenAI-compatible callers expect `prompt_tokens` to represent the full
// effective prompt size (including cached prompt), and opencode uses it for
// context-window display. Preserve the cache subtotals too for callers that
// want cost details.
const fullPromptTokens = (promptTokens ?? 0) + (cachedInputTokens ?? 0) + (cacheCreationInputTokens ?? 0);
const total = fullPromptTokens + (completionTokens ?? 0);
return {
kind: 'usage',
promptTokens,
promptTokens: fullPromptTokens > 0 ? fullPromptTokens : undefined,
completionTokens,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Broaden the usage-presence guard to include cache-derived metrics.

Line 726 still returns null unless input_tokens or output_tokens is present. That drops usage events when only cache metrics are reported, even though Line 732 now treats cache tokens as prompt-equivalent.

Suggested fix
-  if (promptTokens === undefined && completionTokens === undefined) return null;
+  if (
+    promptTokens === undefined &&
+    completionTokens === undefined &&
+    cachedInputTokens === undefined &&
+    cacheCreationInputTokens === undefined &&
+    reasoningTokens === undefined
+  ) return null;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (promptTokens === undefined && completionTokens === undefined) return null;
// totalTokens reflects what OpenAI's API counts as billable: input +
// output. Cached / cache-creation / reasoning subtotals are surfaced as
// additional fields so callers that want a fuller picture (e.g. cost
// breakdown for reasoning models) can read them, but they're NOT
// double-counted into total.
const total = (promptTokens ?? 0) + (completionTokens ?? 0);
// Cognition reports cache reads/writes separately from fresh input tokens.
// OpenAI-compatible callers expect `prompt_tokens` to represent the full
// effective prompt size (including cached prompt), and opencode uses it for
// context-window display. Preserve the cache subtotals too for callers that
// want cost details.
const fullPromptTokens = (promptTokens ?? 0) + (cachedInputTokens ?? 0) + (cacheCreationInputTokens ?? 0);
const total = fullPromptTokens + (completionTokens ?? 0);
return {
kind: 'usage',
promptTokens,
promptTokens: fullPromptTokens > 0 ? fullPromptTokens : undefined,
completionTokens,
if (
promptTokens === undefined &&
completionTokens === undefined &&
cachedInputTokens === undefined &&
cacheCreationInputTokens === undefined &&
reasoningTokens === undefined
) return null;
// Cognition reports cache reads/writes separately from fresh input tokens.
// OpenAI-compatible callers expect `prompt_tokens` to represent the full
// effective prompt size (including cached prompt), and opencode uses it for
// context-window display. Preserve the cache subtotals too for callers that
// want cost details.
const fullPromptTokens = (promptTokens ?? 0) + (cachedInputTokens ?? 0) + (cacheCreationInputTokens ?? 0);
const total = fullPromptTokens + (completionTokens ?? 0);
return {
kind: 'usage',
promptTokens: fullPromptTokens > 0 ? fullPromptTokens : undefined,
completionTokens,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloud-direct/chat.ts` around lines 726 - 737, The usage guard currently
returns null if only promptTokens and completionTokens are undefined; update the
condition so it also considers cachedInputTokens and cacheCreationInputTokens
(i.e., return null only when all four of promptTokens, completionTokens,
cachedInputTokens, and cacheCreationInputTokens are undefined or zero), so that
cache-only metrics still produce a usage object—adjust the check around the
existing if (promptTokens === undefined && completionTokens === undefined)
return null; to include cachedInputTokens and cacheCreationInputTokens and
ensure fullPromptTokens logic remains unchanged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant