fix: custom VLM API compatibility with vLLM reasoning models #4

Merged
Windecay merged 1 commit into Windecay:main from lonrencn:fix/vlm-custom-api-compat on Jun 30, 2026

@lonrencn

Contributor

Problem

The custom VLM API (OpenAI-compatible) has three bugs that prevent it from working with vLLM-hosted reasoning models such as Qwen3.6:

1. Empty response from reasoning models

vLLM with --reasoning-parser qwen3 returns message.content = null and puts the actual text in message.reasoning. The extractor only checked content, so the result was always empty and SuperPrompt silently fell back to the original prompt. The extractor now falls back to reasoning_content or reasoning when content is empty.
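
The fallback can be sketched roughly as follows. This is a minimal illustration, not the actual code in enhanced/vlm.py: the function name extract_text and the order in which the fallback keys are tried are assumptions; only the field names content, reasoning_content, and reasoning come from this PR.

```python
from typing import Any, Dict


def extract_text(response: Dict[str, Any]) -> str:
    """Return the first non-empty text field of the first choice.

    Hypothetical helper: the name and the key order are assumptions.
    """
    choices = response.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    # Prefer the regular answer, then fall back to the reasoning fields.
    for key in ("content", "reasoning_content", "reasoning"):
        value = message.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return ""
```

With this shape, a response whose content is null but whose reasoning holds text no longer produces an empty result.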

Additionally, reasoning models can spend the entire max_tokens budget on thinking and never produce content. Adding chat_template_kwargs: {"enable_thinking": false} to the request makes the model respond directly in content.
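
As a rough sketch of the request side, the payload might be built like this. The helper name build_payload, its parameters, and the default max_tokens value are hypothetical; the chat_template_kwargs field and its enable_thinking key are the ones named in this PR.

```python
from typing import Any, Dict, List


def build_payload(
    model: str,
    messages: List[Dict[str, Any]],
    max_tokens: int = 512,          # illustrative default, not from the PR
    disable_thinking: bool = True,
) -> Dict[str, Any]:
    """Build an OpenAI-compatible chat completion request body (hypothetical helper)."""
    payload: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
    }
    if disable_thinking:
        # Passed through to the chat template so the model answers in content.
        payload["chat_template_kwargs"] = {"enable_thinking": False}
    return payload
```

Servers that do not recognize chat_template_kwargs may ignore or reject the extra field, so a flag like disable_thinking keeps it optional.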

2. VLM chat panel sends wrong version

readSelectedVlmVersion() reads the Gradio Dropdown's display label (the custom model name, e.g. "Qwen3.6-35B-AWQ-4b") instead of its internal value ("Custom") when the component has no native <input> element. The backend receives an unrecognized version string and falls back to the default local model.

Fix: If the version string matches the custom model name, treat it as Custom mode.
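
The actual heuristic lives in javascript/describe_vlm_chat.js. The logic is shown here in Python only as an illustration; the function name normalize_vlm_version and the literal "Custom" as the internal value are assumptions based on the description above.

```python
from typing import Optional


def normalize_vlm_version(
    raw_version: Optional[str],
    custom_model_name: Optional[str],
    custom_version: str = "Custom",  # assumed internal dropdown value
) -> str:
    """Map a dropdown display label back to the internal version string."""
    version = (raw_version or "").strip()
    name = (custom_model_name or "").strip()
    # If the UI handed us the custom model's display label, treat it as Custom.
    if name and version == name:
        return custom_version
    return version
```

For example, normalize_vlm_version("Qwen3.6-35B-AWQ-4b", "Qwen3.6-35B-AWQ-4b") yields "Custom", while any other version string passes through unchanged.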

3. VLM selection resets on page refresh

"Custom" was explicitly excluded from admin default persistence (persist_admin and version != VLM.CUSTOM_VERSION). On page load, init_nav_bars restores the admin default (a local model), and its .change() event cascades to override the user-facing dropdown, even though load_main_vlm_user_settings had already correctly restored "Custom" from local_settings.json.

Fix: Remove the exclusion so Custom persists as the admin default.
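
The change in condition can be illustrated with a small before/after sketch. The two function names are hypothetical, and CUSTOM_VERSION = "Custom" is an assumed stand-in for VLM.CUSTOM_VERSION; the surrounding code in webui.py is not reproduced here.

```python
CUSTOM_VERSION = "Custom"  # assumed stand-in for VLM.CUSTOM_VERSION


def should_persist_before(persist_admin: bool, version: str) -> bool:
    """Old condition: Custom was never written as the admin default."""
    return persist_admin and version != CUSTOM_VERSION


def should_persist_after(persist_admin: bool, version: str) -> bool:
    """New condition: any selected version, including Custom, is persisted."""
    return persist_admin
```

Under the old condition, selecting Custom left a stale local model as the admin default, which is what init_nav_bars then restored on page load.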

Changes (3 files, +14/-3)

| File | Change |
| --- | --- |
| enhanced/vlm.py | enable_thinking:false in request; reasoning/reasoning_content fallback in extractor |
| javascript/describe_vlm_chat.js | Model-name heuristic in readSelectedVlmVersion() |
| webui.py | Remove Custom exclusion from admin persistence |

Tested with

  • vLLM 0.x + Qwen3.6-35B-AWQ-4b (--reasoning-parser qwen3)
  • SuperPrompt, VLM chat, and page refresh persistence all working

Three issues fixed for OpenAI-compatible custom VLM API:

1. Empty response from reasoning models (vLLM + Qwen3.6):
   - vLLM with --reasoning-parser returns content=null, actual text in
     'reasoning' key. Added fallback to check reasoning_content/reasoning.
   - Reasoning models waste all tokens on thinking. Added
     chat_template_kwargs:{enable_thinking:false} to request payload.

2. VLM chat panel sends wrong version to backend:
   - readSelectedVlmVersion() reads Gradio Dropdown display label
     (model name) instead of internal value ('Custom') when component
     has no native input element.
   - Added heuristic: if version string matches custom model name,
     treat as Custom mode.

3. VLM selection resets to local model on page refresh:
   - Custom was excluded from admin default persistence, causing
     init_nav_bars to override the restored selection on page load.
   - Removed the exclusion so Custom persists as admin default.
@Windecay Windecay merged commit a5b192b into Windecay:main Jun 30, 2026
