fix: custom VLM API compatibility with vLLM reasoning models#4
Merged
Merged
Conversation
Three issues fixed for OpenAI-compatible custom VLM API:
1. Empty response from reasoning models (vLLM + Qwen3.6):
- vLLM with --reasoning-parser returns content=null, actual text in
'reasoning' key. Added fallback to check reasoning_content/reasoning.
- Reasoning models waste all tokens on thinking. Added
chat_template_kwargs:{enable_thinking:false} to request payload.
2. VLM chat panel sends wrong version to backend:
- readSelectedVlmVersion() reads Gradio Dropdown display label
(model name) instead of internal value ('Custom') when component
has no native input element.
- Added heuristic: if version string matches custom model name,
treat as Custom mode.
3. VLM selection resets to local model on page refresh:
- Custom was excluded from admin default persistence, causing
init_nav_bars to override the restored selection on page load.
- Removed the exclusion so Custom persists as admin default.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Custom VLM API (OpenAI-compatible) has three bugs that prevent it from working with vLLM-hosted reasoning models like Qwen3.6:
1. Empty response from reasoning models
vLLM with
--reasoning-parser qwen3returnsmessage.content = nulland puts the actual text inmessage.reasoning. The extractor only checkedcontent, so the result was always empty — causing SuperPrompt to silently fall back to the original prompt.Additionally, reasoning models spend all
max_tokenson thinking and never producecontent. Addingchat_template_kwargs: {"enable_thinking": false}to the request makes the model respond directly incontent.2. VLM chat panel sends wrong version
readSelectedVlmVersion()reads the Gradio Dropdown's display label (the custom model name, e.g."Qwen3.6-35B-AWQ-4b") instead of its internal value ("Custom") when the component has no native<input>element. The backend receives an unrecognized version string and falls back to the default local model.Fix: If the version string matches the custom model name, treat it as Custom mode.
3. VLM selection resets on page refresh
"Custom"was explicitly excluded from admin default persistence (persist_admin and version != VLM.CUSTOM_VERSION). On page load,init_nav_barsrestores the admin default (a local model) and its.change()event cascades to override the user-facing dropdown — even thoughload_main_vlm_user_settingshad correctly restored"Custom"fromlocal_settings.json.Fix: Remove the exclusion so Custom persists as the admin default.
Changes (3 files, +14/-3)
enhanced/vlm.pyenable_thinking:falsein request;reasoning/reasoning_contentfallback in extractorjavascript/describe_vlm_chat.jsreadSelectedVlmVersion()webui.pyTested with
--reasoning-parser qwen3)