fix: custom VLM API compatibility with vLLM reasoning models by lonrencn · Pull Request #4 · Windecay/SimpAI_Studio

lonrencn · 2026-06-28T08:52:37Z

Problem

Custom VLM API (OpenAI-compatible) has three bugs that prevent it from working with vLLM-hosted reasoning models like Qwen3.6:

1. Empty response from reasoning models

vLLM with --reasoning-parser qwen3 returns message.content = null and puts the actual text in message.reasoning. The extractor only checked content, so the result was always empty — causing SuperPrompt to silently fall back to the original prompt.

Additionally, reasoning models spend all max_tokens on thinking and never produce content. Adding chat_template_kwargs: {"enable_thinking": false} to the request makes the model respond directly in content.

2. VLM chat panel sends wrong version

readSelectedVlmVersion() reads the Gradio Dropdown's display label (the custom model name, e.g. "Qwen3.6-35B-AWQ-4b") instead of its internal value ("Custom") when the component has no native <input> element. The backend receives an unrecognized version string and falls back to the default local model.

Fix: If the version string matches the custom model name, treat it as Custom mode.

3. VLM selection resets on page refresh

"Custom" was explicitly excluded from admin default persistence (persist_admin and version != VLM.CUSTOM_VERSION). On page load, init_nav_bars restores the admin default (a local model) and its .change() event cascades to override the user-facing dropdown — even though load_main_vlm_user_settings had correctly restored "Custom" from local_settings.json.

Fix: Remove the exclusion so Custom persists as the admin default.

Changes (3 files, +14/-3)

File	Change
`enhanced/vlm.py`	`enable_thinking:false` in request; `reasoning`/`reasoning_content` fallback in extractor
`javascript/describe_vlm_chat.js`	Model-name heuristic in `readSelectedVlmVersion()`
`webui.py`	Remove Custom exclusion from admin persistence

Tested with

vLLM 0.x + Qwen3.6-35B-AWQ-4b (--reasoning-parser qwen3)
SuperPrompt, VLM chat, and page refresh persistence all working

Three issues fixed for OpenAI-compatible custom VLM API: 1. Empty response from reasoning models (vLLM + Qwen3.6): - vLLM with --reasoning-parser returns content=null, actual text in 'reasoning' key. Added fallback to check reasoning_content/reasoning. - Reasoning models waste all tokens on thinking. Added chat_template_kwargs:{enable_thinking:false} to request payload. 2. VLM chat panel sends wrong version to backend: - readSelectedVlmVersion() reads Gradio Dropdown display label (model name) instead of internal value ('Custom') when component has no native input element. - Added heuristic: if version string matches custom model name, treat as Custom mode. 3. VLM selection resets to local model on page refresh: - Custom was excluded from admin default persistence, causing init_nav_bars to override the restored selection on page load. - Removed the exclusion so Custom persists as admin default.

Windecay merged commit a5b192b into Windecay:main Jun 30, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: custom VLM API compatibility with vLLM reasoning models#4

fix: custom VLM API compatibility with vLLM reasoning models#4
Windecay merged 1 commit into
Windecay:mainfrom
lonrencn:fix/vlm-custom-api-compat

lonrencn commented Jun 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

lonrencn commented Jun 28, 2026

Problem

1. Empty response from reasoning models

2. VLM chat panel sends wrong version

3. VLM selection resets on page refresh

Changes (3 files, +14/-3)

Tested with

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants