
Add Qwen3-VL runtime, export, and Python guide support #1999

Open
amdrajeevp1 wants to merge 18 commits into microsoft:main from amdrajeevp1:add-qwen3-vl-support

Conversation


@amdrajeevp1 (Contributor) commented Mar 3, 2026

Summary

  • Add first-class qwen3_vl model support in the C++ runtime:
    • route qwen3_vl configs to multimodal pipeline when vision+embedding are present
    • fall back to decoder-only when exported as text-only
    • register a dedicated Qwen3VLImageProcessor in multimodal processor factory
  • Extend model type handling to recognize qwen3_vl in VLM checks and Qwen-VL-specific position-id logic.
  • Add a new Qwen3VLTextModel export path in Python builder stack:
    • detect Qwen3VLForConditionalGeneration
    • set model.type = "qwen3_vl" for runtime compatibility
    • preserve Q/K norm and mRoPE behavior needed by Qwen3-VL
    • default exclude_embeds=true for text-component export flow
  • Add examples/python/qwen3-vl.md with end-to-end instructions for artifact prep, ONNX export, and sanity checks.
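The architecture-detection and text-component defaults described above can be sketched as follows. This is a hypothetical illustration, not the actual builder.py API: the function name `select_model_class` and the `extra_options` handling are stand-ins for whatever the builder stack really does.

```python
# Illustrative sketch of routing a Hugging Face config to a runtime model
# type, as described in the summary. Names are hypothetical, not builder.py's.
def select_model_class(config, extra_options):
    """Map a HF config to a runtime model-type string."""
    archs = getattr(config, "architectures", None) or []
    if "Qwen3VLForConditionalGeneration" in archs:
        # Text-component export: input embeddings come from the
        # vision/embedding pipeline at runtime, so default to excluding
        # them from the text ONNX graph (exclude_embeds=true).
        extra_options.setdefault("exclude_embeds", True)
        # model.type must be "qwen3_vl" to match the C++ runtime checks.
        return "qwen3_vl"
    return getattr(config, "model_type", "unknown")
```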

Additional artifacts needed for builder:

https://huggingface.co/onnx-community/Qwen3-4B-VL-ONNX/tree/main

Validation

  • Built package in onnxruntime-genai conda environment:
    • conda run -n onnxruntime-genai python build.py
  • Exported Qwen3-VL ONNX package (8B INT4 text path):
    • python builder.py --input ./pytorch_8b --reference ./pytorch_reference --output ./qwen3-vl-8b-instruct-onnx-vision-fp32-text-int4-cpu --precision int4
    • Export completed successfully and generated model.onnx + config/tokenizer assets.
  • Ran text-only sanity inference:
    • python qwen3vl-oga-inference.py ... -pr "Say hello in one short sentence."
    • Returned expected short greeting.
  • Ran image+text sanity inference:
    • python qwen3vl-oga-inference.py ... --image_paths ./test_images/img_50.jpg -pr "Describe this image in one sentence."
    • Returned a valid one-sentence image description.
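The two sanity runs above differ only in whether an image placeholder precedes the prompt text. A minimal sketch of how such a script might assemble the chat prompt is below; the special-token strings follow the Qwen chat-template convention, but in practice they should be read from the exported tokenizer/chat-template config rather than hard-coded.

```python
# Hypothetical helper (not the actual qwen3vl-oga-inference.py code) showing
# how a text-only vs. image+text prompt might be assembled for the sanity runs.
def build_prompt(text, num_images=0):
    """Build a Qwen-style chat prompt with optional image placeholders."""
    # One vision span per image; the runtime replaces <|image_pad|> with
    # the image embeddings produced by the vision pipeline.
    image_part = "<|vision_start|><|image_pad|><|vision_end|>" * num_images
    return (
        "<|im_start|>user\n"
        f"{image_part}{text}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```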

Notes

  • Local build/test artifacts (e.g. the build output and local examples/python/qwen3-vl/ data) are intentionally not part of this PR.

  • Add a dedicated Qwen3VLTextModel export flow with exclude_embeds enabled by default, and update runtime model-type checks to qwen3_vl for decoder text inference.
  • Force the generated GenAI config model.type to qwen3_vl so runtime loading matches the C++ model-type checks.
  • Wire Qwen3-VL into the native processor/model path, stabilize Windows ORT DLL loading, and add in-repo export/inference scripts so vision+embedding+text can run end-to-end with processor() on CPU.
  • Switch the export path to use the installed Transformers Qwen3-VL class and keep image_grid_thw as a real vision input so the runtime contract stays explicit during export runs.
  • Compute image_grid_thw from sanitized patch-aligned dimensions and keep prompt/image token accounting aligned with the cropped patch tensor path to avoid fixed 384x384 assumptions.
  • Load the model from a local patched modeling_qwen3_vl reference during export, add a fixed 384x384 vision resize for runtime shape compatibility, and move qwen3vl-oga.py into the qwen3-vl-4b folder as the single inference entrypoint.
  • Rename the export entrypoint to builder.py, remove implicit image defaults in inference, and add a minimal README so dynamic FP32 export and sanity tests can be run from scratch consistently.
  • Drop the fixed 384 resize from vision_processor generation and keep the export flow aligned with dynamic-image-size-only usage.
  • Add fp32-vision/int4-llm export instructions and capture successful text and image sanity-test outputs to make the local workflow reproducible.
  • Document how to fetch Qwen/Qwen3-VL-4B into the local pytorch folder and renumber the subsequent export and sanity-test steps for a clearer setup flow.
  • Delete generated ONNX outputs, test images, and local reference/example files that are now managed in the Hugging Face PR payload.
  • Add the new qwen3-vl documentation file with updated workflow headings and table-of-contents links.
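One of the commits above computes image_grid_thw from sanitized, patch-aligned dimensions instead of assuming fixed 384x384 inputs. A hedged sketch of that arithmetic is below; patch_size=14 and merge_size=2 are assumptions based on common Qwen-VL conventions, not the exporter's actual constants.

```python
# Illustrative sketch of deriving image_grid_thw from an arbitrary image size.
# The constants are assumed defaults, not values taken from this PR.
def image_grid_thw(height, width, patch_size=14, merge_size=2):
    """Return (t, h, w): temporal and patch-grid dims after snapping the
    image to the smallest crop-aligned unit (patch_size * merge_size)."""
    unit = patch_size * merge_size        # smallest merge-aligned pixel unit
    h = (height // unit) * unit // patch_size  # patches along height
    w = (width // unit) * unit // patch_size   # patches along width
    return (1, h, w)                      # t=1 for a single still image
```

With these assumed constants, a 384x384 input snaps down to a 364x364 region (26x26 patches), which is why fixed-384 assumptions and dynamic patch accounting can disagree.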
def __init__(self, config, io_dtype, onnx_dtype, ep, cache_dir, extra_options):
    super().__init__(config, io_dtype, onnx_dtype, ep, cache_dir, extra_options)
    # GenAI config "model.type" must match C++ runtime checks.
    self.model_type = "qwen3_vl"

Check warning — Code scanning / CodeQL: Overwriting attribute in super-class or sub-class

Assignment overwrites attribute model_type, which was previously defined in superclass Model.
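One common way to address this kind of CodeQL warning is to declare the override at class level rather than reassigning the attribute after super().__init__(). The sketch below uses simplified stand-in classes, not the builder's actual Model hierarchy.

```python
# Minimal sketch of a class-level override that avoids clobbering an
# attribute set by the superclass __init__. Class bodies are illustrative.
class Model:
    model_type = "base"  # default declared once, at class level

    def __init__(self, config):
        self.config = config

class Qwen3VLTextModel(Model):
    # Subclass declares its type here; no instance assignment overwrites
    # the parent's attribute, so the CodeQL pattern no longer matches.
    model_type = "qwen3_vl"
```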
  • Avoid forcing a manual DLL reload when onnxruntime.dll is already loaded, so WinML-packed DLL scenarios continue to work.
  • Keep the onnxruntime preload import while dropping explanatory comment text to match the preferred file style.
  • Apply formatting adjustments required by CI without changing runtime behavior.
@hanbitmyths

We have a plan to support qwen3-vl cpu and cuda models at onnxruntime-extensions, onnxruntime-genai, olive, and olive-recipes, so can you hold up this PR?

@amdrajeevp1 (Author)

> We have a plan to support qwen3-vl cpu and cuda models at onnxruntime-extensions, onnxruntime-genai, olive, and olive-recipes, so can you hold up this PR?

@hanbitmyths - I suggest merging this PR as-is, since it is complete, and making any incremental changes later if needed. The PR supports all variants of Qwen3-VL and all path/image sizes.

@hanbitmyths

A new PR supports multi-image input, which is not compatible with this PR. Please check these PRs to support the Qwen-VL family.

microsoft/onnxruntime-extensions#1027
microsoft/onnxruntime-extensions#1032
#2003
microsoft/Olive#2345
microsoft/olive-recipes#254

@hanbitmyths

@amdrajeevp1, do you need any specific code path for Vitis or RyzenAI? I don't see it from this PR.
