
Add Qwen3-VL runtime, export, and Python guide support #1999

Open
amdrajeevp1 wants to merge 18 commits into microsoft:main from amdrajeevp1:add-qwen3-vl-support

Conversation


@amdrajeevp1 (Contributor) commented Mar 3, 2026

Summary

  • Add first-class qwen3_vl model support in the C++ runtime:
    • route qwen3_vl configs to multimodal pipeline when vision+embedding are present
    • fall back to decoder-only when exported as text-only
    • register a dedicated Qwen3VLImageProcessor in multimodal processor factory
  • Extend model type handling to recognize qwen3_vl in VLM checks and Qwen-VL-specific position-id logic.
  • Add a new Qwen3VLTextModel export path in Python builder stack:
    • detect Qwen3VLForConditionalGeneration
    • set model.type = "qwen3_vl" for runtime compatibility
    • preserve Q/K norm and mRoPE behavior needed by Qwen3-VL
    • default exclude_embeds=true for text-component export flow
  • Add examples/python/qwen3-vl.md with end-to-end instructions for artifact prep, ONNX export, and sanity checks.
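The architecture-detection and text-component defaults described above can be sketched as follows. This is a hypothetical illustration, not the actual builder.py API: the function name `select_model_class` and the `extra_options` handling are stand-ins for whatever the builder stack really does.

```python
# Illustrative sketch of routing a Hugging Face config to a runtime model
# type, as described in the summary. Names are hypothetical, not builder.py's.
def select_model_class(config, extra_options):
    """Map a HF config to a runtime model-type string."""
    archs = getattr(config, "architectures", None) or []
    if "Qwen3VLForConditionalGeneration" in archs:
        # Text-component export: input embeddings come from the
        # vision/embedding pipeline at runtime, so default to excluding
        # them from the text ONNX graph (exclude_embeds=true).
        extra_options.setdefault("exclude_embeds", True)
        # model.type must be "qwen3_vl" to match the C++ runtime checks.
        return "qwen3_vl"
    return getattr(config, "model_type", "unknown")
```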

Additional artifacts needed for builder:

https://huggingface.co/onnx-community/Qwen3-4B-VL-ONNX/tree/main

Validation

  • Built package in onnxruntime-genai conda environment:
    • conda run -n onnxruntime-genai python build.py
  • Exported Qwen3-VL ONNX package (8B INT4 text path):
    • python builder.py --input ./pytorch_8b --reference ./pytorch_reference --output ./qwen3-vl-8b-instruct-onnx-vision-fp32-text-int4-cpu --precision int4
    • Export completed successfully and generated model.onnx + config/tokenizer assets.
  • Ran text-only sanity inference:
    • python qwen3vl-oga-inference.py ... -pr "Say hello in one short sentence."
    • Returned expected short greeting.
  • Ran image+text sanity inference:
    • python qwen3vl-oga-inference.py ... --image_paths ./test_images/img_50.jpg -pr "Describe this image in one sentence."
    • Returned a valid one-sentence image description.
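The two sanity runs above differ only in whether an image placeholder precedes the prompt text. A minimal sketch of how such a script might assemble the chat prompt is below; the special-token strings follow the Qwen chat-template convention, but in practice they should be read from the exported tokenizer/chat-template config rather than hard-coded.

```python
# Hypothetical helper (not the actual qwen3vl-oga-inference.py code) showing
# how a text-only vs. image+text prompt might be assembled for the sanity runs.
def build_prompt(text, num_images=0):
    """Build a Qwen-style chat prompt with optional image placeholders."""
    # One vision span per image; the runtime replaces <|image_pad|> with
    # the image embeddings produced by the vision pipeline.
    image_part = "<|vision_start|><|image_pad|><|vision_end|>" * num_images
    return (
        "<|im_start|>user\n"
        f"{image_part}{text}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```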

Notes

  • Local build/test artifacts (e.g. the build output and local examples/python/qwen3-vl/ data) are intentionally not part of this PR.

  • Add a dedicated Qwen3VLTextModel export flow with exclude_embeds enabled by default, and update runtime model-type checks to qwen3_vl for decoder text inference.
  • Force the generated GenAI config model.type to qwen3_vl so runtime loading matches the C++ model-type checks.
  • Wire Qwen3-VL into the native processor/model path, stabilize Windows ORT DLL loading, and add in-repo export/inference scripts so vision+embedding+text can run end-to-end with processor() on CPU.
  • Switch the export path to use the installed Transformers Qwen3-VL class and keep image_grid_thw as a real vision input so the runtime contract stays explicit during export runs.
  • Compute image_grid_thw from sanitized patch-aligned dimensions and keep prompt/image token accounting aligned with the cropped patch tensor path to avoid fixed 384x384 assumptions.
  • Load the model from a local patched modeling_qwen3_vl reference during export, add a fixed 384x384 vision resize for runtime shape compatibility, and move qwen3vl-oga.py into the qwen3-vl-4b folder as the single inference entrypoint.
  • Rename the export entrypoint to builder.py, remove implicit image defaults in inference, and add a minimal README so dynamic FP32 export and sanity tests can be run from scratch consistently.
  • Drop the fixed 384 resize from vision_processor generation and keep the export flow aligned with dynamic-image-size-only usage.
  • Add fp32-vision/int4-llm export instructions and capture successful text and image sanity-test outputs to make the local workflow reproducible.
  • Document how to fetch Qwen/Qwen3-VL-4B into the local pytorch folder and renumber the subsequent export and sanity-test steps for a clearer setup flow.
  • Delete generated ONNX outputs, test images, and local reference/example files that are now managed in the Hugging Face PR payload.
  • Add the new qwen3-vl documentation file with updated workflow headings and table-of-contents links.
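One of the commits above computes image_grid_thw from sanitized, patch-aligned dimensions instead of assuming fixed 384x384 inputs. A hedged sketch of that arithmetic is below; patch_size=14 and merge_size=2 are assumptions based on common Qwen-VL conventions, not the exporter's actual constants.

```python
# Illustrative sketch of deriving image_grid_thw from an arbitrary image size.
# The constants are assumed defaults, not values taken from this PR.
def image_grid_thw(height, width, patch_size=14, merge_size=2):
    """Return (t, h, w): temporal and patch-grid dims after snapping the
    image to the smallest crop-aligned unit (patch_size * merge_size)."""
    unit = patch_size * merge_size        # smallest merge-aligned pixel unit
    h = (height // unit) * unit // patch_size  # patches along height
    w = (width // unit) * unit // patch_size   # patches along width
    return (1, h, w)                      # t=1 for a single still image
```

With these assumed constants, a 384x384 input snaps down to a 364x364 region (26x26 patches), which is why fixed-384 assumptions and dynamic patch accounting can disagree.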
def __init__(self, config, io_dtype, onnx_dtype, ep, cache_dir, extra_options):
    super().__init__(config, io_dtype, onnx_dtype, ep, cache_dir, extra_options)
    # GenAI config "model.type" must match C++ runtime checks.
    self.model_type = "qwen3_vl"

Check warning — Code scanning / CodeQL: Overwriting attribute in super-class or sub-class

Assignment overwrites attribute model_type, which was previously defined in superclass Model.
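One common way to address this kind of CodeQL warning is to declare the override at class level rather than reassigning the attribute after super().__init__(). The sketch below uses simplified stand-in classes, not the builder's actual Model hierarchy.

```python
# Minimal sketch of a class-level override that avoids clobbering an
# attribute set by the superclass __init__. Class bodies are illustrative.
class Model:
    model_type = "base"  # default declared once, at class level

    def __init__(self, config):
        self.config = config

class Qwen3VLTextModel(Model):
    # Subclass declares its type here; no instance assignment overwrites
    # the parent's attribute, so the CodeQL pattern no longer matches.
    model_type = "qwen3_vl"
```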
  • Avoid forcing a manual DLL reload when onnxruntime.dll is already loaded, so WinML-packed DLL scenarios continue to work.
  • Keep the onnxruntime preload import while dropping explanatory comment text to match the preferred file style.
  • Apply formatting adjustments required by CI without changing runtime behavior.
@hanbitmyths

We have a plan to support qwen3-vl cpu and cuda models at onnxruntime-extensions, onnxruntime-genai, olive, and olive-recipes, so can you hold up this PR?

@amdrajeevp1 (Author)

> We have a plan to support qwen3-vl cpu and cuda models at onnxruntime-extensions, onnxruntime-genai, olive, and olive-recipes, so can you hold up this PR?

@hanbitmyths - I suggest merging this PR as-is, since it is complete, and making any incremental changes later if needed. The PR supports all variants of Qwen3-VL and all path/image sizes.

@hanbitmyths

A new PR supports multi-image input, which is not compatible with this PR. Please check these PRs to support the Qwen-VL family.

microsoft/onnxruntime-extensions#1027
microsoft/onnxruntime-extensions#1032
#2003
microsoft/Olive#2345
microsoft/olive-recipes#254

@hanbitmyths

@amdrajeevp1, do you need any specific code path for Vitis or RyzenAI? I don't see it from this PR.
