Add Qwen3-VL runtime, export, and Python guide support #1999
amdrajeevp1 wants to merge 18 commits into microsoft:main
Conversation
Add a dedicated Qwen3VLTextModel export flow with exclude_embeds enabled by default and update runtime model-type checks to qwen3_vl for decoder text inference.
Force generated GenAI config model.type to qwen3_vl so runtime loading matches C++ model type checks.
This wires Qwen3-VL into the native processor/model path, stabilizes Windows ORT DLL loading, and adds in-repo export/inference scripts so vision+embedding+text can run end-to-end with processor() on CPU.
Switch the export path to use the installed Transformers Qwen3-VL class and keep image_grid_thw as a real vision input so the runtime contract stays explicit during export runs.
Compute image_grid_thw from sanitized patch-aligned dimensions and keep prompt/image token accounting aligned with the cropped patch tensor path to avoid fixed 384x384 assumptions.
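The patch-aligned accounting described in that commit can be sketched as a small helper. The patch size, merge factor, and function name here are illustrative assumptions, not values taken from this PR:

```python
def patch_aligned_grid(height, width, patch_size=16, merge_size=2):
    """Hypothetical helper: snap image dims to the merged-patch grid and
    derive a single image_grid_thw entry of the form [t, h, w]."""
    step = patch_size * merge_size              # dims must align to merged patches
    h = max(step, round(height / step) * step)  # sanitized, patch-aligned height
    w = max(step, round(width / step) * step)   # sanitized, patch-aligned width
    return [1, h // patch_size, w // patch_size]  # single image: t == 1

print(patch_aligned_grid(384, 384))  # a 384x384 crop yields a 24x24 patch grid
```

Deriving the grid from the sanitized dimensions, rather than hard-coding 384x384, is what keeps the prompt/image token count consistent with the cropped patch tensor.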
Load the model from a local patched modeling_qwen3_vl reference during export, add fixed 384x384 vision resize for runtime shape compatibility, and move qwen3vl-oga.py into the qwen3-vl-4b folder as the single inference entrypoint.
Rename the export entrypoint to builder.py, remove implicit image defaults in inference, and add a minimal README so dynamic FP32 export and sanity tests can be run from scratch consistently.
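The "no implicit image defaults" change can be illustrated with a minimal argument parser; the flag names mirror the inference script described above but are assumptions for the sketch:

```python
import argparse

# Sketch: --image_paths is optional with no fallback image, so a
# text-only run never silently picks up a default picture.
parser = argparse.ArgumentParser()
parser.add_argument("--image_paths", nargs="+", default=None)
parser.add_argument("-pr", "--prompt", required=True)

args = parser.parse_args(["-pr", "Say hello in one short sentence."])
print(args.image_paths)  # None: inference stays text-only unless images are given
```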
Drop fixed 384 resize from vision_processor generation and keep the export flow aligned with dynamic-image-size-only usage.
Add fp32-vision/int4-llm export instructions and capture successful text and image sanity-test outputs to make the local workflow reproducible.
Document how to fetch Qwen/Qwen3-VL-4B into the local pytorch folder and renumber the subsequent export and sanity-test steps for a clearer setup flow.
Delete generated ONNX outputs, test images, and local reference/example files that are now managed in the Hugging Face PR payload.
Add the new qwen3-vl documentation file with updated workflow headings and table-of-contents links.
```python
def __init__(self, config, io_dtype, onnx_dtype, ep, cache_dir, extra_options):
    super().__init__(config, io_dtype, onnx_dtype, ep, cache_dir, extra_options)
    # GenAI config "model.type" must match C++ runtime checks.
    self.model_type = "qwen3_vl"
```
Code scanning (CodeQL) warning: Overwriting attribute in super-class or sub-class.
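One conventional way to resolve that CodeQL finding is to pass the type through the base constructor instead of re-assigning the attribute afterwards. A minimal sketch with simplified class names, not the builder's actual hierarchy:

```python
class Model:
    def __init__(self, model_type="base"):
        self.model_type = model_type

class Qwen3VLTextModel(Model):
    def __init__(self):
        # Supplying the value to the superclass avoids overwriting an
        # attribute the base __init__ has already set.
        super().__init__(model_type="qwen3_vl")
```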
Avoid forcing manual DLL reload when onnxruntime.dll is already loaded so WinML-packed DLL scenarios continue to work.
Keep the onnxruntime preload import while dropping explanatory comment text to match preferred file style.
Apply formatting adjustments required by CI without changing runtime behavior.
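The DLL-reload guard described in these commits can be sketched as a plain Win32 module-handle probe. This is a hedged illustration (the PR's actual preload code is not shown here); the helper name is an assumption:

```python
import ctypes
import sys

def onnxruntime_dll_already_loaded():
    """Return True if onnxruntime.dll is already mapped into this process.

    Sketch only: on Windows, GetModuleHandleW reports an existing mapping
    (e.g. one brought in by a WinML-packed distribution) without loading
    anything, so callers can skip forcing a manual reload.
    """
    if sys.platform != "win32":
        return False  # the preload concern only exists on Windows
    kernel32 = ctypes.WinDLL("kernel32")
    return bool(kernel32.GetModuleHandleW("onnxruntime.dll"))
```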
We have a plan to support qwen3-vl CPU and CUDA models across onnxruntime-extensions, onnxruntime-genai, olive, and olive-recipes, so could you hold off on this PR?
@hanbitmyths - I suggest merging it as it is done, and making any incremental changes later if needed. The PR supports all variants of Qwen3-VL and all paths/image sizes.
A new PR supports multi-image input, which is not compatible with this PR. Please check these PRs to support the Qwen-VL family: microsoft/onnxruntime-extensions#1027
@amdrajeevp1, do you need any specific code path for Vitis or RyzenAI? I don't see it in this PR.
Summary
- `qwen3_vl` model support in the C++ runtime:
  - route `qwen3_vl` configs to the multimodal pipeline when vision + embedding are present
  - register `Qwen3VLImageProcessor` in the multimodal processor factory
  - include `qwen3_vl` in VLM checks and Qwen-VL-specific position-id logic
- `Qwen3VLTextModel` export path in the Python builder stack:
  - export from the Transformers `Qwen3VLForConditionalGeneration` class
  - force `model.type = "qwen3_vl"` for runtime compatibility
  - set `exclude_embeds=true` for the text-component export flow
- Add `examples/python/qwen3-vl.md` with end-to-end instructions for artifact prep, ONNX export, and sanity checks.

Additional artifacts needed for builder:
https://huggingface.co/onnx-community/Qwen3-4B-VL-ONNX/tree/main
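The model-type routing summarized above can be approximated in a few lines. The set of VLM types and the helper name are assumptions for illustration, not the runtime's actual code:

```python
VLM_TYPES = {"qwen2_vl", "qwen3_vl"}  # illustrative subset of recognized VLM types

def select_pipeline(model_type, has_vision, has_embedding):
    # qwen3_vl configs route to the multimodal pipeline only when both the
    # vision and embedding components are present; otherwise fall back to
    # the plain decoder path.
    if model_type in VLM_TYPES and has_vision and has_embedding:
        return "multimodal"
    return "decoder_only"

print(select_pipeline("qwen3_vl", True, True))
print(select_pipeline("qwen3_vl", False, True))
```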
Validation
- In the `onnxruntime-genai` conda environment, build with `conda run -n onnxruntime-genai python build.py`.
- Export: `python builder.py --input ./pytorch_8b --reference ./pytorch_reference --output ./qwen3-vl-8b-instruct-onnx-vision-fp32-text-int4-cpu --precision int4` produces `model.onnx` + config/tokenizer assets.
- Text sanity test: `python qwen3vl-oga-inference.py ... -pr "Say hello in one short sentence."`
- Image sanity test: `python qwen3vl-oga-inference.py ... --image_paths ./test_images/img_50.jpg -pr "Describe this image in one sentence."`

Notes
Generated artifacts (`build`, local `examples/python/qwen3-vl/data`) are intentionally not part of this PR.