XSlim is a Post-Training Quantization (PTQ) tool developed by SpacemiT. It integrates chip-optimized quantization strategies and provides a unified interface for ONNX model quantization via JSON configuration files.
- INT8 / FP16 / Dynamic Quantization – multiple precision levels for different deployment scenarios
- JSON-driven configuration – simple, declarative quantization setup
- Python API & CLI – use as a library or from the command line
- Custom preprocessing – plug in your own preprocessing functions
- Automatic YOLO decode fusion – fuse supported YOLO decode subgraphs into a single spacemit_functions.YoloDecode node
- ONNX Function-aware export – preserve embedded FunctionProto definitions and emit required custom-domain imports automatically
- ONNX-based workflow – built on the ONNX ecosystem

Install from PyPI:

    pip install xslim

Or install from source:

    git clone https://github.com/spacemit-com/xslim.git
    cd xslim
    pip install .

For local development, use an editable install:

    pip install -e .

Build metadata is defined in pyproject.toml; setup.py is kept only as a legacy compatibility shim.


Use the Python API:

    import xslim

    # Using a JSON config file
    xslim.quantize_onnx_model("config.json")

    # Using a dict
    config = {
        "model_parameters": {
            "onnx_model": "model.onnx",
            "working_dir": "./output"
        },
        "calibration_parameters": {
            "input_parameters": [{
                "mean_value": [123.675, 116.28, 103.53],
                "std_value": [58.395, 57.12, 57.375],
                "color_format": "rgb",
                "preprocess_file": "PT_IMAGENET",
                "data_list_path": "./calib_img_list.txt"
            }]
        }
    }
    xslim.quantize_onnx_model(config)

    # You can also pass the model path and output path directly
    xslim.quantize_onnx_model("config.json", "input.onnx", "output.onnx")

Or use the command line:

    # Installed CLI entry point
    xslim --config config.json

    # Module entry point also remains available
    python -m xslim --config config.json

    # Specify input and output model paths
    xslim -c config.json -i input.onnx -o output.onnx

    # Dynamic quantization (no config file needed)
    xslim -i input.onnx -o output.onnx --dynq

    # FP16 conversion (no config file needed)
    xslim -i input.onnx -o output.onnx --fp16

    # Convert the default ai.onnx opset to a target version
    xslim -i input.onnx -o output.onnx --opset 20

    # ONNX simplification only (no config file needed)
    xslim -i input.onnx -o output.onnx

For supported YOLO exports, no extra switch is required: XSlim tries to fuse decode-heavy post-processing into spacemit_functions.YoloDecode during simplification and keeps the corresponding ONNX FunctionProto in the exported model.

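
To confirm that the YOLO decode fusion actually happened, the exported model can be inspected for the fused node, its function definition, and the custom-domain import. The following is a minimal sketch, not part of XSlim: the helper names are hypothetical, and it assumes that spacemit_functions.YoloDecode corresponds to an ONNX node with op_type `YoloDecode` in the domain `spacemit_functions`. It relies only on the standard `graph.node`, `functions`, and `opset_import` fields of an ONNX ModelProto.

```python
# Assumed names, inferred from "spacemit_functions.YoloDecode" in the text above.
YOLO_DOMAIN = "spacemit_functions"
YOLO_OP = "YoloDecode"


def summarize_yolo_fusion(model, domain=YOLO_DOMAIN, op_type=YOLO_OP):
    """Report the fused nodes, function definition, and domain import in a model.

    `model` should behave like an onnx.ModelProto, i.e. expose
    graph.node, functions, and opset_import.
    """
    # Names of graph nodes that call the fused op in the custom domain.
    fused_nodes = [
        node.name
        for node in model.graph.node
        if node.op_type == op_type and node.domain == domain
    ]
    # A model-local function with the same name and domain defines the op body.
    has_function = any(
        f.name == op_type and f.domain == domain for f in model.functions
    )
    # The custom domain must also be declared in the model's opset imports.
    has_domain_import = any(imp.domain == domain for imp in model.opset_import)
    return {
        "fused_nodes": fused_nodes,
        "has_function": has_function,
        "has_domain_import": has_domain_import,
    }


def summarize_onnx_file(path):
    """Load a model from disk and summarize it (needs the onnx package)."""
    import onnx  # imported lazily so the helper above has no dependencies

    return summarize_yolo_fusion(onnx.load(path))
```

Calling `summarize_onnx_file("output.onnx")` on an exported model returns a small dictionary; an empty `fused_nodes` list suggests the export did not match a supported YOLO pattern.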
- Configuration Reference – Full description of all JSON configuration options
- Examples – Step-by-step guides for INT8, FP16, dynamic quantization, custom preprocessing, and more
- Accuracy Tuning Guide – How to diagnose and improve quantization accuracy
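
The JSON configuration can also be generated from a script, which helps when the calibration set changes often. The sketch below uses only the standard library and reuses the keys and values from the ImageNet-style example in this README. The helper names are hypothetical, and the calibration list format (one image path per line) is an assumption; check the Configuration Reference for the exact format.

```python
import json
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def write_calibration_list(image_dir, list_path):
    """Write one image path per line (assumed format) and return the image count."""
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in images), encoding="utf-8")
    return len(images)


def build_config(onnx_model, working_dir, data_list_path):
    """Return a config dict mirroring the ImageNet-style example in this README."""
    return {
        "model_parameters": {
            "onnx_model": str(onnx_model),
            "working_dir": str(working_dir),
        },
        "calibration_parameters": {
            "input_parameters": [{
                "mean_value": [123.675, 116.28, 103.53],
                "std_value": [58.395, 57.12, 57.375],
                "color_format": "rgb",
                "preprocess_file": "PT_IMAGENET",
                "data_list_path": str(data_list_path),
            }]
        },
    }


def write_config(config, config_path):
    """Serialize the config dict to a JSON file and return its path."""
    Path(config_path).write_text(json.dumps(config, indent=2), encoding="utf-8")
    return Path(config_path)
```

The resulting file can be passed to `xslim.quantize_onnx_model` or to `xslim --config`; alternatively, the dict returned by `build_config` can be passed to `xslim.quantize_onnx_model` directly.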
See the samples directory for ready-to-run examples covering ResNet-18, MobileNet V3, BERT, and more. YOLO-specific usage notes are documented in the examples and accuracy-tuning guides.
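
When the command-line workflows are scripted over several models, it can help to assemble the invocation in one place. The sketch below is a hypothetical helper, not part of XSlim; each mode reproduces exactly one of the command lines documented in this README (`-c`, `-i`, `-o`, `--dynq`, `--fp16`, `--opset`) and does not assume that the flags can be combined.

```python
def build_xslim_command(input_model, output_model, mode="simplify",
                        config=None, opset=None):
    """Return the argv list for one documented xslim invocation.

    mode is one of:
      "quantize"  config-driven quantization (requires `config`)
      "dynq"      dynamic quantization
      "fp16"      FP16 conversion
      "opset"     convert the default ai.onnx opset (requires `opset`)
      "simplify"  ONNX simplification only (the CLI default)
    """
    cmd = ["xslim"]
    if mode == "quantize":
        if config is None:
            raise ValueError("quantize mode requires a JSON config path")
        cmd += ["-c", str(config)]
    cmd += ["-i", str(input_model), "-o", str(output_model)]

    if mode == "dynq":
        cmd.append("--dynq")
    elif mode == "fp16":
        cmd.append("--fp16")
    elif mode == "opset":
        if opset is None:
            raise ValueError("opset mode requires a target opset version")
        cmd += ["--opset", str(int(opset))]
    elif mode not in ("quantize", "simplify"):
        raise ValueError(f"unknown mode: {mode!r}")
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once XSlim is installed, which avoids shell quoting issues with model paths.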
For a full list of published versions, see the Releases page. The summary below is synchronized with that release history; 2.1.0 is the current in-tree development version and has not been published yet.
| Version | Highlights |
|---|---|
| 2.1.0 | Current in-tree development version; add automatic spacemit_functions.YoloDecode fusion for supported YOLO exports, preserve custom ONNX FunctionProto definitions during quantization/export, and improve opset-24/custom-domain handling coverage |
| 2.0.14 | Latest published release; add configurable default ai.onnx opset conversion for quantization and conversion workflows |
| 2.0.13 | Upgrade the default ONNX opset to 24, standardize operator domains, and align version metadata with the 2.0.12 release |
| 2.0.12 | Complete README changelog/release metadata, add accuracy-tuning docs and README links, introduce the xslim-accuracy-tuning GitHub skill, add YOLO truncation guidance, and rename input parameters for consistency |
| 2.0.11 | Fix Pad/missing-input handling, add Or/Einsum/Selu support, normalize Conv/ConvTranspose kernel shapes, and raise minimum Python to 3.9 |
| 2.0.10 | Align release metadata, improve CI/test coverage, normalize missing default ONNX opset before dynamic quantization, and refine shape inference handling |
| 2.0.9 | Add documentation, preserve tensor dtype metadata during FP16 conversion, and restore compatibility with onnxslim 0.1.87 |
| 2.0.8 | Improve packaging/CI, add torch executor operator coverage, add PyPI publish workflow, and centralize version metadata |
| 2.0.7 | Fix FP16 conversion bug on complex models |
| 2.0.6 | Fix metadata props deletion; default CLI behavior changed to model simplification (use --dynq for dynamic quantization) |
Contributions are welcome! Please open an issue or submit a pull request.
This project is licensed under the Apache License 2.0.