XSlim

XSlim is a Post-Training Quantization (PTQ) tool developed by SpacemiT. It integrates chip-optimized quantization strategies and provides a unified interface for ONNX model quantization via JSON configuration files.


Features

  • INT8 / FP16 / Dynamic Quantization – multiple precision levels for different deployment scenarios
  • JSON-driven configuration – simple, declarative quantization setup
  • Python API & CLI – use as a library or from the command line
  • Custom preprocessing – plug in your own preprocessing functions
  • Automatic YOLO decode fusion – fuse supported YOLO decode subgraphs into a single spacemit_functions.YoloDecode node
  • ONNX Function-aware export – preserve embedded FunctionProto definitions and emit required custom-domain imports automatically
  • ONNX-based workflow – built on the ONNX ecosystem
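To make the INT8 item above concrete, here is a minimal sketch of symmetric per-tensor quantization, a common PTQ scheme. This README does not state which scheme XSlim applies internally, so treat this as an illustration of the general idea only; `quantize_int8` and `dequantize_int8` are hypothetical helper names, not part of the XSlim API.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization of a list of floats.

    Illustrative only: this is a generic PTQ scheme, not XSlim's implementation.
    """
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude onto 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map INT8 codes back to approximate float values."""
    return [q * scale for q in quantized]


weights = [-1.27, -0.5, 0.0, 0.25, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# The rounding error of each element is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Calibration data matters because the scale is derived from the observed value range: a range that is too wide wastes integer codes, and one that is too narrow clips real activations.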

Installation

pip install xslim

Or install from source:

git clone https://github.com/spacemit-com/xslim.git
cd xslim
pip install .

For local development, use an editable install:

pip install -e .

Build metadata is defined in pyproject.toml; setup.py is kept only as a legacy compatibility shim.

Quick Start

Python API

import xslim

# Using a JSON config file
xslim.quantize_onnx_model("config.json")

# Using a dict
config = {
    "model_parameters": {
        "onnx_model": "model.onnx",
        "working_dir": "./output"
    },
    "calibration_parameters": {
        "input_parameters": [{
            "mean_value": [123.675, 116.28, 103.53],
            "std_value": [58.395, 57.12, 57.375],
            "color_format": "rgb",
            "preprocess_file": "PT_IMAGENET",
            "data_list_path": "./calib_img_list.txt"
        }]
    }
}
xslim.quantize_onnx_model(config)

# You can also pass the model path and output path directly
xslim.quantize_onnx_model("config.json", "input.onnx", "output.onnx")
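The config dict above can also be generated programmatically. The sketch below builds the calibration list and writes an equivalent `config.json` using only the standard library. It assumes the file behind `data_list_path` holds one image path per line, which this README does not confirm; `write_calibration_list` and `build_config` are hypothetical helpers, not XSlim functions. The demo runs in a temporary directory with empty placeholder files so it can be executed anywhere.

```python
import json
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def write_calibration_list(image_dir, list_path):
    """Write one image path per line (assumed list format); return the count."""
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    Path(list_path).write_text(
        "".join(f"{p.as_posix()}\n" for p in images), encoding="utf-8"
    )
    return len(images)


def build_config(onnx_model, working_dir, list_path):
    """Return a config dict with the same keys as the Quick Start example."""
    return {
        "model_parameters": {
            "onnx_model": onnx_model,
            "working_dir": working_dir,
        },
        "calibration_parameters": {
            "input_parameters": [{
                "mean_value": [123.675, 116.28, 103.53],
                "std_value": [58.395, 57.12, 57.375],
                "color_format": "rgb",
                "preprocess_file": "PT_IMAGENET",
                "data_list_path": list_path,
            }]
        },
    }


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    # Empty placeholder files stand in for real calibration images.
    for name in ("a.png", "b.jpg", "notes.txt"):
        (tmp / name).touch()
    list_path = tmp / "calib_img_list.txt"
    listed = write_calibration_list(tmp, list_path)
    config = build_config("model.onnx", "./output", list_path.as_posix())
    config_path = tmp / "config.json"
    config_path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    loaded = json.loads(config_path.read_text(encoding="utf-8"))
```

The resulting dict can be passed to `xslim.quantize_onnx_model(config)`, and the written file can be passed as `xslim.quantize_onnx_model("config.json")`.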

Command Line

# Installed CLI entry point
xslim --config config.json

# Module entry point also remains available
python -m xslim --config config.json

# Specify input and output model paths
xslim -c config.json -i input.onnx -o output.onnx

# Dynamic quantization (no config file needed)
xslim -i input.onnx -o output.onnx --dynq

# FP16 conversion (no config file needed)
xslim -i input.onnx -o output.onnx --fp16

# Convert the default ai.onnx opset to a target version
xslim -i input.onnx -o output.onnx --opset 20

# ONNX simplification only (no config file needed)
xslim -i input.onnx -o output.onnx
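When these invocations need to run from a build script, a thin wrapper keeps the flag handling in one place. The sketch below uses only the flags shown above (`-c`, `-i`, `-o`, `--dynq`, `--fp16`, `--opset`); `build_xslim_command` and `run_xslim` are hypothetical helpers. Whether `--opset` can be combined with the other modes is not documented here, so treat that combination as an assumption to verify.

```python
import shutil
import subprocess


def build_xslim_command(input_model, output_model, config=None, mode=None, opset=None):
    """Build an argv list for the xslim CLI from the flags shown in this README."""
    cmd = ["xslim"]
    if config is not None:
        cmd += ["-c", config]
    cmd += ["-i", input_model, "-o", output_model]
    if mode == "dynq":
        cmd.append("--dynq")
    elif mode == "fp16":
        cmd.append("--fp16")
    elif mode is not None:
        raise ValueError(f"unknown mode: {mode!r}")
    if opset is not None:
        cmd += ["--opset", str(opset)]
    return cmd


def run_xslim(cmd):
    """Run the command, failing early if the CLI is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("xslim CLI not found on PATH; try `pip install xslim`")
    return subprocess.run(cmd, check=True)


commands = [
    build_xslim_command("input.onnx", "output.onnx", config="config.json"),
    build_xslim_command("input.onnx", "output.onnx", mode="dynq"),
    build_xslim_command("input.onnx", "output.onnx", mode="fp16"),
    build_xslim_command("input.onnx", "output.onnx", opset=20),
    build_xslim_command("input.onnx", "output.onnx"),  # simplification only
]
for command in commands:
    print(" ".join(command))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with model paths that contain spaces.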

For supported YOLO exports, no extra switch is required: XSlim will try to fuse decode-heavy post-processing into spacemit_functions.YoloDecode during simplification and keep the corresponding ONNX FunctionProto in the exported model.
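One way to check whether the fusion happened is to inspect the exported model's embedded functions and custom-domain opset imports. The sketch below reads only standard ONNX `ModelProto` fields (`functions`, `opset_import`). It assumes the function is stored with domain `spacemit_functions` and name `YoloDecode`, which is inferred from the node name above rather than confirmed. A stand-in object replaces a real model so the snippet runs without the `onnx` package.

```python
from types import SimpleNamespace


def list_model_functions(model):
    """Return (domain, name) pairs for each FunctionProto embedded in the model."""
    return [(f.domain, f.name) for f in model.functions]


def custom_opset_imports(model):
    """Return {domain: version} for opset imports outside the default ai.onnx domain."""
    return {
        imp.domain: imp.version
        for imp in model.opset_import
        if imp.domain not in ("", "ai.onnx")
    }


def has_function(model, name):
    """True if any embedded function has the given name."""
    return any(fn_name == name for _, fn_name in list_model_functions(model))


# Stand-in for an exported model; the domain, name, and versions are assumed values.
# With onnx installed, use instead: import onnx; model = onnx.load("output.onnx")
model = SimpleNamespace(
    functions=[SimpleNamespace(domain="spacemit_functions", name="YoloDecode")],
    opset_import=[
        SimpleNamespace(domain="", version=24),
        SimpleNamespace(domain="spacemit_functions", version=1),
    ],
)

print(list_model_functions(model))
print(custom_opset_imports(model))
print(has_function(model, "YoloDecode"))
```

If `has_function` returns `False` for a real export, the model's decode subgraph was probably not one of the supported patterns and was left unfused.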

Documentation

Samples

See the samples directory for ready-to-run examples covering ResNet-18, MobileNet V3, BERT, and more. YOLO-specific usage notes are documented in the examples and accuracy-tuning guides.

Changelog

For a full list of published versions, see the Releases page. The summary below is synchronized with that release history; 2.1.0 is the current in-tree development version and has not been published yet.

| Version | Highlights |
|---------|------------|
| 2.1.0 | Current in-tree development version; add automatic spacemit_functions.YoloDecode fusion for supported YOLO exports, preserve custom ONNX FunctionProto definitions during quantization/export, and improve opset-24/custom-domain handling coverage |
| 2.0.14 | Latest published release; add configurable default ai.onnx opset conversion for quantization and conversion workflows |
| 2.0.13 | Upgrade the default ONNX opset to 24, standardize operator domains, and align version metadata with the 2.0.12 release |
| 2.0.12 | Complete README changelog/release metadata, add accuracy-tuning docs and README links, introduce the xslim-accuracy-tuning GitHub skill, add YOLO truncation guidance, and rename input parameters for consistency |
| 2.0.11 | Fix Pad/missing-input handling, add Or/Einsum/Selu support, normalize Conv/ConvTranspose kernel shapes, and raise minimum Python to 3.9 |
| 2.0.10 | Align release metadata, improve CI/test coverage, normalize missing default ONNX opset before dynamic quantization, and refine shape inference handling |
| 2.0.9 | Add documentation, preserve tensor dtype metadata during FP16 conversion, and restore compatibility with onnxslim 0.1.87 |
| 2.0.8 | Improve packaging/CI, add torch executor operator coverage, add PyPI publish workflow, and centralize version metadata |
| 2.0.7 | Fix FP16 conversion bug on complex models |
| 2.0.6 | Fix metadata props deletion; default CLI behavior changed to model simplification (use --dynq for dynamic quantization) |
Contributing

Contributions are welcome! Please open an issue or submit a pull request.

License

This project is licensed under the Apache License 2.0.
