4 changes: 2 additions & 2 deletions docs/Dockerfile
Original file line number Diff line number Diff line change
@@ -24,8 +24,8 @@ RUN npm install npm@9.8.1 -g && \
RUN python3 -m pip install --no-cache-dir --upgrade pip
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/doc-builder.git

RUN git clone --depth 1 --branch v3.5 https://github.com/Xilinx/Vitis-AI.git && cd Vitis-AI/src/vai_quantizer/vai_q_onnx && sh build.sh && pip install pkgs/*.whl
RUN git clone --depth 1 --branch feature/onnx https://gitenterprise.xilinx.com/AMDNeuralOpt/Quark.git && cd Vitis-AI/src/vai_quantizer/Quark && python setup.py sdist bdist_wheel -d pkgs && pip install pkgs/*.whl

RUN git clone $clone_url && cd optimum-amd && git checkout $commit_sha
RUN python3 -m pip install --no-cache-dir ./optimum-amd[brevitas,tests]
RUN pip install onnxruntime==1.14.0
RUN pip install onnxruntime==1.17.0
4 changes: 2 additions & 2 deletions docs/source/ryzenai/package_reference/quantization.mdx
@@ -2,9 +2,9 @@
Licensed under the MIT License.
-->

# Quantization for Ryzen AI IPU
# Quantization for Ryzen AI NPU

Please refer to the guide [How to apply quantization](https://huggingface.co/docs/optimum/amd/ryzenai/usage_guides/quantization) to understand how to use the following classes to quantize models targeting Ryzen AI IPU.
Please refer to the guide [How to apply quantization](https://huggingface.co/docs/optimum/amd/ryzenai/usage_guides/quantization) to understand how to use the following classes to quantize models targeting Ryzen AI NPU.

## Using Vitis AI Quantizer

8 changes: 4 additions & 4 deletions docs/source/ryzenai/usage_guides/quantization.mdx
@@ -4,7 +4,7 @@ Licensed under the MIT License.

# Quantization for Ryzen AI

Ryzen AI IPU best performances are achieved using [quantized models](https://huggingface.co/docs/optimum/concept_guides/quantization). There are two different ways to quantize models for Ryzen AI IPU:
Ryzen AI NPU best performances are achieved using [quantized models](https://huggingface.co/docs/optimum/concept_guides/quantization). There are two different ways to quantize models for Ryzen AI NPU:
* through [Vitis AI Quantizer](https://ryzenai.docs.amd.com/en/latest/vai_quant/vai_q_onnx.html), used in Optimum's [`~ryzenai.RyzenAIOnnxQuantizer`], which is designed for ONNX model quantization. Currently supports quantising [timm](https://github.com/huggingface/pytorch-image-models) models using dynamic and static quantization methods.
* through [Brevitas](https://github.com/Xilinx/brevitas) library, used in Optimum's [`~brevitas.BrevitasQuantizer`]. Brevitas allows to quantize directly PyTorch models, which may be optionally exported to ONNX. This is recommended to quantize other models.

@@ -14,7 +14,7 @@ Ryzen AI IPU best performances are achieved using [quantized models](https://hug

RyzenAI Quantizer provides an easy-to-use Post Training Quantization (PTQ) flow on the pre-trained model saved in the ONNX format. It generates a quantized ONNX model ready to be deployed with the Ryzen AI.

The Quantizer supports various configuration and functions to quantize models targeting for deployment on IPU_CNN, IPU_Transformer and CPU.
The Quantizer supports various configuration and functions to quantize models targeting for deployment on NPU_CNN, NPU_Transformer and CPU.

The [`~ryzenai.RyzenAIOnnxQuantizer`] can be initialized using the `from_pretrained` method, either from a local model folder or a model hosted on Hugging Face Hub:

@@ -28,7 +28,7 @@ Below you will find an easy end-to-end example on how to quantize a VGG model fr

* To begin, export the VGG model to ONNX using [Optimum Exporters](https://huggingface.co/docs/optimum/main/en/exporters/onnx/overview). Ensure static shapes are specified for inference.
* Create a preprocessing function to handle specific image format conversions and apply necessary transformations to prepare the input for the model.
* Initialize the RyzenAI quantizer (RyzenAIOnnxQuantizer) and configure the quantization settings using AutoQuantizationConfig. The recommended quantization configuration for CNN models to be deployed on the IPU is loaded using `ipu_cnn_config`.
* Initialize the RyzenAI quantizer (RyzenAIOnnxQuantizer) and configure the quantization settings using AutoQuantizationConfig. The recommended quantization configuration for CNN models to be deployed on the NPU is loaded using `npu_cnn_config`.
* Obtain a calibration dataset using the quantizer's `get_calibration_dataset` method. This dataset is crucial for computing quantization parameters during the quantization process.
* Run the quantizer with the specified quantization configuration and calibration data. The quantization parameters computed during this process are embedded as constants in the quantized model.
* The resulting quantized model is saved in the specified quantization directory.
@@ -76,7 +76,7 @@ Below you will find an easy end-to-end example on how to quantize a VGG model fr
>>> quantizer = RyzenAIOnnxQuantizer.from_pretrained(export_dir)

>>> # Step 4: Load recommended quantization config for model
>>> quantization_config = AutoQuantizationConfig.ipu_cnn_config()
>>> quantization_config = AutoQuantizationConfig.npu_cnn_config()

>>> # Step 5: Obtain a calibration dataset for computing quantization parameters
>>> train_calibration_dataset = quantizer.get_calibration_dataset(
4 changes: 2 additions & 2 deletions examples/quantization/ryzenai/README.md
@@ -7,10 +7,10 @@ The quantization process is abstracted via the AutoQuantizationConfig and the Ry
You can read the [Vitis AI Quantizer for ONNX](https://ryzenai.docs.amd.com/en/latest/vai_quant/vai_q_onnx.html) to learn about VAI_Q_ONNX quantization.

### Creating an AutoQuantizationConfig
The AutoQuantizationConfig class is used to specify how quantization should be done. The class can be initialized using the ipu_cnn_config() method.
The AutoQuantizationConfig class is used to specify how quantization should be done. The class can be initialized using the npu_cnn_config() method.
```python
from optimum.amd.ryzenai import AutoQuantizationConfig
quantization_config = AutoQuantizationConfig.ipu_cnn_config()
quantization_config = AutoQuantizationConfig.npu_cnn_config()

```

Original file line number Diff line number Diff line change
@@ -43,7 +43,7 @@ def preprocess_fn(ex, transforms):

# quantize
quantizer = RyzenAIOnnxQuantizer.from_pretrained(onnx_model)
quantization_config = AutoQuantizationConfig.ipu_cnn_config()
quantization_config = AutoQuantizationConfig.npu_cnn_config()

calibration_dataset = quantizer.get_calibration_dataset(
args.dataset,
102 changes: 26 additions & 76 deletions optimum/amd/ryzenai/configuration.py
@@ -2,100 +2,50 @@
# Licensed under the MIT License.
"""Configuration classes for quantization with RyzenAI."""

from dataclasses import asdict, dataclass
from dataclasses import asdict
from enum import Enum
from typing import Optional

import vai_q_onnx
from onnxruntime.quantization import CalibrationMethod, QuantFormat, QuantType
from onnxruntime.quantization import CalibrationMethod, QuantType
from quark.onnx.calibrate import PowerOfTwoMethod
from quark.onnx.quantization.config.config import QuantizationConfig

from optimum.configuration_utils import BaseConfig


@dataclass
class QuantizationConfig:
"""
QuantizationConfig is the configuration class handling all the RyzenAI quantization parameters.

Args:
is_static (`bool`):
Whether to apply static quantization or dynamic quantization.
format (`QuantFormat`):
Targeted RyzenAI quantization representation format.
For the Operator Oriented (QOperator) format, all the quantized operators have their own ONNX definitions.
For the Tensor Oriented (QDQ) format, the model is quantized by inserting QuantizeLinear / DeQuantizeLinear
operators.
calibration_method (`CalibrationMethod`):
The method chosen to calculate the activations quantization parameters using the calibration dataset.
activations_dtype (`QuantType`, defaults to `QuantType.QUInt8`):
The quantization data types to use for the activations.
activations_symmetric (`bool`, defaults to `False`):
Whether to apply symmetric quantization on the activations.
weights_dtype (`QuantType`, defaults to `QuantType.QInt8`):
The quantization data types to use for the weights.
weights_symmetric (`bool`, defaults to `True`):
Whether to apply symmetric quantization on the weights.
enable_dpu (`bool`, defaults to `True`):
Determines whether to generate a quantized model that is suitable for the DPU. If set to True, the quantization
process will create a model that is optimized for DPU computations.

"""

format: QuantFormat = QuantFormat.QDQ
calibration_method: CalibrationMethod = vai_q_onnx.PowerOfTwoMethod.MinMSE
activations_dtype: QuantType = QuantType.QUInt8
activations_symmetric: bool = True
weights_dtype: QuantType = QuantType.QInt8
weights_symmetric: bool = True
enable_dpu: bool = True

class AutoQuantizationConfig:
@staticmethod
def quantization_type_str(activations_dtype: QuantType, weights_dtype: QuantType) -> str:
return (
f"{'s8' if activations_dtype == QuantType.QInt8 else 'u8'}"
f"/"
f"{'s8' if weights_dtype == QuantType.QInt8 else 'u8'}"
)

@property
def use_symmetric_calibration(self) -> bool:
return self.activations_symmetric and self.weights_symmetric

def __str__(self):
return (
f"{self.format} ("
f"schema: {QuantizationConfig.quantization_type_str(self.activations_dtype, self.weights_dtype)}, "
f"enable_dpu: {self.enable_dpu})"
def npu_cnn_config():
return QuantizationConfig(
calibrate_method=PowerOfTwoMethod.MinMSE,
activation_type=QuantType.QUInt8,
weight_type=QuantType.QInt8,
enable_npu_cnn=True,
extra_options={"ActivationSymmetric": True},
)


class AutoQuantizationConfig:
@staticmethod
def ipu_cnn_config():
def npu_transformer_config():
return QuantizationConfig(
format=QuantFormat.QDQ,
calibration_method=vai_q_onnx.PowerOfTwoMethod.MinMSE,
activations_dtype=QuantType.QUInt8,
activations_symmetric=True,
weights_dtype=QuantType.QInt8,
weights_symmetric=True,
enable_dpu=True,
calibrate_method=CalibrationMethod.MinMax,
activation_type=QuantType.QInt8,
weight_type=QuantType.QInt8,
enable_npu_transformer=True,
)

@staticmethod
def cpu_cnn_config(
use_symmetric_activations: bool = False,
use_symmetric_weights: bool = True,
enable_dpu: bool = False,
include_cle: bool = True,
include_fast_ft: bool = True,
extra_options: dict = None,
):
return QuantizationConfig(
format=QuantFormat.QDQ,
calibration_method=vai_q_onnx.CalibrationMethod.MinMax,
activations_dtype=QuantType.QUInt8,
activations_symmetric=use_symmetric_activations,
weights_dtype=QuantType.QInt8,
weights_symmetric=use_symmetric_weights,
enable_dpu=enable_dpu,
calibrate_method=CalibrationMethod.Percentile,
activation_type=QuantType.QInt8,
weight_type=QuantType.QInt8,
include_cle=include_cle,
include_fast_ft=include_fast_ft,
extra_options=extra_options,
)
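
The hunk above replaces the local `QuantizationConfig` dataclass with Quark's class, and the keyword names change along the way. The sketch below is one reading of that rename, taken from the removed and added lines only. `translate_kwargs` and the three lookup tables are hypothetical helpers for illustration and are not part of optimum-amd or Quark. Two points are assumptions: that `enable_dpu` corresponds to `enable_npu_cnn` (true for the CNN factory shown here, not necessarily elsewhere), and that `WeightSymmetric` is still accepted under `extra_options` (only `ActivationSymmetric` appears in the new code). Enum values are shown as plain strings so the snippet runs without either library installed.

```python
from typing import Any, Dict

# Old dataclass field -> new keyword argument, as read from the diff.
RENAMED_FIELDS = {
    "calibration_method": "calibrate_method",
    "activations_dtype": "activation_type",
    "weights_dtype": "weight_type",
    "enable_dpu": "enable_npu_cnn",  # assumption: CNN target only
}

# Old boolean fields that end up under `extra_options`.
# "WeightSymmetric" is the key used by the removed quantize_static call;
# whether Quark still honours it is an assumption.
EXTRA_OPTION_FIELDS = {
    "activations_symmetric": "ActivationSymmetric",
    "weights_symmetric": "WeightSymmetric",
}

# `format` has no counterpart in the new factory methods shown in the diff.
DROPPED_FIELDS = {"format"}


def translate_kwargs(old: Dict[str, Any]) -> Dict[str, Any]:
    """Map legacy field names to the new keyword names (hypothetical helper)."""
    new: Dict[str, Any] = {}
    extra: Dict[str, Any] = {}
    for name, value in old.items():
        if name in RENAMED_FIELDS:
            new[RENAMED_FIELDS[name]] = value
        elif name in EXTRA_OPTION_FIELDS:
            extra[EXTRA_OPTION_FIELDS[name]] = value
        elif name in DROPPED_FIELDS:
            continue
        else:
            raise KeyError(f"unknown legacy field: {name}")
    if extra:
        new["extra_options"] = extra
    return new


# Legacy values mirror the removed ipu_cnn_config(), minus weights_symmetric.
legacy = {
    "format": "QDQ",
    "calibration_method": "MinMSE",
    "activations_dtype": "QUInt8",
    "activations_symmetric": True,
    "weights_dtype": "QInt8",
    "enable_dpu": True,
}
translated = translate_kwargs(legacy)
print(translated)
```

Raising on unknown names rather than passing them through is a deliberate choice in this sketch: a silently forwarded legacy keyword would fail later inside the quantizer, further from the cause.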


2 changes: 1 addition & 1 deletion optimum/amd/ryzenai/pipelines/__init__.py
@@ -128,7 +128,7 @@ def pipeline(
The model that will be used by the pipeline to make predictions. This can be a model identifier or an
actual instance of a pretrained model. If not provided, the default model for the specified task will be loaded.
vaip_config (`Optional[str]`, defaults to `None`):
Runtime configuration file for inference with Ryzen IPU. A default config file can be found in the Ryzen AI VOE package,
Runtime configuration file for inference with Ryzen NPU. A default config file can be found in the Ryzen AI VOE package,
extracted during installation under the name `vaip_config.json`.
model_type (`Optional[str]`, defaults to `None`):
Model type for the model
24 changes: 7 additions & 17 deletions optimum/amd/ryzenai/quantization.py
@@ -12,12 +12,13 @@
import onnx
from datasets import Dataset, load_dataset
from onnxruntime.quantization import CalibrationDataReader
from vai_q_onnx import quantize_static
from quark.onnx import ModelQuantizer
from quark.onnx.quantization.config.config import Config, QuantizationConfig

from optimum.quantization_base import OptimumQuantizer
from transformers import PretrainedConfig

from .configuration import QuantizationConfig, RyzenAIConfig
from .configuration import RyzenAIConfig
from .modeling import RyzenAIModel


@@ -161,22 +162,11 @@ def quantize(

suffix = f"_{file_suffix}" if file_suffix else ""
quantized_model_path = save_dir.joinpath(f"{self.onnx_model_path.stem}{suffix}").with_suffix(".onnx")

LOGGER.info("Quantizing model...")
quantize_static(
model_input=Path(self.onnx_model_path).as_posix(),
model_output=quantized_model_path.as_posix(),
calibration_data_reader=reader,
quant_format=quantization_config.format,
calibrate_method=quantization_config.calibration_method,
weight_type=quantization_config.weights_dtype,
activation_type=quantization_config.activations_dtype,
enable_dpu=quantization_config.enable_dpu,
extra_options={
"WeightSymmetric": quantization_config.weights_symmetric,
"ActivationSymmetric": quantization_config.activations_symmetric,
},
)

quant_config = Config(global_quant_config=quantization_config)
quantizer = ModelQuantizer(quant_config)
quantizer.quantize_model(Path(self.onnx_model_path).as_posix(), quantized_model_path.as_posix(), reader)

LOGGER.info(f"Saved quantized model at: {save_dir}")
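
The output path in this hunk is built with `joinpath` followed by `with_suffix`, which is easy to misread. The sketch below reproduces only that expression using the standard library. The function name `quantized_model_path` and the example file names are hypothetical and do not appear in the PR.

```python
from pathlib import Path
from typing import Optional


def quantized_model_path(onnx_model_path: Path, save_dir: Path, file_suffix: Optional[str]) -> Path:
    # Same expression as in quantize(): "<stem>[_<file_suffix>]" then force ".onnx".
    suffix = f"_{file_suffix}" if file_suffix else ""
    return save_dir.joinpath(f"{onnx_model_path.stem}{suffix}").with_suffix(".onnx")


with_tag = quantized_model_path(Path("export/model.onnx"), Path("out"), "quantized")
without_tag = quantized_model_path(Path("export/model.onnx"), Path("out"), None)
# A stem that itself contains a dot: with_suffix replaces everything after the last dot.
dotted = quantized_model_path(Path("export/vgg.v2.onnx"), Path("out"), "quantized")

print(with_tag.as_posix())
print(without_tag.as_posix())
print(dotted.as_posix())
```

The third case is worth noting when naming exported models: `with_suffix` treats `.v2_quantized` as the existing suffix, so both the version tag and the `_quantized` marker are dropped from the final file name.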

2 changes: 1 addition & 1 deletion setup.py
@@ -70,7 +70,7 @@
"Programming Language :: Python :: 3.11",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
],
keywords="transformers, amd, ryzen, ipu, quantization, on-device, instinct",
keywords="transformers, amd, ryzen, npu, quantization, on-device, instinct",
url="https://github.com/huggingface/optimum-amd",
author="HuggingFace Inc. Special Ops Team",
author_email="hardware@huggingface.co",
Binary file added tests/ryzenai/1x4.xclbin
Binary file not shown.
Binary file added tests/ryzenai/4x4.xclbin
Binary file not shown.
30 changes: 15 additions & 15 deletions tests/ryzenai/test_modeling.py
@@ -97,14 +97,14 @@ def test_model(self, model_id):

file_name, ort_input, input_name = load_model_and_input(model_id)

outputs_ipu, outputs_cpu = self.prepare_outputs(
outputs_npu, outputs_cpu = self.prepare_outputs(
model_id, RyzenAIModelForImageClassification, ort_input, cache_dir, cache_key, file_name
)

self.assertIn("logits", outputs_ipu)
self.assertIn("logits", outputs_npu)
self.assertIn("logits", outputs_cpu)

self.assertTrue(np.allclose(outputs_ipu.logits, outputs_cpu.logits, atol=1e-4))
self.assertTrue(np.allclose(outputs_npu.logits, outputs_cpu.logits, atol=1e-4))

current_ops = self.get_ops(cache_dir, cache_key)
baseline_ops = self.get_baseline_ops(cache_key)
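
The renamed assertions compare NPU and CPU outputs with `np.allclose(..., atol=1e-4)`. As a reminder of what that tolerance means, here is a pure-Python sketch of the element-wise rule NumPy documents, `|a - b| <= atol + rtol * |b|`, with `rtol` left at its default of `1e-5` as in the tests. The function `allclose_sketch` and the sample logits are illustrative only and are not taken from the test suite.

```python
from typing import Sequence


def allclose_sketch(a: Sequence[float], b: Sequence[float], rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    # Element-wise |a - b| <= atol + rtol * |b|; note the rule is asymmetric in b.
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


cpu_logits = [2.0, -1.0]
npu_close = [2.00005, -1.0]  # off by 5e-5, inside the 1e-4 absolute tolerance
npu_far = [2.001, -1.0]      # off by 1e-3, outside it

close_ok = allclose_sketch(npu_close, cpu_logits, atol=1e-4)
far_ok = allclose_sketch(npu_far, cpu_logits, atol=1e-4)
print(close_ok, far_ok)
```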
@@ -147,12 +147,12 @@ def test_model(self, model_id):

file_name, ort_input, input_name = load_model_and_input(model_id)

outputs_ipu, outputs_cpu = self.prepare_outputs(
outputs_npu, outputs_cpu = self.prepare_outputs(
model_id, RyzenAIModelForObjectDetection, ort_input, cache_dir, cache_key, file_name
)

for output_ipu, output_cpu in zip(outputs_ipu.values(), outputs_cpu.values()):
self.assertTrue(np.allclose(output_ipu, output_cpu, atol=1e-4))
for output_npu, output_cpu in zip(outputs_npu.values(), outputs_cpu.values()):
self.assertTrue(np.allclose(output_npu, output_cpu, atol=1e-4))

current_ops = self.get_ops(cache_dir, cache_key)
baseline_ops = self.get_baseline_ops(cache_key)
@@ -212,12 +212,12 @@ def test_model(self, model_id):

file_name, ort_input, input_name = load_model_and_input(model_id)

outputs_ipu, outputs_cpu = self.prepare_outputs(
outputs_npu, outputs_cpu = self.prepare_outputs(
model_id, RyzenAIModelForSemanticSegmentation, ort_input, cache_dir, cache_key, file_name
)

for output_ipu, output_cpu in zip(outputs_ipu.values(), outputs_cpu.values()):
self.assertTrue(np.allclose(output_ipu, output_cpu, atol=1e-4))
for output_npu, output_cpu in zip(outputs_npu.values(), outputs_cpu.values()):
self.assertTrue(np.allclose(output_npu, output_cpu, atol=1e-4))

current_ops = self.get_ops(cache_dir, cache_key)
baseline_ops = self.get_baseline_ops(cache_key)
@@ -237,12 +237,12 @@ def test_model(self, model_id):

file_name, ort_input, input_name = load_model_and_input(model_id)

outputs_ipu, outputs_cpu = self.prepare_outputs(
outputs_npu, outputs_cpu = self.prepare_outputs(
model_id, RyzenAIModelForImageToImage, ort_input, cache_dir, cache_key, file_name
)

for output_ipu, output_cpu in zip(outputs_ipu.values(), outputs_cpu.values()):
self.assertTrue(np.allclose(output_ipu, output_cpu, atol=1e-4))
for output_npu, output_cpu in zip(outputs_npu.values(), outputs_cpu.values()):
self.assertTrue(np.allclose(output_npu, output_cpu, atol=1e-4))

current_ops = self.get_ops(cache_dir, cache_key)
baseline_ops = self.get_baseline_ops(cache_key)
@@ -263,12 +263,12 @@ def test_model(self, model_id):
file_name, ort_input, input_name = load_model_and_input(model_id)
ort_input = {input_name: ort_input}

outputs_ipu, outputs_cpu = self.prepare_outputs(
outputs_npu, outputs_cpu = self.prepare_outputs(
model_id, RyzenAIModelForCustomTasks, ort_input, cache_dir, cache_key, file_name
)

for output_ipu, output_cpu in zip(outputs_ipu.values(), outputs_cpu.values()):
self.assertTrue(np.allclose(output_ipu, output_cpu, atol=1e-4))
for output_npu, output_cpu in zip(outputs_npu.values(), outputs_cpu.values()):
self.assertTrue(np.allclose(output_npu, output_cpu, atol=1e-4))

current_ops = self.get_ops(cache_dir, cache_key)
baseline_ops = self.get_baseline_ops(cache_key)
6 changes: 3 additions & 3 deletions tests/ryzenai/test_quantization.py
@@ -94,7 +94,7 @@ def preprocess_fn(ex, transforms):

# quantize model
quantizer = RyzenAIOnnxQuantizer.from_pretrained(export_dir.name)
quantization_config = AutoQuantizationConfig.ipu_cnn_config()
quantization_config = AutoQuantizationConfig.npu_cnn_config()

train_calibration_dataset = quantizer.get_calibration_dataset(
"imagenet-1k",
@@ -116,11 +116,11 @@ def preprocess_fn(ex, transforms):
evaluation_set = load_dataset(dataset_name, split="validation", streaming=True, trust_remote_code=True)
ort_inputs = preprocess_fn(next(iter(evaluation_set)), transforms)["pixel_values"].unsqueeze(0)

outputs_ipu, outputs_cpu = self.prepare_outputs(
outputs_npu, outputs_cpu = self.prepare_outputs(
quantization_dir.name, RyzenAIModelForImageClassification, ort_inputs, cache_dir, cache_key
)

self.assertTrue(torch.allclose(outputs_ipu.logits, outputs_cpu.logits, atol=1e-4))
self.assertTrue(torch.allclose(outputs_npu.logits, outputs_cpu.logits, atol=1e-4))

current_ops = self.get_ops(cache_dir, cache_key)
baseline_ops = self.get_baseline_ops(cache_key)