2 changes: 1 addition & 1 deletion docs/toolchain/appendix/app_flow_manual.md
@@ -1,4 +1,4 @@
# Kneron End to End Simulator v0.32.0
# Kneron End to End Simulator v0.32.1

This project allows users to perform image inference using Kneron's built-in simulator. We encourage users to simply use the `kneron_inference` function to run the tests on their inputs.

6 changes: 6 additions & 0 deletions docs/toolchain/appendix/history.md
@@ -24,6 +24,12 @@

## Toolchain Change log

* **[v0.32.1]**
* Add `dma_bandwidth` and `weight_bandwidth` to IP evaluator arguments.
* Replace `hardware_cut_opt` with `compiler_tiling` to stay consistent with other toolchain APIs. `hardware_cut_opt` is now deprecated and will be removed in a future version; please use `compiler_tiling` instead.
* Update the evaluator to raise a warning instead of an error when it encounters an unsupported operator.
* Update ktc to clean up more intermediate files generated during the flow.
* Fix an evaluator bug that used the wrong 730 frequency.
* **[v0.32.0]**
* Add Einsum defusion in kneronnxopt.
* Support Cast to int64 in knerex and compiler.
56 changes: 46 additions & 10 deletions docs/toolchain/appendix/kneronnxopt.md
@@ -1,6 +1,6 @@
# Kneronnxopt

Kneronnxopt is the ONNX optimizer project for kneron hardware platforms. Its purpose is to provide shapes for all the tensors as well as accelerate the inference and compiling process. Currently, we support ONNX up to opset 18.
Kneronnxopt is the ONNX optimizer project for Kneron hardware platforms. It prepares tensor shapes and optimizes graph structures to improve inference and compilation flow. Currently, it supports ONNX opset 8 to 18.

## 1. Preparation

@@ -12,24 +12,60 @@ conda activate onnx1.13

## 2. Usage

The tool is under `/workspace/libs/kneronnxopt`. You can use the following command to run the tool:
### 2.1. Standard model optimization

Use module execution for standard ONNX models:

```bash
python /workspace/libs/kneronnxopt/kneronnxopt/optimize.py -o <output_onnx_model> <input_onnx_model>
python -m kneronnxopt.optimize <input_onnx_model> -o <output_onnx_model>
```

It also has the following optional arguments:
Optional arguments:

* `-h, --help`: Show this help message and exit.
* `--log`: Set log level (default: INFO). Available log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL.
* `--duplicate-shared-weight`: Duplicate shared weights in the model. Default is False.
* `--skip-check`: Skip the onnxruntime check or not. Enabling this flag can speed up the script, but also introcduce risks for future model deployment.
* `--duplicate-shared-weights`: By what level to duplicate shared weights. `0`: no duplication, `1`: duplicate only when required by compiler, `2`: always duplicate. Default is `1`.
* `--skip-check`: Skip the onnxruntime check. Enabling this flag can speed up the script, but also introduces risks for future model deployment.
* `--overwrite-input-shapes`: Overwrite the input shape. The format is "input_name:dim0,dim1,...,dimN", or simply "dim0,dim1,...,dimN" when there is only one input, for example, "data:1,3,224,224" or "1,3,224,224". Note: you may want to use a visualization tool such as Netron to confirm the input name and dimension ordering (NCHW or NHWC).
* `--skip-fuse-qkv`: Skip the `fuse_qkv` optimization.
* `--clear-descriptions`: Clear all descriptions in the graph.
* `--clear-shapes`: Clear all existing shapes in the graph except input shapes.
* `--opt-matmul`: Optimize MatMul operators for Kneron compiler.
* `--replace-avgpool-with-conv`: Replace AveragePool with depthwise Conv when possible to avoid CPU nodes.
* `--replace-dilated-conv`: Replace dilated Conv patterns when possible.
* `--defuse-gaps`: Defuse GAP patterns when possible.
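The shape-override format accepted by `--overwrite-input-shapes` can be illustrated with a standalone parser (our own sketch; kneronnxopt's internal parsing may differ):

```python
def parse_shape_overrides(specs):
    """Parse "input_name:d0,d1,...,dN" or bare "d0,d1,...,dN" specs."""
    overrides = {}
    for spec in specs:
        name, sep, dims = spec.rpartition(":")
        # The bare form has no ":"; rpartition then leaves name empty,
        # which we map to None (the single unnamed input).
        overrides[name or None] = [int(d) for d in dims.split(",")]
    return overrides

parse_shape_overrides(["data:1,3,224,224"])  # {'data': [1, 3, 224, 224]}
```

Multiple named specs can be passed at once, e.g. `["a:1,2", "b:3,4"]`, one per model input.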

## 3. Notes
Notes:

* If `-o` is not provided, output defaults to `<input>_optimized.onnx`.

### 2.2. Large model optimization (>2 GiB)

For large ONNX models, use the large-model module entry:

```bash
python -m kneronnxopt.large_model_fast_proc <input_onnx_model> -o <output_onnx_model>
```

Optional arguments:

This tool is still under development. If you have any questions, please feel free to contact us.
* `-h, --help`: Show this help message and exit.
* `--log`: Set log level (default: INFO). Available log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL.
* `--overwrite-input-shapes`: Overwrite input shapes for simplify and shape inference.
* `--skip-fuse-qkv`: Skip the `fuse_qkv` optimization.
* `--onnxtool`: Use `onnx-tool` for shape inference. This is useful when shapes cannot be inferred by the default pass. However, this tool may clip off some nodes, so use with caution and always check the output model.
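The 2 GiB threshold comes from protobuf's serialized-message size limit, which is why oversized models need this separate entry point. A minimal dispatch sketch (the helper name is ours, not part of kneronnxopt):

```python
import os

TWO_GIB = 2 * 1024 ** 3  # protobuf's serialized-message size limit

def pick_entry_module(onnx_path):
    """Pick the optimizer entry point from the model's on-disk size.

    Heuristic only: models stored with external data can exceed the
    2 GiB protobuf limit even when the .onnx file itself is small.
    """
    if os.path.getsize(onnx_path) >= TWO_GIB:
        return "kneronnxopt.large_model_fast_proc"
    return "kneronnxopt.optimize"
```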

### 2.3. Help command

To inspect full and current options from the tool directly:

```bash
python -m kneronnxopt.optimize -h
python -m kneronnxopt.large_model_fast_proc -h
```

## 3. Notes

This tool automatically updates the model opset to 18. This process cannot easily be reversed. Please use other tools if you do not want to upgrade your model opset.
This appendix focuses on console usage. For Python API usage, please refer to [3.1.2 ONNX Optimization](../manual_3_onnx.md#312-onnx-optimization).

If you want to cut the model, please use `onnx.utils.extract_model` from ONNX. Please check <https://onnx.ai/onnx/api/utils.html>
8 changes: 7 additions & 1 deletion docs/toolchain/manual_1_overview.md
@@ -5,7 +5,7 @@
# 1. Toolchain Overview

**2026-03**
**Toolchain v0.32.0**
**Toolchain v0.32.1**

## 1.1. Introduction

@@ -19,6 +19,12 @@ In this document, you'll learn:
3. How to utilize the tools through Python API.

**Major changes of the current version**
* **[v0.32.1]**
* Add `dma_bandwidth` and `weight_bandwidth` to IP evaluator arguments.
* Replace `hardware_cut_opt` with `compiler_tiling` to stay consistent with other toolchain APIs. `hardware_cut_opt` is now deprecated and will be removed in a future version; please use `compiler_tiling` instead.
* Update the evaluator to raise a warning instead of an error when it encounters an unsupported operator.
* Update ktc to clean up more intermediate files generated during the flow.
* Fix an evaluator bug that used the wrong 730 frequency.
* **[v0.32.0]**
* Add Einsum defusion in kneronnxopt.
* Support Cast to int64 in knerex and compiler.
19 changes: 15 additions & 4 deletions docs/toolchain/manual_3_onnx.md
@@ -27,10 +27,15 @@ kneronnxopt.optimize(
duplicate_shared_weights=1,
skip_check=False,
overwrite_input_shapes=None,
convert_f16=True,
skipped_optimizers=None,
skip_fuse_qkv=False,
clear_descriptions=False,
opt_matmul=False,
clear_shapes=False,
replace_avgpool_with_conv=False,
replace_dilated_conv=False,
defuse_gaps=False,
):
```

@@ -42,10 +47,15 @@ Args:
* duplicate_shared_weights (int, optional): level of shared-weight duplication. 0: no duplication; 1: duplicate shared weights only when the Kneron compiler does not support sharing; 2: always duplicate shared weights. Default is 1.
* skip_check (bool): skip the final check or not.
* overwrite_input_shapes (List\[str\]): overwrite the input shape. The format is "input_name:dim0,dim1,...,dimN", or simply "dim0,dim1,...,dimN" when there is only one input, for example, "data:1,3,224,224" or "1,3,224,224". Note: you may want to use a visualization tool such as Netron to confirm the input name and dimension ordering (NCHW or NHWC).
* skipped_optimizers (list): skip the onnx optimizers. Check onnx document for details. Default is None.
* convert_f16 (bool): convert f16 initializers and constants to f32 or not. Default is True.
* skipped_optimizers (list): skip selected optimizers. Check onnx-simplifier documents for details. Default is None.
* skip_fuse_qkv (bool): skip the fuse_qkv optimization or not. By default, fuse_qkv is enabled.
* clear_descriptions (bool): clear all descriptions in the graph. By default, descriptions are not cleared.
* opt_matmul (bool): optimize MatMul operators for the Kneron compiler. By default, this option is not set.
* clear_shapes (bool): clear all existing shapes in the graph except for input shapes. By default, shapes are not cleared.
* replace_avgpool_with_conv (bool): replace AveragePool with depthwise Conv when possible to avoid CPU nodes. By default, this option is not set.
* replace_dilated_conv (bool): replace dilated Conv patterns when possible. By default, this option is not set.
* defuse_gaps (bool): defuse GAP patterns when possible. By default, this option is not set.

Suppose we have an ONNX object; here is example Python code:

@@ -54,7 +64,7 @@ import kneronnxopt
optimized_m = kneronnxopt.optimize(input_m, skip_fuse_qkv=True)
```

In this line of python code, `kneronnxopt.optimize` is the function that takes an onnx object and optimize it. The return value `result_m` is the converted onnx object.
In this line of Python code, `kneronnxopt.optimize` is the function that takes an ONNX object and optimizes it. The return value `optimized_m` is the optimized ONNX object.

The previous `onnx2onnx_flow` API is also available in the `onnx1.13` environment as a wrapper of the `kneronnxopt.optimize` API, but not all of its previous options are supported there. We recommend using the `kneronnxopt.optimize` API instead of the `onnx2onnx_flow` API.

@@ -78,7 +88,7 @@ By the way, to save the model, you can use the following function from the onnx
onnx.save(optimized_m, '/data1/optimized.onnx')
```

We also provide a command line tool for both model optimization and evaluation. Please check FAQ 3.4.4 for details.
For kneronnxopt console usage, please check [Kneronnxopt](appendix/kneronnxopt.md). We also provide a command line tool for both model optimization and evaluation. Please check FAQ 3.4.4 for details.

### 3.1.3. ONNX Editing

@@ -300,7 +310,7 @@ You can use `-o` or `--optimizer-only` to only run the optimization step without
You can use `-h` or `--help` to see all the options.

```
usage: python -m ktc.opt_and_eval [-h] [-e] [-E EVALUATOR_REPORT_PATH] [-o] [-O OPTIMIZED_PATH] [--deep-search] {520,720,530,630,730} path
usage: python -m ktc.opt_and_eval [-h] [-P] [-e] [-E EVALUATOR_REPORT_PATH] [-o] [-O OPTIMIZED_PATH] [--deep-search] {520,720,530,630,730} path

Optimize ONNX model and run IP Evaluator

@@ -318,4 +328,5 @@ optional arguments:
-O OPTIMIZED_PATH, --optimized-path OPTIMIZED_PATH
Path to save the optimized ONNX model.
--deep-search Use deep search for optimization, which may take longer but can yield better performance.
-P, --print Print the evaluation result in the terminal.
```
38 changes: 34 additions & 4 deletions docs/toolchain/manual_5_nef.md
@@ -8,7 +8,16 @@ Batch compile turns multiple models into a single binary file. We have two APIs

```python
#[API]
ktc.compile(model_list, output_dir="/data1/kneron_flow", dedicated_output_buffer=True, weight_compress=False)
ktc.compile(
model_list,
output_dir="/data1/kneron_flow",
dedicated_output_buffer=True,
weight_compress=False,
flatbuffer=True,
compiler_tiling="default",
weight_bandwidth=None,
dma_bandwidth=None,
)
```

Compile the models and generate the nef file. The nef path will be returned.
@@ -19,12 +28,28 @@ Args:
* output_dir (str, optional): output directory. Defaults to "/data1/kneron_flow".
* dedicated_output_buffer (bool, optional): dedicated output buffer. Defaults to True.
* weight_compress (bool, optional): compress weight to slightly reduce the binary file size. Defaults to False.
* hardware_cut_opt (bool, optional): optimize the hardware memory usage while processing large inputs. This option might cause the compiling time increase. Currently, only available for 720. Defaults to False.
* hardware_cut_opt (bool, optional): DEPRECATED. Use `compiler_tiling="deep_search"` instead. If True and `compiler_tiling` is `"default"`, `compiler_tiling` will be treated as `"deep_search"`. Defaults to False.
* flatbuffer (bool, optional): enable new flatbuffer mode for 720. Defaults to True.
* compiler_tiling (str, optional): choose from `"default"`, `"deep_search"`, or `"partial_graph_search"`. Ignored when a model provides its own compiler config json. KDP520 always uses `"default"`. Defaults to `"default"`.
* weight_bandwidth: weight bandwidth in Gbps. Defaults to None, which uses the platform default for the IP evaluator.
* dma_bandwidth: DMA bandwidth in Gbps. Defaults to None, which uses the platform default for the IP evaluator.
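The interaction between the deprecated `hardware_cut_opt` flag and `compiler_tiling` described above can be sketched as a small resolver (a hypothetical helper mirroring the documented rule, not the actual ktc code):

```python
import warnings

def resolve_tiling(compiler_tiling="default", hardware_cut_opt=False):
    """Mirror the deprecation rule: the old flag only upgrades "default"."""
    if hardware_cut_opt:
        warnings.warn(
            "hardware_cut_opt is deprecated; use compiler_tiling='deep_search'",
            DeprecationWarning,
        )
        # An explicit non-default compiler_tiling wins over the old flag.
        if compiler_tiling == "default":
            return "deep_search"
    return compiler_tiling
```

For example, `resolve_tiling(hardware_cut_opt=True)` yields `"deep_search"`, while `resolve_tiling("partial_graph_search", True)` keeps `"partial_graph_search"`.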

```python
#[API]
ktc.encrypt_compile(model_list, output_dir="/data1/kneron_flow", dedicated_output_buffer=True, mode=None, key="", key_file="", encryption_efuse_key="", weight_compress=False)
ktc.encrypt_compile(
model_list,
output_dir="/data1/kneron_flow",
dedicated_output_buffer=True,
mode=None,
key="",
key_file="",
encryption_efuse_key="",
weight_compress=False,
flatbuffer=True,
compiler_tiling="default",
weight_bandwidth=None,
dma_bandwidth=None,
)
```

Compile the models and generate an encrypted nef file. The nef path will be returned.
@@ -39,8 +64,13 @@ Args:
* key_file (str, optional): key file path. Required in mode 1. Defaults to "".
* encryption_efuse_key (str, optional): a hex code. Required in mode 2 and optional in mode 1. Defaults to "".
* weight_compress (bool, optional): compress weight to slightly reduce the binary file size. Defaults to False.
* hardware_cut_opt (bool, optional): optimize the hardware memory usage while processing large inputs. This option might cause the compiling time increase. Currently, only available for 720. Defaults to False.
* hardware_cut_opt (bool, optional): DEPRECATED. Use `compiler_tiling="deep_search"` instead. If True and `compiler_tiling` is `"default"`, `compiler_tiling` will be treated as `"deep_search"`. Defaults to False.
* flatbuffer (bool, optional): enable new flatbuffer mode for 720. Defaults to True.
* compiler_tiling (str, optional): choose from `"default"`, `"deep_search"`, or `"partial_graph_search"`. Ignored when a model provides its own compiler config json. KDP520 always uses `"default"`. Defaults to `"default"`.
* weight_bandwidth: weight bandwidth in Gbps. Defaults to None, which uses the platform default for the IP evaluator.
* dma_bandwidth: DMA bandwidth in Gbps. Defaults to None, which uses the platform default for the IP evaluator.

If you previously used `hardware_cut_opt=True`, use `compiler_tiling="deep_search"` instead.

We will start with a single model first.
