Merged
2 changes: 1 addition & 1 deletion docs/toolchain/appendix/app_flow_manual.md
@@ -1,4 +1,4 @@
# Kneron End to End Simulator v0.31.1
# Kneron End to End Simulator v0.32.0

This project allows users to perform image inference using Kneron's built-in simulator. We encourage users to simply use the kneron_inference function to run tests on their inputs.

80 changes: 42 additions & 38 deletions docs/toolchain/appendix/fx_report.md
@@ -28,28 +28,29 @@ The summary will show the IP evaluator information. Below are some examples of r
<p><span style="font-weight: bold;">Figure 4.</span> Summary for platform 730, mode 2 (with fixed-point model generated and snr check.) </p>
</div>

| **name** | **explanation** | **availability** |
|-------------------------|--------------------------------------------------------------------------------|----------------------------------|
| **docker_version** | the version of the toolchain docker for this report | |
| **comments** | extra information | |
| **input bitwidth** | customer set input bitwidth: int8 or int16 | |
| **output bitwidth** | customer set output bitwidth: int8 or int16 | |
| **datapath bitwidth** | customer set data bitwidth (or activation bitwidth): int8 or int16 | |
| **weight bitwidth** | customer set weight bitwidth: int8 or int16 or int4. int4 only for certain HW. | |
| **fps** | estimated frames per second. | |
| **ITC** | estimated inference time. | |
| **RDMA bandwidth** | set effective peak RDMA bandwidth based on HW | |
| **WDMA bandwidth** | set effective peak WDMA bandwidth based on HW | |
| **GETW bandwidth** | set effective peak weight loading bandwidth based on HW | |
| **RV** | Total data load (except weight load) from DDR in one inference | |
| **WV** | Total data write to DDR in one inference | |
| **cpu node** | CPU node in model will be listed here | if any cpu node exists |
| **SNR(dB)** | SNR of the fixed-point model's inference results. | mode 2 and 3 |
| **btm_dynasty_path** | path to the inference results | mode 2 and 3 |
| **btm** | check the bit-true-match between dynasty and csim inference | mode 2 and 3 |
| **bie** | generated bie file (fixed-point model) for dynasty inference | mode 1/2/3 |
| **nef** | generated nef file (fixed-point model) for csim / dongle inference | mode 1/2/3 |
| **gen fx model report** | file name of this report | |
| **name** | **explanation** | **availability** |
| ----------------------- | ------------------------------------------------------------------------------ | ---------------------- |
| **docker_version** | the version of the toolchain docker for this report | |
| **comments** | extra information | |
| **input bitwidth** | customer set input bitwidth: int8 or int16 | |
| **output bitwidth** | customer set output bitwidth: int8 or int16 | |
| **datapath bitwidth** | customer set data bitwidth (or activation bitwidth): int8 or int16 | |
| **weight bitwidth** | customer set weight bitwidth: int8 or int16 or int4. int4 only for certain HW. | |
| **fps** | estimated frames per second. | |
| **ITC** | estimated inference time. | |
| **RDMA bandwidth** | set effective peak RDMA bandwidth based on HW | |
| **WDMA bandwidth** | set effective peak WDMA bandwidth based on HW | |
| **GETW bandwidth** | set effective peak weight loading bandwidth based on HW | |
| **RV** | Total data load (except weight load) from DDR in one inference | |
| **WV** | Total data write to DDR in one inference | |
| **cpu node** | CPU node in model will be listed here | if any cpu node exists |
| **SNR(dB)** | SNR of the fixed-point model's inference results. | mode 2 and 3 |
| **btm_dynasty_path** | path to the inference results | mode 2 and 3 |
| **btm** | check the bit-true-match between dynasty and csim inference | mode 2 and 3 |
| **bie** | generated bie file (fixed-point model) for dynasty inference | mode 1/2/3 |
| **nef** | generated nef file (fixed-point model) for csim / dongle inference | mode 1/2/3 |
| **backend node graph** | the graph after node fusion and decomposition, with backend node information. | |
| **gen fx model report** | file name of this report | |
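
The **SNR(dB)** and **btm** rows can be illustrated with a small sketch. It assumes the conventional definition of SNR in decibels (signal power over quantization-error power) and treats bit-true match as exact element-wise equality; the report does not state the toolchain's exact formula, and the helper names `snr_db` and `bit_true_match` are hypothetical, not part of the Kneron toolchain.

```python
import math


def snr_db(reference, quantized):
    """SNR in dB of `quantized` against `reference` (flat sequences of numbers).

    Assumption: the conventional 10 * log10(signal_power / noise_power)
    definition; the toolchain's own formula may differ.
    """
    if len(reference) != len(quantized):
        raise ValueError("outputs must have the same length")
    signal = sum(r * r for r in reference)
    noise = sum((r - q) ** 2 for r, q in zip(reference, quantized))
    if noise == 0:
        return math.inf  # outputs are identical, no quantization error
    if signal == 0:
        return -math.inf  # all-zero reference with a non-zero error
    return 10.0 * math.log10(signal / noise)


def bit_true_match(first, second):
    """True when two integer result sequences are identical element by element."""
    return list(first) == list(second)


if __name__ == "__main__":
    float_out = [10.0, 0.0, -10.0]
    fixed_out = [9.0, 0.0, -9.0]
    print("SNR(dB):", snr_db(float_out, fixed_out))
    print("btm:", bit_true_match([1, 2, 3], [1, 2, 3]))
```

A higher value means the fixed-point output stays closer to the reference output; an exact match yields infinity under this definition.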



@@ -75,20 +76,23 @@ The summary will show the IP evaluator information. Below are some examples of r
<p><span style="font-weight: bold;">Figure 8.</span> Node details for platform 730, mode 2 (with fixed-point model generated and SNR check). </p>
</div>

| **column** | **explanation** | **availability** |
|--------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| **node** | model operation node name after node fusion and decomposition | |
| **SNR** | SNR score between fixed-point model and original model (per layer) | every layer for mode 3 and only output layer for mode 2 |
| **node origin** | corresponding operation node name in original onnx before node fusion and decomposition | |
| **type** | NPU / FUSED / CPU | |
| **node backend** | corresponding backend node name | |
| **CMD_node_idx** | index of command node | below info not available for 520 |
| **bw in / bw out / bw weight** | input / output / weight bitwidth for this node | mode 1 / 2 / 3 |
| **MAC_cycle** | MAC engine runtime cycle number for this backend node. | |
| **MAC_runtime(ms)** | MAC engine runtime for this backend node. | |
| **RDMA_amount(Byte)** | RDMA amount for this backend node. | |
| **WDMA_amount(Byte)** | WDMA amount for this backend node. | |
| **Weight_amount(Byte)** | weight amount for this backend node. | |
| **runtime(ms)** | operator runtime. | |
| **in_fmt / out_fmt** | input/output data formats. If there is only one input/output, or all inputs/outputs share the same format, only that format is shown. If the node has multiple formats, the details are listed as “FORMAT1:IN1,IN2 \ FORMAT2:IN3”. | |
| **column** | **explanation** | **availability** |
| ------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- |
| **node** | model operation node name after node fusion and decomposition | |
| **SNR** | SNR score between fixed-point model and original model (per layer) | every layer for mode 3 and only output layer for mode 2 |
| **node origin** | corresponding operation node name in original onnx before node fusion and decomposition | |
| **type** | NPU / FUSED / CPU | |
| **node backend** | corresponding backend node name | |
| **CMD_node_idx** | index of command node | below info not available for 520 |
| **bw in / bw out / bw weight** | input / output / weight bitwidth for this node | mode 1 / 2 / 3 |
| **MAC_cycle** | MAC engine runtime cycle number for this backend node. | |
| **MAC_runtime(ms)** | MAC engine runtime for this backend node. | |
| **RDMA_amount(Byte)** | RDMA amount for this backend node. | |
| **WDMA_amount(Byte)** | WDMA amount for this backend node. | |
| **Weight_amount(Byte)** | weight amount for this backend node. | |
| **runtime(ms)** | operator runtime. It's the total runtime including CFUNC, PFUNC, and SYNC. | |
| **CFUNC_runtime(ms)** | CFUNC runtime. | |
| **PFUNC_runtime(ms)** | PFUNC runtime. | |
| **SYNC_runtime(ms)** | SYNC runtime. | |
| **in_fmt / out_fmt** | input/output data formats. If there is only one input/output, or all inputs/outputs share the same format, only that format is shown. If the node has multiple formats, the details are listed as “FORMAT1:IN1,IN2 \ FORMAT2:IN3”. | |
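
The **in_fmt / out_fmt** cell layout described above can be parsed with a short sketch. It assumes groups are separated by a backslash (as shown in the table) or a pipe (in case the backslash is a Markdown escape for one), and that tensor names within a group are comma-separated; `parse_fmt` is a hypothetical helper, not a toolchain function.

```python
import re


def parse_fmt(cell):
    """Parse an in_fmt / out_fmt cell into {format_name: [tensor names]}.

    Assumptions: groups are separated by a backslash or a pipe, and a cell
    holding a single format carries no tensor names (mapped to an empty list).
    """
    groups = {}
    for part in re.split(r"[\\|]", cell):
        part = part.strip()
        if not part:
            continue
        fmt, sep, names = part.partition(":")
        if sep:
            groups[fmt.strip()] = [n.strip() for n in names.split(",") if n.strip()]
        else:
            groups[fmt.strip()] = []
    return groups


if __name__ == "__main__":
    print(parse_fmt("FORMAT1:IN1,IN2 \\ FORMAT2:IN3"))
    print(parse_fmt("FORMAT1"))
```

Mapping a single shared format to an empty name list keeps the return type uniform, so callers can treat both cell styles the same way.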

12 changes: 12 additions & 0 deletions docs/toolchain/appendix/history.md
@@ -24,6 +24,18 @@

## Toolchain Change log

* **[v0.32.0]**
    * Add Einsum defusion in kneronnxopt.
    * Support Cast to int64 in knerex and compiler.
    * Support HardSwish, Topk and Split nodes in knerex and compiler.
    * Update the regression flow log printing. Print success logs separately from errors to avoid confusion.
    * Update IP evaluator for DMA with small length.
    * Fix the kneronnxopt bug in `replace_Gather_with_Slice`.
    * Fix the knerex bug: node Concat channel mismatch.
    * Fix the dynasty float bug in InstanceNorm pad edge mode.
    * Fix knerex/compiler bug in CPU node settings for Resize node.
    * Verify opset18 operator validity for knerex and compiler.
    * Reduce memory usage (especially for large models) for compiler.
* **[v0.31.1]**
    * Add `const_in_bitwidth_mode` option for quantization. The default is int16; it can be changed to int8 if the customer specifically wants to increase speed.
    * Update analyzer exception log.