kneron · MrWhoami · Mar 6, 2026
diff --git a/docs/toolchain/appendix/app_flow_manual.md b/docs/toolchain/appendix/app_flow_manual.md
@@ -1,4 +1,4 @@
-# Kneron End to End Simulator v0.31.1
+# Kneron End to End Simulator v0.32.0
 
 This project allows users to perform image inference using Kneron's built in simulator. We encourage users to use simply use the kneron_inference function to perform the tests on your inputs.
 

diff --git a/docs/toolchain/appendix/fx_report.md b/docs/toolchain/appendix/fx_report.md
@@ -28,28 +28,29 @@ The summary will show the IP evaluator information. Below are some examples of r
 <p><span style="font-weight: bold;">Figure 4.</span> Summary for platform 730, mode 2 (with fixed-point model generated and snr check.) </p>
 </div>
 
-| **name**                | **explaination**                                                               | **availability**                 |
-|-------------------------|--------------------------------------------------------------------------------|----------------------------------|
-| **docker_version**      | the version of the toolchain docker for this report                            |                                  |
-| **comments**            | extra information                                                              |                                  |
-| **input bitwidth**      | customer set input bitwidth: int8 or int16                                     |                                  |
-| **output bitwidth**     | customer set output bitwidth: int8 or int16                                    |                                  |
-| **datapath bitwidth**   | customer set data bitwidth (or activation bitwidth): int8 or int16             |                                  |
-| **weight bitwidth**     | customer set weight bitwidth: int8 or int16 or int4. int4 only for certain HW. |                                  |
-| **fps**                 | estimated frame per second.                                                    |                                  |
-| **ITC**                 | estimated inference time.                                                      |                                  |
-| **RDMA bandwidth**      | set effective peak RDMA bandwidth based on HW                                  |                                  |
-| **WDMA bandwidth**      | set effective peak WDMA bandwidth based on HW                                  |                                  |
-| **GETW bandwidth**      | set effective peak weight loading bandwidth based on HW                        |                                  |
-| **RV**                  | Total data load (except weight load) from DDR in one inference                 |                                  |
-| **WV**                  | Total data write to DDR in one inference                                       |                                  |
-| **cpu node**            | CPU node in model will be listed here                                          | if any cpu node exists           |
-| **SNR(dB)**             | The snr of fix point model inferenced results.                                 | mode 2 and 3                     |
-| **btm_dynasty_path**    | path to inferenced results                                                     | mode 2 and 3                     |
-| **btm**                 | check the bit-true-match between dynasty and csim inference                    | mode 2 and 3                     |
-| **bie**                 | generated bie file (fix point model) for dynasty inference                     | mode 1/2/3                       |
-| **nef**                 | generated nef file (fix point model) for csim / dongle inference               | mode 1/2/3                       |
-| **gen fx model report** | file name of this report                                                       |                                  |
+| **name**                | **explanation**                                                                | **availability**       |
+| ----------------------- | ------------------------------------------------------------------------------ | ---------------------- |
+| **docker_version**      | the version of the toolchain docker for this report                            |                        |
+| **comments**            | extra information                                                              |                        |
+| **input bitwidth**      | customer set input bitwidth: int8 or int16                                     |                        |
+| **output bitwidth**     | customer set output bitwidth: int8 or int16                                    |                        |
+| **datapath bitwidth**   | customer set data bitwidth (or activation bitwidth): int8 or int16             |                        |
+| **weight bitwidth**     | customer set weight bitwidth: int8 or int16 or int4. int4 only for certain HW. |                        |
+| **fps**                 | estimated frames per second.                                                   |                        |
+| **ITC**                 | estimated inference time.                                                      |                        |
+| **RDMA bandwidth**      | set effective peak RDMA bandwidth based on HW                                  |                        |
+| **WDMA bandwidth**      | set effective peak WDMA bandwidth based on HW                                  |                        |
+| **GETW bandwidth**      | set effective peak weight loading bandwidth based on HW                        |                        |
+| **RV**                  | Total data load (except weight load) from DDR in one inference                 |                        |
+| **WV**                  | Total data write to DDR in one inference                                       |                        |
+| **cpu node**            | CPU node in model will be listed here                                          | if any cpu node exists |
+| **SNR(dB)**             | SNR of the fixed-point model's inference results.                              | mode 2 and 3           |
+| **btm_dynasty_path**    | path to inference results                                                      | mode 2 and 3           |
+| **btm**                 | check the bit-true-match between dynasty and csim inference                    | mode 2 and 3           |
+| **bie**                 | generated bie file (fixed-point model) for dynasty inference                   | mode 1/2/3             |
+| **nef**                 | generated nef file (fixed-point model) for csim / dongle inference             | mode 1/2/3             |
+| **backend node graph**  | the graph after node fusion and decomposition, with backend node information.  |                        |
+| **gen fx model report** | file name of this report                                                       |                        |
 
 
 
@@ -75,20 +76,23 @@ The summary will show the IP evaluator information. Below are some examples of r
 <p><span style="font-weight: bold;">Figure 8.</span> Node details for platform 730, mode 2 (with fixed-point model generated and SNR check). </p>
 </div>
 
-| **column**                     | **explanation**                                                                                                                                                                                                                            | **availability**                                        |
-|--------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
-| **node**                       | model operation node name after node fusion and decomposition     |                                                         |
-| **SNR**                        | SNR score between fixed-point model and original model (per layer)                                                                                                                                                                            | every layer for mode 3 and only output layer for mode 2 |
-| **node origin**                | corresponding operation node name in original onnx before node fusion and decomposition  |                                                         |
-| **type**                       | NPU / FUSED / CPU                                                                                                                                                                                                                          |                                                         |
-| **node backend**               | corresponding backend node name                                                                                                                                                                                                            |                                                         |
-| **CMD_node_idx**               | index of command node                                                                                                                                                                                                                      | below info not available for 520                        |
-| **bw in / bw out / bw weight** | input / output / weight bitwidth for this node                                                                                                                                                                                             | mode 1 / 2 / 3                                          |
-| **MAC_cycle**                  | MAC engine runtime cycle number for this backend node.     |                                                         |
-| **MAC_runtime(ms)**            | MAC engine runtime for this backend node.                                                                  |                                                         |
-| **RDMA_amount(Byte)**          | RDMA amount for this backend node.                                                                                                                                                                                                         |                                                         |
-| **WDMA_amount(Byte)**          | WDMA amount for this backend node.                                                                                                                                                                                                         |                                                         |
-| **Weight_amount(Byte)**        | weight amount for this backend node.                                                                                                                                                                                                       |                                                         |
-| **runtime(ms)**                | operator runtime.                                                                                                                                                                                                                          |                                                         |
-| **in_fmt / out_fmt**           | input/output data formats. If only one input/output or multiple inputs/outputs with same format, the only format will be shown. If multiple formats for this node, then the details will be listed as “FORMAT1:IN1,IN2 \ FORMAT2:IN3”.     |                                                         |
+| **column**                     | **explanation**                                                                                                                                                                                                                        | **availability**                                        |
+| ------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- |
+| **node**                       | model operation node name after node fusion and decomposition                                                                                                                                                                          |                                                         |
+| **SNR**                        | SNR score between fixed-point model and original model (per layer)                                                                                                                                                                     | every layer for mode 3 and only output layer for mode 2 |
+| **node origin**                | corresponding operation node name in original onnx before node fusion and decomposition                                                                                                                                                |                                                         |
+| **type**                       | NPU / FUSED / CPU                                                                                                                                                                                                                      |                                                         |
+| **node backend**               | corresponding backend node name                                                                                                                                                                                                        |                                                         |
+| **CMD_node_idx**               | index of command node                                                                                                                                                                                                                  | below info not available for 520                        |
+| **bw in / bw out / bw weight** | input / output / weight bitwidth for this node                                                                                                                                                                                         | mode 1 / 2 / 3                                          |
+| **MAC_cycle**                  | MAC engine runtime cycle number for this backend node.                                                                                                                                                                                 |                                                         |
+| **MAC_runtime(ms)**            | MAC engine runtime for this backend node.                                                                                                                                                                                              |                                                         |
+| **RDMA_amount(Byte)**          | RDMA amount for this backend node.                                                                                                                                                                                                     |                                                         |
+| **WDMA_amount(Byte)**          | WDMA amount for this backend node.                                                                                                                                                                                                     |                                                         |
+| **Weight_amount(Byte)**        | weight amount for this backend node.                                                                                                                                                                                                   |                                                         |
+| **runtime(ms)**                | operator runtime. It's the total runtime including CFUNC, PFUNC, and SYNC.                                                                                                                                                             |                                                         |
+| **CFUNC_runtime(ms)**          | CFUNC runtime.                                                                                                                                                                                                                         |                                                         |
+| **PFUNC_runtime(ms)**          | PFUNC runtime.                                                                                                                                                                                                                         |                                                         |
+| **SYNC_runtime(ms)**           | SYNC runtime.                                                                                                                                                                                                                          |                                                         |
+| **in_fmt / out_fmt**           | input/output data formats. If only one input/output or multiple inputs/outputs with same format, the only format will be shown. If multiple formats for this node, then the details will be listed as “FORMAT1:IN1,IN2 \ FORMAT2:IN3”. |                                                         |
 
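For readers who post-process the node details table, the two columns touched here can be handled with a minimal Python sketch like the one below. It is not part of the toolchain: the helper names `parse_fmt` and `runtime_matches_breakdown` are hypothetical, the row is assumed to have been loaded into a dict keyed by the column names above, and treating `runtime(ms)` as exactly the sum of the CFUNC, PFUNC, and SYNC columns is an assumption drawn from the "total runtime including" wording.

```python
from __future__ import annotations

import math


def parse_fmt(cell: str) -> dict[str, list[str]]:
    """Parse an in_fmt / out_fmt cell into {format: [tensor names]}.

    Handles the documented multi-format layout "FORMAT1:IN1,IN2 \\ FORMAT2:IN3".
    A cell that holds only one shared format (no ':') maps it to an empty list.
    """
    result: dict[str, list[str]] = {}
    for part in cell.split("\\"):
        part = part.strip()
        if not part:
            continue
        fmt, sep, names = part.partition(":")
        if sep:
            result[fmt.strip()] = [n.strip() for n in names.split(",") if n.strip()]
        else:
            result[fmt.strip()] = []
    return result


def runtime_matches_breakdown(row: dict[str, float], tol: float = 1e-6) -> bool:
    """Check runtime(ms) against CFUNC + PFUNC + SYNC (assumed to be an exact sum)."""
    parts = sum(
        row[key]
        for key in ("CFUNC_runtime(ms)", "PFUNC_runtime(ms)", "SYNC_runtime(ms)")
    )
    return math.isclose(row["runtime(ms)"], parts, abs_tol=tol)


# Hypothetical row values, for illustration only.
example_row = {
    "runtime(ms)": 0.6,
    "CFUNC_runtime(ms)": 0.1,
    "PFUNC_runtime(ms)": 0.2,
    "SYNC_runtime(ms)": 0.3,
}
print(parse_fmt("FORMAT1:IN1,IN2 \\ FORMAT2:IN3"))
print(runtime_matches_breakdown(example_row))
```

If a real report shows a gap between `runtime(ms)` and the three components, that would suggest the total includes other terms, and the tolerance or the check itself should be adjusted.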
diff --git a/docs/toolchain/appendix/history.md b/docs/toolchain/appendix/history.md
@@ -24,6 +24,18 @@
 
 ## Toolchain Change log
 
+* **[v0.32.0]**
+    * Add Einsum defusion in kneronnxopt.
+    * Support Cast to int64 in knerex and compiler.
+    * Support HardSwish, TopK and Split nodes in knerex and compiler.
+    * Update the regression flow log printing. Print success logs separately from errors to avoid confusion.
+    * Update IP evaluator for DMA with small length.
+    * Fix the kneronnxopt bug in `replace_Gather_with_Slice`.
+    * Fix the knerex bug: node Concat channel mismatch.
+    * Fix the dynasty float bug in InstanceNorm pad edge mode.
+    * Fix knerex/compiler bug in CPU node settings for Resize node.
+    * Verified opset18 operator validity for knerex and compiler.
+    * Reduce memory usage (especially for large models) for compiler.
 * **[v0.31.1]**
     * Add `const_in_bitwidth_mode` option for quantization. The default is int16. Unless the customer particularly desires to increase the speed, it can be changed to int8
     * Update analyzer exception log.