The problem
We have observed a huge regression in benchmark execution time after bumping trie-db from 0.30.0 to 0.31.0, in benchmarks whose setup does massive storage operations.
This is critical because we can't confidently benchmark some pallets for the upcoming 2.1.0 Polkadot / Kusama release.
A concrete example
Let's take a simple dummy extrinsic as an example:
force_apply_min_commission (here) is a simple extrinsic doing 2 reads and 1 write, so an execution time in the ms range doesn't make sense.
What is peculiar in this specific benchmark (and in many in the staking-async pallet in the SDK) is that before benchmarking, we do massive storage deletion (e.g. on PolkadotAH we delete 27k nominators/validators and all related storage items - so a lot). This happens in the benchmark setup phase and should therefore not leak into benchmark results (see Notes at the end for a recent fix/workaround in frame-omni-bencher).
force_apply_min_commission in pallet_staking_async takes ~46µs with trie-db 0.30.0 but ~2.8ms with trie-db 0.31.0 when testing on my Ubuntu desktop. Results are aligned with what we see when running the benchmark on fellowship CI (QEMU/native runner): look at the difference in results between a frame-omni-bencher built with 0.30.0 and one built with 0.31.0 here -> polkadot-fellows/runtimes#1065 (comment)
The regression affects any benchmark whose setup (so before actually measuring the extrinsic) populates/deletes a large trie (e.g. > 25k validator/nominator entries) — the overhead seems to come from commit_db() / trie backend operations, not from the benchmarked extrinsic itself.
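To make the setup-vs-measurement split concrete, here is a schematic sketch of what such a benchmark looks like with the frame-benchmarking v2 attribute syntax. It is not the actual staking-async code: the two helpers are hypothetical stand-ins and the call arguments are indicative. The point is only that everything before #[extrinsic_call] is setup and is not supposed to be timed:

use frame_benchmarking::v2::*;
use frame_system::RawOrigin;

#[benchmarks]
mod benches {
    use super::*;

    #[benchmark]
    fn force_apply_min_commission() -> Result<(), BenchmarkError> {
        // Setup phase, not timed: on PolkadotAH this is where ~27k
        // nominators/validators and all related storage items get deleted.
        clear_validators_and_nominators::<T>(); // hypothetical helper
        let stash = create_dummy_validator::<T>(); // hypothetical helper
        let caller: T::AccountId = whitelisted_caller();

        // Only this call is measured (a couple of reads and writes).
        #[extrinsic_call]
        _(RawOrigin::Signed(caller), stash);

        Ok(())
    }
}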
Bisection summary
(results from my local desktop; we observed the same on CI)
frame-omni-bencher v0.15.0 / v0.16.0 (trie-db 0.30.0) → ~46 µs
frame-omni-bencher v0.17.0+ (trie-db 0.31.0, bumped in paritytech/polkadot-sdk#10573) → ~3 ms
The only thing changing between runs is the frame-omni-bencher binary — confirms it's an issue with the binary
Root cause: trie-db bump 0.30.0 → 0.31.0 (#226), merged in paritytech/polkadot-sdk#10573 (2025-12-11)
The relationship between frame-omni-bencher and trie-db
We are using frame-omni-bencher to benchmark pallet extrinsics. For the staking-async pallet, many benchmark setups imply bulk storage operations, like the deletion of thousands of items from storage (e.g. 26k nominators/validators created at genesis).
Debatable or not, that is not the point here.
trie-db is not a direct dependency of frame-omni-bencher
AFAIK the chain is the following:
frame-omni-bencher → frame-benchmarking-cli → sc-executor (runs WASM benchmarks) → sp-state-machine (storage overlay + trie backend) → sp-trie → trie-db
During benchmarking, the host creates an in-memory trie backend (via sp-state-machine) to hold the genesis state.
Every storage read/write during benchmark setup (like clear_validators_and_nominators() inserting/removing tens of thousands of entries) and commit_db() goes through sp-trie → trie-db on the host side.
The regression is in this host-side trie layer — the trie operations that materialize the storage overlay into the backend before the benchmark timing starts.
The benchmarked extrinsic itself is fast (1 read, 2 writes in my example); it's the trie commit of the massive genesis state that got much slower with trie-db 0.31.0.
This shouldn't leak into benchmark results in any way - see paritytech/polkadot-sdk#10798 and in particular this comment from @cheme about write I/O operations still being in flight after the commit returns early.
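To make that host-side path concrete, here is a minimal standalone sketch of the kind of work sp-state-machine ends up delegating to trie-db when it materializes a large overlay into an in-memory backend. It is written against the 0.30-era TrieDBMutBuilder API as I understand it (exact types and signatures may differ between trie-db versions, so treat it as illustrative):

use sp_core::Blake2Hasher;
use sp_trie::{LayoutV1, MemoryDB};
use trie_db::{TrieDBMutBuilder, TrieMut};

fn main() {
    // In-memory node database, standing in for the benchmarking host backend.
    let mut db = MemoryDB::<Blake2Hasher>::default();
    let mut root = Default::default();
    {
        let mut trie =
            TrieDBMutBuilder::<LayoutV1<Blake2Hasher>>::new(&mut db, &mut root).build();
        // Populate ~25k entries, mimicking the genesis nominator/validator set.
        for i in 0u32..25_000 {
            trie.insert(&i.to_be_bytes(), b"dummy-prefs").unwrap();
        }
        // Bulk deletion, mimicking clear_validators_and_nominators().
        for i in 0u32..25_000 {
            trie.remove(&i.to_be_bytes()).unwrap();
        }
    } // Dropping the TrieDBMut commits the accumulated node changes into `db`.
    println!("root after commit: {root:?}");
}

Timing a program like this against trie-db 0.30.0 and 0.31.0 should show whether the slowdown really lives in this layer, independently of frame-omni-bencher.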
Reproduction steps
Checkout the latest main from https://github.com/polkadot-fellows/runtimes. Then from there:
# build polkadot asset hub
cargo build -p asset-hub-polkadot-runtime --profile production --features runtime-benchmarks
# install latest and greatest frame-omni-bencher
cargo install frame-omni-bencher
# test with frame-omni-bencher the specific extrinsic
frame-omni-bencher v1 benchmark pallet --runtime ./target/production/wbuild/asset-hub-polkadot-runtime/asset_hub_polkadot_runtime.compact.compressed.wasm --pallet pallet_staking_async --extrinsic "force_apply_min_commission" --steps 2 --repeat 1
## You will see an execution time for this extrinsic of around a few ms
# Now install frame-omni-bencher v0.16.0 => the latest with trie-db 0.30.0
# (if your clang/gcc are very recent, prefix the install with CXXFLAGS="-include cstdint")
cargo install frame-omni-bencher --version 0.16.0
# test with frame-omni-bencher
frame-omni-bencher v1 benchmark pallet --runtime ./target/production/wbuild/asset-hub-polkadot-runtime/asset_hub_polkadot_runtime.compact.compressed.wasm --pallet pallet_staking_async --extrinsic "force_apply_min_commission" --steps 2 --repeat 1
## You will see an execution time for this extrinsic of around a few microseconds!
Notes
The issue happens on Polkadot and Kusama AssetHub in the fellowship runtime repo. Interestingly enough, when running the same benchmark on Westend AssetHub within the SDK using the same frame-omni-bencher, the results are in the µs ballpark, not ms.
Note that on Westend AssetHub (see paritytech/polkadot-sdk#10798) we initially observed execution times for this benchmark (massive storage deletion in setup + 1 read, 2 writes in execution) of around a few ms, but that was "fixed" by doing a dummy DB read/write in paritytech/polkadot-sdk#10802 and then (better) in paritytech/polkadot-sdk#10974. This fix brought execution time down to µs on Westend AssetHub (where we also do bulk deletion before executing the extrinsic in the benchmark), but not on Polkadot / Kusama AssetHub.
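I have not traced the exact mechanics of those two PRs, but conceptually the workaround amounts to forcing any in-flight commit I/O from the setup phase to finish before the timer starts. A hypothetical sketch of the idea (not the actual SDK code; BenchBackend and its methods are made up for illustration):

use std::time::Instant;

// Made-up trait standing in for the benchmarking storage backend.
trait BenchBackend {
    fn read(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn write(&mut self, key: &[u8], value: &[u8]);
}

fn measure<B: BenchBackend>(backend: &mut B, run_extrinsic: impl FnOnce(&mut B)) {
    // Dummy write + read: acts as a barrier so that asynchronous commit
    // work left over from benchmark setup completes before timing begins.
    backend.write(b":bench_sync", b"");
    let _ = backend.read(b":bench_sync");

    let start = Instant::now();
    run_extrinsic(backend); // the 1 read / 2 writes under measurement
    println!("extrinsic took {:?}", start.elapsed());
}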