The problem
We have observed a huge regression in benchmark execution time after bumping trie-db from 0.30.0 to 0.31.0, in benchmarks whose setup does massive storage operations.
This is critical because we can't confidently benchmark some pallets for the upcoming 2.1.0 Polkadot / Kusama release.
A concrete example
Let's take a simple dummy extrinsic as an example:
force_apply_min_commission (here) is a simple extrinsic doing 2 reads and 1 write, so an execution time in the ms range doesn't make sense.
What is peculiar in this specific benchmark (and in many in the staking-async pallet in the SDK) is that before benchmarking, we do massive storage deletion (e.g. on PolkadotAH we delete 27k nominators/validators and all related storage items - so a lot). This happens in the benchmark setup phase and should therefore not leak into benchmark results (see Notes at the end for a recent fix/workaround in frame-omni-bencher).
force_apply_min_commission in pallet_staking_async takes ~46µs with trie-db 0.30.0 but ~2.8ms with trie-db 0.31.0 when testing on my Ubuntu desktop. Results are aligned with what we see when running the benchmark on fellowship CI (QEMU/native runner): look at the difference in results between a frame-omni-bencher built with 0.30.0 and one built with 0.31.0 here -> polkadot-fellows/runtimes#1065 (comment)
The regression affects any benchmark whose setup (so before actually measuring the extrinsic) populates/deletes a large trie (e.g. > 25k validator/nominator entries) — the overhead seems to come from commit_db() / trie backend operations, not from the benchmarked extrinsic itself.
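To make the setup-vs-measurement split concrete, here is a schematic sketch of what such a benchmark looks like with the frame-benchmarking v2 attribute syntax. It is not the actual staking-async code: the two helpers are hypothetical stand-ins and the call arguments are indicative. The point is only that everything before #[extrinsic_call] is setup and is not supposed to be timed:

use frame_benchmarking::v2::*;
use frame_system::RawOrigin;

#[benchmarks]
mod benches {
    use super::*;

    #[benchmark]
    fn force_apply_min_commission() -> Result<(), BenchmarkError> {
        // Setup phase, not timed: on PolkadotAH this is where ~27k
        // nominators/validators and all related storage items get deleted.
        clear_validators_and_nominators::<T>(); // hypothetical helper
        let stash = create_dummy_validator::<T>(); // hypothetical helper
        let caller: T::AccountId = whitelisted_caller();

        // Only this call is measured (a couple of reads and writes).
        #[extrinsic_call]
        _(RawOrigin::Signed(caller), stash);

        Ok(())
    }
}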
Bisection summary
(results from my local desktop; we observed the same on CI)
frame-omni-bencher v0.15.0 / v0.16.0 (trie-db 0.30.0) → ~46 µs
frame-omni-bencher v0.17.0+ (trie-db 0.31.0, bumped in paritytech/polkadot-sdk#10573) → ~3 ms
The only thing changing between runs is the frame-omni-bencher binary — confirms it's an issue with the binary
Root cause: trie-db bump 0.30.0 → 0.31.0 (#226), merged in paritytech/polkadot-sdk#10573 (2025-12-11)
The relationship between frame-omni-bencher and trie-db
We are using frame-omni-bencher to benchmark pallet extrinsics. For the staking-async pallet, many benchmark setups imply bulk storage operations, like the deletion of thousands of items from storage (e.g. 26k nominators/validators created at genesis).
Debatable or not, that is not the point here.
trie-db is not a direct dependency of frame-omni-bencher
AFAIK the chain is the following:
frame-omni-bencher → frame-benchmarking-cli → sc-executor (runs WASM benchmarks) → sp-state-machine (storage overlay + trie backend) → sp-trie → trie-db
During benchmarking, the host creates an in-memory trie backend (via sp-state-machine) to hold the genesis state.
Every storage read/write during benchmark setup (like clear_validators_and_nominators() inserting/removing tens of thousands of entries) and commit_db() goes through sp-trie → trie-db on the host side.
The regression is in this host-side trie layer — the trie operations that materialize the storage overlay into the backend before the benchmark timing starts.
The benchmarked extrinsic itself is fast (1 read, 2 writes in my example); it's the trie commit of the massive genesis state that got much slower with trie-db 0.31.0.
This shouldn't leak into benchmark results in any way - see paritytech/polkadot-sdk#10798 and in particular this comment from @cheme about write I/O operations still being in flight after the commit returns early.
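To make that host-side path concrete, here is a minimal standalone sketch of the kind of work sp-state-machine ends up delegating to trie-db when it materializes a large overlay into an in-memory backend. It is written against the 0.30-era TrieDBMutBuilder API as I understand it (exact types and signatures may differ between trie-db versions, so treat it as illustrative):

use sp_core::Blake2Hasher;
use sp_trie::{LayoutV1, MemoryDB};
use trie_db::{TrieDBMutBuilder, TrieMut};

fn main() {
    // In-memory node database, standing in for the benchmarking host backend.
    let mut db = MemoryDB::<Blake2Hasher>::default();
    let mut root = Default::default();
    {
        let mut trie =
            TrieDBMutBuilder::<LayoutV1<Blake2Hasher>>::new(&mut db, &mut root).build();
        // Populate ~25k entries, mimicking the genesis nominator/validator set.
        for i in 0u32..25_000 {
            trie.insert(&i.to_be_bytes(), b"dummy-prefs").unwrap();
        }
        // Bulk deletion, mimicking clear_validators_and_nominators().
        for i in 0u32..25_000 {
            trie.remove(&i.to_be_bytes()).unwrap();
        }
    } // Dropping the TrieDBMut commits the accumulated node changes into `db`.
    println!("root after commit: {root:?}");
}

Timing a program like this against trie-db 0.30.0 and 0.31.0 should show whether the slowdown really lives in this layer, independently of frame-omni-bencher.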
Reproduction steps
Checkout the latest main from https://github.com/polkadot-fellows/runtimes. Then from there:
# build polkadot asset hub
cargo build -p asset-hub-polkadot-runtime --profile production --features runtime-benchmarks
# install latest and greatest frame-omni-bencher
cargo install frame-omni-bencher
# test with frame-omni-bencher the specific extrinsic
frame-omni-bencher v1 benchmark pallet --runtime ./target/production/wbuild/asset-hub-polkadot-runtime/asset_hub_polkadot_runtime.compact.compressed.wasm --pallet pallet_staking_async --extrinsic "force_apply_min_commission" --steps 2 --repeat 1
## You will see an execution time for this extrinsic of around a few ms
# Now install frame-omni-bencher v0.16.0 => the latest with trie-db 0.30.0
# (if your clang/gcc are very recent, prefix the install with CXXFLAGS="-include cstdint")
cargo install frame-omni-bencher --version 0.16.0
# test with frame-omni-bencher
frame-omni-bencher v1 benchmark pallet --runtime ./target/production/wbuild/asset-hub-polkadot-runtime/asset_hub_polkadot_runtime.compact.compressed.wasm --pallet pallet_staking_async --extrinsic "force_apply_min_commission" --steps 2 --repeat 1
## You will see an execution time for this extrinsic of around a few microseconds!
Notes
The issue happens on Polkadot and Kusama AssetHub in the fellowship runtime repo. Interestingly enough, when running the same benchmark on Westend AssetHub within the SDK using the same frame-omni-bencher, the results are in the µs ballpark, not ms.
Note that on Westend AssetHub (see paritytech/polkadot-sdk#10798) we initially observed execution times for this benchmark (massive storage deletion in setup + 1 read, 2 writes in execution) of around a few ms, but that was "fixed" by doing a dummy DB read/write in paritytech/polkadot-sdk#10802 and then (better) in paritytech/polkadot-sdk#10974. This fix brought execution time down to µs on Westend AssetHub (where we also do bulk deletion before executing the extrinsic in the benchmark), but not on Polkadot / Kusama AssetHub.
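I have not traced the exact mechanics of those two PRs, but conceptually the workaround amounts to forcing any in-flight commit I/O from the setup phase to finish before the timer starts. A hypothetical sketch of the idea (not the actual SDK code; BenchBackend and its methods are made up for illustration):

use std::time::Instant;

// Made-up trait standing in for the benchmarking storage backend.
trait BenchBackend {
    fn read(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn write(&mut self, key: &[u8], value: &[u8]);
}

fn measure<B: BenchBackend>(backend: &mut B, run_extrinsic: impl FnOnce(&mut B)) {
    // Dummy write + read: acts as a barrier so that asynchronous commit
    // work left over from benchmark setup completes before timing begins.
    backend.write(b":bench_sync", b"");
    let _ = backend.read(b":bench_sync");

    let start = Instant::now();
    run_extrinsic(backend); // the 1 read / 2 writes under measurement
    println!("extrinsic took {:?}", start.elapsed());
}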