fix(trie): close prune race against concurrent commits via augment-walk#711
Conversation
`maybe_prune_trie` in sentrix-core spawns the prune on a background
`std::thread` that clones the trie. The clone's `self.version` and
`self.root` are frozen at clone time. The live-set walk in `prune()`
then iterates roots from `cutoff+1..=self.version` (the snapshot view).
While that walk runs — seconds on a healthy chain, minutes on a large
one — the main thread keeps applying blocks. Each commit goes through
`flush_trie_batch` and lands new entries in `TABLE_TRIE_VALUES` and
`TABLE_TRIE_NODES`. Their hashes are NOT in the prune's `live` set
because the walk built it from the snapshot.
When `gc_orphaned_nodes` runs at the end of `prune()`, it deletes every
key in those two tables that isn't in `live`. That includes the
just-committed entries from the racing block. Next boot,
`verify_integrity` reports "orphan value reference …" or "orphan node
reference …" depending on which table got hit — exactly what mainnet
saw on 2026-05-24:
* vps3 core: orphan-value `516a0dc27c…` at h=2215224 (chain.db hit
the race within milliseconds of finalising that block)
* fullnode-1: orphan-node `516a0dc27c…` at h=2131563
* fullnode-2: orphan-node `28fdc024ed1d…` at h=2125714
Different roles, different heights, same race class.
The fix: right before `gc_orphaned_nodes`, reload the highest version
currently in `TABLE_TRIE_ROOTS` and walk every root committed AFTER
`self.version`. Hashes reachable from those new roots get pinned in
`live` before the delete runs. The race window shrinks from "duration
of the full live-set walk" to "duration between the final root reload
and the gc commit" — at most one block's worth of writes.
A truly race-free fix requires running the live walk AND the delete
inside the same MDBX RW transaction. That's a bigger refactor — exposing
the txn handle through `TrieStorage` is a surface-area change that
belongs in its own PR. The B3b total_minted self-heal (separate PR) and
the boot-time `verify_integrity` orphan check together handle the
narrow remaining window as defense in depth.
Adds `TrieStorage::latest_version()` — one cursor walk over
`TABLE_TRIE_ROOTS` (bounded by `TRIE_KEEP_VERSIONS`, ~1000 entries),
milliseconds in practice.
Regression test `test_prune_augment_walk_preserves_newer_versions_committed_during_race`:
- v=1..=3 committed
- Open prune_view at v=3 (snapshot)
- Main commits v=4 with a brand-new key whose value_hash is unique to v=4
- prune_view.prune(0) runs
- Assert the v=4 value is still in TABLE_TRIE_VALUES (pre-fix: gone)
- Assert main.get(race_key) still resolves (pre-fix: trie integrity error)
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR fixes a concurrency bug in trie garbage collection by adding a new storage utility and enhancing the prune logic. A public Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
maybe_prune_triein sentrix-core spawns the prune on a backgroundstd::threadthat clones the trie. The clone'sself.versionandself.rootare frozen at clone time. The live-set walk inprune()then iterates roots fromcutoff+1..=self.version(the snapshot view).While that walk runs — seconds on a healthy chain, minutes on a large one — the main thread keeps applying blocks. Each commit goes through
flush_trie_batchand lands new entries inTABLE_TRIE_VALUESandTABLE_TRIE_NODES. Their hashes are NOT in the prune'sliveset because the walk built it from the snapshot.When
gc_orphaned_nodesruns at the end ofprune(), it deletes every key in those two tables that isn't inlive. That includes the just-committed entries from the racing block. Next boot,verify_integrityreports "orphan value reference …" or "orphan node reference …" depending on which table got hit — exactly what mainnet saw on 2026-05-24:516a0dc27c…at h=2215224 (chain.db hit the race within milliseconds of finalising that block)516a0dc27c…at h=213156328fdc024ed1d…at h=2125714Different roles, different heights, same race class.
The fix: right before
gc_orphaned_nodes, reload the highest version currently inTABLE_TRIE_ROOTSand walk every root committed AFTERself.version. Hashes reachable from those new roots get pinned inlivebefore the delete runs. The race window shrinks from "duration of the full live-set walk" to "duration between the final root reload and the gc commit" — at most one block's worth of writes.A truly race-free fix requires running the live walk AND the delete inside the same MDBX RW transaction. That's a bigger refactor — exposing the txn handle through
TrieStorageis a surface-area change that belongs in its own PR. The B3b total_minted self-heal (separate PR) and the boot-timeverify_integrityorphan check together handle the narrow remaining window as defense in depth.Adds
TrieStorage::latest_version()— one cursor walk overTABLE_TRIE_ROOTS(bounded byTRIE_KEEP_VERSIONS, ~1000 entries), milliseconds in practice.Regression test
test_prune_augment_walk_preserves_newer_versions_committed_during_race:Summary
Risk tier
Check ONE:
sentrix-core,sentrix-trie,sentrix-staking,sentrix-bft),block_executor,apply_block_*,state_rootpathRequired by tier
🟢 Low — minimum bar
🟡 Medium — adds
#[test]in same PR🟠 High — adds
🔴 Critical — adds
Test plan
Rollback plan
Related
Summary by CodeRabbit
Bug Fixes
Tests