Improve dissolve network process#2412
Conversation
| } | ||
|
|
||
| if weight_meter.can_consume(write_weight) { | ||
| TimelockedIndex::<T>::mutate(|index| { |
There was a problem hiding this comment.
[HIGH] Timelocked commitment index cleanup is unbounded and unmetered
TimelockedIndex is a global vector, and retain() scans and rewrites the whole vector while this path only reserves one write. A large global timelock index can make dissolve cleanup exceed the on_idle budget and stall or overweight a block. This needs the same bounded/resumable cleanup model as the prefix maps, or a storage layout that can clear by netuid under a meter.
| let mut weight = Self::remove_data_for_dissolved_networks(limit); | ||
|
|
||
| if weight.all_lt(limit) { | ||
| weight.saturating_accrue(Self::process_network_registration_queue()); |
There was a problem hiding this comment.
[HIGH] Queued network registration runs outside on_idle weight accounting
weight.all_lt(limit) only proves the dissolve cleanup did not consume the full limit; it does not reserve enough remaining weight for process_network_registration_queue(). That call can execute set_new_network_state() and only add the actual weight after the fact, so a block with a tiny amount of remaining weight can still perform a full network registration outside the on_idle budget. Pass a remaining WeightMeter into the registration queue or require a worst-case fit before executing it.
|
|
||
| ( | ||
| read_all, | ||
| last_hot.map(|hot| TotalHotkeyAlpha::<T>::hashed_key_for(&hot, netuid)), |
There was a problem hiding this comment.
[HIGH] Coldkey-prefix checkpointing is not resumable
The checkpoint stores only the outer TotalHotkeyAlpha key. If a single hotkey has more coldkey alpha entries than fit in one on_idle budget, the inner alpha_iter_single_prefix(&hot) scan exits before completion but cannot resume from the last coldkey. The next block restarts the same hotkey from the beginning, so cleanup can make no progress for that subnet. Persist an inner coldkey cursor, or bound one hotkey's inner work so it is guaranteed to fit.
| } | ||
|
|
||
| let leftover: u128 = pot_u128.saturating_sub(distributed); | ||
| let leftover: u128 = total_rem |
There was a problem hiding this comment.
[HIGH] Remainder payouts are batch-local
Largest-remainder distribution is computed over only the current stakers batch, while total_alpha_value_u128 and pot_u64 are global. When settlement spans multiple on_idle calls, each batch gets its own leftover = total_rem / total_alpha_value_u128, so the extra units are not assigned by global remainder order and can over-credit earlier batches or under-credit later stakers. The remainder pass needs to be global across all stakers, or the algorithm must persist enough state to distribute leftovers once after the full scan.
| [patch.crates-io] | ||
| w3f-bls = { git = "https://github.com/opentensor/bls", branch = "fix-no-std" } | ||
| # core2 was yanked from crates.io; keep the last published sources available via git. | ||
| core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" } |
There was a problem hiding this comment.
[MEDIUM] Lockfile switches core2 to a git source
This PR changes core2 from a crates.io checksum-pinned package to a GitHub git source. Even though the rev is pinned, it bypasses the crates.io checksum trust path and is unrelated to the dissolve-network runtime cleanup. Keep the crates.io source if possible, vendor the exact source through the repo's approved dependency process, or split this supply-chain change into a separate explicitly reviewed PR.
|
|
||
| // Check if there is enought weight to complete all the operations in this function | ||
| // It is the maximum weight that can be consumed by the function. including all potential reads and writes. | ||
| let max_weight = T::DbWeight::get().reads_writes(20, 12); |
There was a problem hiding this comment.
[MEDIUM] Owner refund and final recycling bypass precise weight metering
This phase charges a coarse reads_writes(20, 12) upfront and then performs conditional mint/spend/transfer/recycle/storage operations whose real cost depends on branch outcomes. In runtime cleanup code, undercharging on any path can make on_idle exceed its advertised budget. Meter each conditional balance/storage operation as it is performed, or reserve a defensible worst-case that covers every branch before mutating state.
|
🔄 AI review updated — Skeptic: VULNERABLE |
| } | ||
|
|
||
| if weight_meter.can_consume(write_weight) { | ||
| TimelockedIndex::<T>::mutate(|index| { |
There was a problem hiding this comment.
[HIGH] Timelocked commitment index cleanup is unbounded and unmetered
TimelockedIndex is a single StorageValue<BTreeSet<...>>, so this mutate(... retain ...) decodes, scans, rewrites, and re-encodes the entire global index. Charging only writes(1) does not bound the work by the number of timelocked commitments. A dissolved subnet with many timelocked entries can make the on-idle cleanup exceed the block's remaining weight and risk block production/DoS. This needs a bounded/indexed cleanup path or a separate resumable cursor over the timelocked index.
| let mut weight = Self::remove_data_for_dissolved_networks(limit); | ||
|
|
||
| if weight.all_lt(limit) { | ||
| weight.saturating_accrue(Self::process_network_registration_queue()); |
There was a problem hiding this comment.
[HIGH] Queued network registration runs outside on_idle weight accounting
process_network_registration_queue() is called whenever cleanup returns any weight below limit, but the remaining weight is not passed in and the queue processor does not refuse work that cannot fit. It can then call set_new_network_state and return a weight larger than the on-idle budget. This reintroduces the same class of overweight runtime work the dissolve cleanup is trying to avoid. Pass a WeightMeter or require a worst-case fit check before processing a queued registration.
| } | ||
| } | ||
|
|
||
| status.subnet_total_alpha_value = Some(total_alpha_value_u128); |
There was a problem hiding this comment.
[HIGH] Coldkey-prefix checkpointing is not resumable
This persists subnet_total_alpha_value even when the inner alpha_iter_single_prefix(&hot) loop stopped partway through a hotkey, but the checkpoint only records an outer TotalHotkeyAlpha key. On the next block there is no coldkey-level cursor telling the cleanup which (hot, cold) entries were already counted. Depending on iter_from positioning, the partial hotkey's coldkeys are either counted again or skipped, corrupting the total alpha denominator used for payouts. Store a nested coldkey checkpoint or only persist the aggregate after a hotkey is fully processed.
| } | ||
|
|
||
| let leftover: u128 = pot_u128.saturating_sub(distributed); | ||
| let leftover: u128 = total_rem |
There was a problem hiding this comment.
[HIGH] Remainder payouts are batch-local
Largest-remainder distribution is now computed only over the stakers collected in this on-idle batch, while total_alpha_value_u128 is global for the subnet. Because later batches are not sorted together with earlier batches, the extra RAO from rounding goes to the largest remainders in each batch instead of the largest remainders globally. Batch boundaries are weight-dependent, so stakers can be underpaid or overpaid relative to the deterministic whole-subnet formula. Keep global remainder state or defer remainder assignment until all stakers are known.
| [patch.crates-io] | ||
| w3f-bls = { git = "https://github.com/opentensor/bls", branch = "fix-no-std" } | ||
| # core2 was yanked from crates.io; keep the last published sources available via git. | ||
| core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" } |
There was a problem hiding this comment.
[MEDIUM] Lockfile switches core2 to a git source
This patches core2 from crates.io to a GitHub revision for the whole workspace. Even with a pinned rev, it moves a dependency from the registry checksum model to a git source and is unrelated to the dissolve-network runtime change. This is a supply-chain expansion that should either be split into a dedicated dependency PR with review/justification or replaced by a vendored/audited source path.
|
|
||
| // Check if there is enought weight to complete all the operations in this function | ||
| // It is the maximum weight that can be consumed by the function. including all potential reads and writes. | ||
| let max_weight = T::DbWeight::get().reads_writes(20, 12); |
There was a problem hiding this comment.
[MEDIUM] Owner refund and final recycling bypass precise weight metering
The final destroy phase consumes a coarse reads_writes(20, 12) allowance up front and then executes conditional balance mint/spend/transfer, recycling, and storage updates without metering each operation. If the actual branches exceed this estimate, on-idle can perform more work than the recorded weight; if the estimate is too high, cleanup may stall unnecessarily. Meter the same way as the earlier phases: check and consume before each conditional balance/storage operation or use benchmarked worst-case weight that covers every branch actually executed here.
|
🔄 AI review updated — Skeptic: VULNERABLE |
| } | ||
|
|
||
| if weight_meter.can_consume(write_weight) { | ||
| TimelockedIndex::<T>::mutate(|index| { |
There was a problem hiding this comment.
[HIGH] Timelocked commitment index cleanup is unbounded and unmetered
TimelockedIndex::<T>::mutate(... retain ...) still scans and rewrites the entire global index while charging only one write. A large timelocked index makes dissolve cleanup exceed the on_idle weight budget, so network cleanup can become a block-production DoS path. This needs bounded/chunked cleanup or a storage layout keyed by netuid that can be cleared with the same meter discipline as the other prefix maps.
| let mut weight = Self::remove_data_for_dissolved_networks(limit); | ||
|
|
||
| if weight.all_lt(limit) { | ||
| weight.saturating_accrue(Self::process_network_registration_queue()); |
There was a problem hiding this comment.
[HIGH] Queued network registration runs outside on_idle weight accounting
process_network_registration_queue() is called after dissolve cleanup whenever weight.all_lt(limit), but it receives no remaining-weight meter and no worst-case fit check. A block can spend nearly all idle weight on cleanup and then run a full set_new_network_state registration anyway, exceeding the hook's advertised budget. Gate this on the exact remaining weight and pass a meter through the registration path, or defer queue processing to a separately weighted call.
| // reserve the weight for the add_balance_to_coldkey_account function call later | ||
| if !weight_meter.can_consume(need_to_consume_weight) { | ||
| inner_read_all = false; | ||
| last_hot = Some(hot.clone()); |
There was a problem hiding this comment.
[HIGH] Coldkey-prefix checkpointing is not resumable
When the inner coldkey scan runs out of weight, the code checkpoints only the outer hotkey (last_hot = Some(hot.clone())) and resumes with TotalHotkeyAlpha::iter_from(hashed_key_for(hot, netuid)). iter_from resumes after that hotkey, so the partially scanned hotkey is skipped on the next pass. That undercounts subnet_total_alpha_value, skips payouts for boundary stakers, and leaves alpha entries behind. Persist an inner coldkey checkpoint or only checkpoint a hotkey after all of its coldkeys are fully processed.
| } | ||
|
|
||
| let leftover: u128 = pot_u128.saturating_sub(distributed); | ||
| let leftover: u128 = total_rem |
There was a problem hiding this comment.
[HIGH] Remainder payouts are batch-local
Largest-remainder distribution is computed only over the stakers collected in this weight-limited batch. Across multiple on_idle batches, each batch sorts only its local remainders and gives out its local leftover, so the final TAO distribution is not the global pro-rata largest-remainder result. An attacker can benefit or lose based on storage ordering and batch boundaries. Persist all remainders for a final global pass, or use a deterministic accumulator that is independent of batch boundaries.
| [patch.crates-io] | ||
| w3f-bls = { git = "https://github.com/opentensor/bls", branch = "fix-no-std" } | ||
| # core2 was yanked from crates.io; keep the last published sources available via git. | ||
| core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" } |
There was a problem hiding this comment.
[MEDIUM] Lockfile switches core2 to a git source
This PR changes core2 from a crates.io checksum-pinned package to a GitHub git source. The rev is pinned, but this is still a supply-chain expansion unrelated to dissolve-network cleanup and removes the registry checksum path for a transitive dependency. Keep the crates.io version if possible, vendor the exact source through an established repository-controlled process, or split this into a separate dependency-remediation PR with justification.
|
|
||
| // Check if there is enought weight to complete all the operations in this function | ||
| // It is the maximum weight that can be consumed by the function. including all potential reads and writes. | ||
| let max_weight = T::DbWeight::get().reads_writes(20, 12); |
There was a problem hiding this comment.
[MEDIUM] Owner refund and final recycling bypass precise weight metering
The finalization path consumes a coarse reads_writes(20, 12) budget, then performs conditional mint/spend/transfer/recycle/storage operations whose actual paths depend on balances and refund state. In runtime cleanup code this should be metered at the operation boundary, not via a broad allowance that can drift from real DB costs. Use the same incremental can_consume/consume pattern around each conditional balance/storage operation, or prove and encode a generated benchmarked bound for this exact path.
|
🔄 AI review updated — Skeptic: VULNERABLE |
| } | ||
|
|
||
| if weight_meter.can_consume(write_weight) { | ||
| TimelockedIndex::<T>::mutate(|index| { |
There was a problem hiding this comment.
[HIGH] Timelocked commitment index cleanup is unbounded and unmetered
TimelockedIndex::<T>::mutate(... retain ...) scans and rewrites the entire global index while charging only writes(1). A large global timelock index can make dissolved-network cleanup exceed the on_idle budget and stall or overweight the block. This needs bounded, resumable cleanup or storage shaped so cleanup is prefix-clearable by netuid.
| let mut weight = Self::remove_data_for_dissolved_networks(limit); | ||
|
|
||
| if weight.all_lt(limit) { | ||
| weight.saturating_accrue(Self::process_network_registration_queue()); |
There was a problem hiding this comment.
[HIGH] Queued network registration runs outside on_idle weight accounting
process_network_registration_queue() is called after dissolve cleanup whenever weight.all_lt(limit), but it receives no remaining-weight meter and no worst-case fit check. The hook can consume the full dissolve budget and then execute a registration path on top, so the returned on_idle weight can exceed the limit. Gate this call on a bounded remaining budget or make the registration path use the same meter before it mutates state.
| // reserve the weight for the add_balance_to_coldkey_account function call later | ||
| if !weight_meter.can_consume(need_to_consume_weight) { | ||
| inner_read_all = false; | ||
| last_hot = Some(hot.clone()); |
There was a problem hiding this comment.
[HIGH] Coldkey-prefix checkpointing is not resumable
When the inner coldkey scan runs out of weight, the checkpoint records only last_hot = Some(hot.clone()). On the next pass, iteration resumes after that hotkey and skips any remaining coldkeys under the same hotkey, leaving stake/value entries undistributed or uncleared. The checkpoint must include the inner coldkey position, or the code must avoid advancing the hotkey checkpoint until every coldkey for that hotkey has been processed.
| } | ||
|
|
||
| let leftover: u128 = pot_u128.saturating_sub(distributed); | ||
| let leftover: u128 = total_rem |
There was a problem hiding this comment.
[HIGH] Remainder payouts are batch-local
The largest-remainder distribution is computed only over the stakers collected in the current weight-limited batch. If settlement spans multiple idle passes, each batch allocates leftovers independently instead of ranking remainders globally across all stakers, so TAO distribution depends on iteration boundaries and weight availability. Persist enough state to perform a global remainder pass, or defer all payouts until the full staker set has been accumulated.
| [patch.crates-io] | ||
| w3f-bls = { git = "https://github.com/opentensor/bls", branch = "fix-no-std" } | ||
| # core2 was yanked from crates.io; keep the last published sources available via git. | ||
| core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" } |
There was a problem hiding this comment.
[MEDIUM] Lockfile switches core2 to a git source
This patch redirects core2 from crates.io to a GitHub revision. Even with a pinned rev, this is a supply-chain change unrelated to dissolve-network cleanup and it removes the normal crates.io checksum path. Keep the published crate source in the lockfile, vendor the exact crate source, or split this into a separately reviewed dependency PR with explicit provenance.
|
|
||
| // Check if there is enought weight to complete all the operations in this function | ||
| // It is the maximum weight that can be consumed by the function. including all potential reads and writes. | ||
| let max_weight = T::DbWeight::get().reads_writes(20, 12); |
There was a problem hiding this comment.
[MEDIUM] Owner refund and final recycling bypass precise weight metering
This consumes a coarse reads_writes(20, 12) allowance up front and then executes branch-dependent balance transfers, mint/spend issuance adjustments, final recycling, and cleanup writes without per-operation metering. If the estimate is wrong or future branches add work, the dissolve cleanup can run overweight while still reporting the precharged value. Meter the actual operations or use a proven worst-case bound tied to the called helpers' weights.
|
🔄 AI review updated — Skeptic: VULNERABLE |
| if weight_meter.can_consume(write_weight) { | ||
| TimelockedIndex::<T>::mutate(|index| { |
There was a problem hiding this comment.
[HIGH] Timelocked commitment index cleanup is unbounded and unmetered
TimelockedIndex is a global index, but this cleanup charges only one write before retain scans and rewrites the whole vector. A large timelock index can make dissolve cleanup exceed the idle weight budget even after the prefix clears were metered. Make this index netuid-keyed/prefix-clearable, or prune it incrementally with its own cursor and per-entry weight accounting.
| let mut weight = Self::remove_data_for_dissolved_networks(limit); | ||
|
|
||
| if weight.all_lt(limit) { | ||
| weight.saturating_accrue(Self::process_network_registration_queue()); |
There was a problem hiding this comment.
[HIGH] Queued network registration runs outside on_idle weight accounting
process_network_registration_queue() is executed whenever dissolve cleanup returns less than limit, but it receives no remaining-weight meter and performs set_new_network_state before accruing the actual weight. If only a small amount of idle budget remains, a queued registration can still execute and push the block over its idle limit. Gate this on the remaining weight or make registration processing consume from the same WeightMeter before doing state changes.
| // reserve the weight for the add_balance_to_coldkey_account function call later | ||
| if !weight_meter.can_consume(need_to_consume_weight) { | ||
| inner_read_all = false; | ||
| last_hot = Some(hot.clone()); |
There was a problem hiding this comment.
[HIGH] Coldkey-prefix checkpointing is not resumable
When the inner coldkey scan for a hotkey runs out of weight, the function records only the outer hotkey as last_hot. The next pass resumes the TotalHotkeyAlpha iterator at that same hotkey and restarts alpha_iter_single_prefix(&hot) from the beginning, so a hotkey with more coldkeys than fit in one idle slice can repeatedly discard partial progress and never settle. Persist a nested coldkey checkpoint, or preflight the full hotkey cost before consuming any of its inner work.
| @@ -568,31 +842,18 @@ impl<T: Config> Pallet<T> { | |||
| distributed = distributed.saturating_add(u128::from(share_u64)); | |||
|
|
|||
| let rem: u128 = prod.checked_rem(total_alpha_value_u128).unwrap_or_default(); | |||
| total_rem = total_rem.saturating_add(rem); | |||
| portions.push(Portion { | |||
| _hot: hot.clone(), | |||
| cold: cold.clone(), | |||
| is_protocol: false, | |||
| share: share_u64, | |||
| rem, | |||
| }); | |||
| } | |||
|
|
|||
| if protocol_alpha_value_u128 > 0 { | |||
| let prod: u128 = pot_u128.saturating_mul(protocol_alpha_value_u128); | |||
| let share_u128: u128 = prod.checked_div(total_alpha_value_u128).unwrap_or_default(); | |||
| let share_u64: u64 = share_u128.min(u128::from(u64::MAX)) as u64; | |||
| distributed = distributed.saturating_add(u128::from(share_u64)); | |||
| let rem: u128 = prod.checked_rem(total_alpha_value_u128).unwrap_or_default(); | |||
| portions.push(Portion { | |||
| _hot: owner_coldkey.clone(), | |||
| cold: owner_coldkey.clone(), | |||
| is_protocol: true, | |||
| share: share_u64, | |||
| rem, | |||
| }); | |||
| } | |||
|
|
|||
| let leftover: u128 = pot_u128.saturating_sub(distributed); | |||
| let leftover: u128 = total_rem | |||
There was a problem hiding this comment.
[HIGH] Remainder payouts are batch-local
Settlement can run in multiple idle batches, but each batch computes largest-remainder bonuses only over the local stakers vector while using the global total_alpha_value_u128. That gives each batch its own rounding competition instead of ranking remainders globally across all stakers, so payout distribution depends on batching and iterator order. Persist global remainder candidates or separate floor-share payment from a final global largest-remainder pass.
| # core2 was yanked from crates.io; keep the last published sources available via git. | ||
| core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" } |
There was a problem hiding this comment.
[MEDIUM] Lockfile switches core2 to a git source
This patches core2 from crates.io to a GitHub revision for the whole workspace. The rev is pinned, but this still changes the supply-chain trust source and drops the registry checksum for a dependency unrelated to dissolve-network runtime cleanup. Avoid mixing this into the runtime PR, or vendor/verify the replacement source through the repository's normal dependency-review process.
| // Check if there is enought weight to complete all the operations in this function | ||
| // It is the maximum weight that can be consumed by the function. including all potential reads and writes. | ||
| let max_weight = T::DbWeight::get().reads_writes(20, 12); |
There was a problem hiding this comment.
[MEDIUM] Owner refund and final recycling bypass precise weight metering
This phase reserves a coarse reads_writes(20, 12) budget and then performs conditional balance crediting, transfer, recycling, storage removals, and event-relevant state changes without metering each operation. In on_idle, underestimating this path can exceed the remaining block weight. Split these operations into metered steps or charge a benchmarked worst-case WeightInfo that includes the currency and storage paths actually executed.
|
🔄 AI review updated — Skeptic: VULNERABLE |
| if weight_meter.can_consume(write_weight) { | ||
| TimelockedIndex::<T>::mutate(|index| { |
There was a problem hiding this comment.
[HIGH] Timelocked commitment index cleanup is unbounded and unmetered
TimelockedIndex is a single global Vec, so mutate(... retain ...) decodes, scans, filters, and rewrites the entire vector regardless of how many entries it contains. The cleanup charges only writes(1) before doing that work, so a large timelocked index can make on_idle exceed its declared weight and threaten block production. This needs to be converted to bounded/resumable storage cleanup, or the index must be modeled so per-netuid entries can be removed with metered prefix iteration.
| fn on_idle(_block: BlockNumberFor<T>, limit: Weight) -> Weight { | ||
| let mut weight = Self::remove_data_for_dissolved_networks(limit); | ||
|
|
||
| if weight.all_lt(limit) { | ||
| weight.saturating_accrue(Self::process_network_registration_queue()); |
There was a problem hiding this comment.
[HIGH] Queued network registration runs outside on_idle weight accounting
process_network_registration_queue() is invoked after dissolve cleanup whenever the consumed weight is less than the original limit, but it receives no remaining-weight meter and there is no check that the full registration worst case fits in the leftover budget. set_new_network_state performs multiple storage scans and writes before returning its actual weight, so on_idle can do substantially more work than the block's remaining idle budget. Gate this path with an explicit remaining-weight check/meter or defer it to a normal extrinsic/transactional queue with bounded per-block work.
| // reserve the weight for the add_balance_to_coldkey_account function call later | ||
| if !weight_meter.can_consume(need_to_consume_weight) { | ||
| inner_read_all = false; | ||
| last_hot = Some(hot.clone()); |
There was a problem hiding this comment.
[HIGH] Coldkey-prefix checkpointing is not resumable
When this nested coldkey scan runs out of weight, the persisted checkpoint is only the outer TotalHotkeyAlpha hotkey. Any partially processed coldkeys under that hotkey are either replayed or skipped on the next iter_from batch, depending on which branch set last_hot. That makes the async dissolve settlement non-deterministic under realistic weight limits: totals can be double-counted/undercounted, stakers can miss payouts, and cleanup can leave orphaned alpha entries. Persist a checkpoint that includes the inner coldkey position, or make each batch process an entire hotkey only after first proving the full inner scan and payout work fits.
| @@ -568,31 +842,18 @@ impl<T: Config> Pallet<T> { | |||
| distributed = distributed.saturating_add(u128::from(share_u64)); | |||
|
|
|||
| let rem: u128 = prod.checked_rem(total_alpha_value_u128).unwrap_or_default(); | |||
| total_rem = total_rem.saturating_add(rem); | |||
| portions.push(Portion { | |||
| _hot: hot.clone(), | |||
| cold: cold.clone(), | |||
| is_protocol: false, | |||
| share: share_u64, | |||
| rem, | |||
| }); | |||
| } | |||
|
|
|||
| if protocol_alpha_value_u128 > 0 { | |||
| let prod: u128 = pot_u128.saturating_mul(protocol_alpha_value_u128); | |||
| let share_u128: u128 = prod.checked_div(total_alpha_value_u128).unwrap_or_default(); | |||
| let share_u64: u64 = share_u128.min(u128::from(u64::MAX)) as u64; | |||
| distributed = distributed.saturating_add(u128::from(share_u64)); | |||
| let rem: u128 = prod.checked_rem(total_alpha_value_u128).unwrap_or_default(); | |||
| portions.push(Portion { | |||
| _hot: owner_coldkey.clone(), | |||
| cold: owner_coldkey.clone(), | |||
| is_protocol: true, | |||
| share: share_u64, | |||
| rem, | |||
| }); | |||
| } | |||
|
|
|||
| let leftover: u128 = pot_u128.saturating_sub(distributed); | |||
| let leftover: u128 = total_rem | |||
There was a problem hiding this comment.
[HIGH] Remainder payouts are batch-local
Largest-remainder distribution is computed only over the stakers collected in the current weight-limited batch, while total_alpha_value_u128 is global for the whole subnet. Each batch calculates leftover from its local total_rem and then advances status.subnet_distributed_tao, so the final payout depends on batch boundaries rather than the global ordering of all staker remainders. That can misallocate dissolved subnet TAO when the settlement spans multiple on_idle calls. Persist all candidate remainders until a global pass is possible, or use a deterministic streaming allocation that is mathematically equivalent across batches.
| # core2 was yanked from crates.io; keep the last published sources available via git. | ||
| core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" } |
There was a problem hiding this comment.
[MEDIUM] Lockfile switches core2 to a git source
This runtime PR changes core2 from a crates.io package with a checksum to a GitHub patch revision. Even pinned git dependencies remove the registry checksum/release provenance and add a supply-chain dependency unrelated to dissolve-network cleanup. If the yank must be handled here, vendor the exact crate source or use an internal audited fork with repository ownership and review rationale; otherwise keep this dependency change out of the runtime PR.
| // Check if there is enought weight to complete all the operations in this function | ||
| // It is the maximum weight that can be consumed by the function. including all potential reads and writes. | ||
| let max_weight = T::DbWeight::get().reads_writes(20, 12); |
There was a problem hiding this comment.
[MEDIUM] Owner refund and final recycling bypass precise weight metering
This phase reserves a coarse reads_writes(20, 12) allowance, then conditionally performs multiple balance operations, storage removals, account lookups, and recycling writes. Because the work is not charged at each branch, changes to any called helper can silently invalidate the bound and make dissolve cleanup exceed the on_idle budget. Runtime cleanup code should meter each concrete operation, or use a benchmarked worst-case bound that is tied to the exact helper behavior and enforced before entering the phase.
|
🔄 AI review updated — Skeptic: VULNERABLE |
| } | ||
|
|
||
| if weight_meter.can_consume(write_weight) { | ||
| TimelockedIndex::<T>::mutate(|index| { |
There was a problem hiding this comment.
[HIGH] Timelocked commitment index cleanup is unbounded and unmetered
TimelockedIndex::<T>::mutate(... retain ...) scans and rewrites the entire global timelock index, but the meter only reserves one write. If the index is large, dissolve cleanup can consume far more than the on_idle budget and produce an overweight block. Make this index cleanup chunked/resumable, or store it by netuid so clear_prefix_with_meter can meter progress like the other commitment maps.
| let mut weight = Self::remove_data_for_dissolved_networks(limit); | ||
|
|
||
| if weight.all_lt(limit) { | ||
| weight.saturating_accrue(Self::process_network_registration_queue()); |
There was a problem hiding this comment.
[HIGH] Queued network registration runs outside on_idle weight accounting
process_network_registration_queue() runs whenever dissolve cleanup reports any weight below limit, but it does not receive a remaining-weight meter or check that its worst-case work fits. set_new_network_state can perform substantial runtime work, so this can exceed the block's on_idle limit after cleanup has already consumed most of the budget. Pass the remaining budget into the queue processor and skip/defer registration unless the full bounded cost fits.
| // reserve the weight for the add_balance_to_coldkey_account function call later | ||
| if !weight_meter.can_consume(need_to_consume_weight) { | ||
| inner_read_all = false; | ||
| last_hot = Some(hot.clone()); |
There was a problem hiding this comment.
[HIGH] Coldkey-prefix checkpointing is not resumable
When the inner coldkey scan runs out of weight, the checkpoint records only the outer hotkey key. The next call resumes with TotalHotkeyAlpha::<T>::iter_from(last_hot_key), which restarts that hotkey from the first coldkey, repeats already-charged work, and can fail forever for a hotkey with many coldkeys. Persist an inner coldkey checkpoint, or require enough budget to finish all coldkeys for a hotkey before starting it.
| } | ||
|
|
||
| let leftover: u128 = pot_u128.saturating_sub(distributed); | ||
| let leftover: u128 = total_rem |
There was a problem hiding this comment.
[HIGH] Remainder payouts are batch-local
The largest-remainder calculation is performed only over the current stakers batch, while status.subnet_distributed_tao persists across batches. Splitting settlement across on_idle calls changes who receives the remainder atoms compared with a single global pass, so payout fairness depends on block weight and iteration batching. Track remainders globally across all stakers before awarding leftovers, or defer the remainder pass until every staker has been scanned.
| [patch.crates-io] | ||
| w3f-bls = { git = "https://github.com/opentensor/bls", branch = "fix-no-std" } | ||
| # core2 was yanked from crates.io; keep the last published sources available via git. | ||
| core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" } |
There was a problem hiding this comment.
[MEDIUM] Lockfile switches core2 to a git source
This patches core2 from crates.io to a GitHub revision, and Cargo.lock now resolves it from that git source. Even pinned git dependencies are a supply-chain expansion that executes in every build, and this change is unrelated to the dissolve-network runtime logic. Keep the registry crate if possible, vendor the exact source through the repository's normal process, or split this into a dedicated dependency PR with explicit review.
|
|
||
| // Check if there is enought weight to complete all the operations in this function | ||
| // It is the maximum weight that can be consumed by the function. including all potential reads and writes. | ||
| let max_weight = T::DbWeight::get().reads_writes(20, 12); |
There was a problem hiding this comment.
[MEDIUM] Owner refund and final recycling bypass precise weight metering
This function reserves a coarse reads_writes(20, 12) budget up front, then executes conditional balance transfers, shortfall crediting, recycling, storage removals, and final status updates regardless of the actual path taken. For runtime cleanup driven by on_idle, the weight meter should charge each storage and balance operation at the point it is performed, or the cleanup can silently exceed the declared budget on paths with refunds and recycling.
|
🔄 AI review updated — Skeptic: VULNERABLE |
| if weight_meter.can_consume(write_weight) { | ||
| TimelockedIndex::<T>::mutate(|index| { |
There was a problem hiding this comment.
[HIGH] Timelocked commitment index cleanup is unbounded and unmetered
TimelockedIndex::<T>::mutate(... retain ...) scans and rewrites the entire global timelocked index while charging only writes(1). A large index can make this dissolve-cleanup phase exceed the on_idle weight budget and stall or overweight the block. This needs a bounded/resumable cleanup path keyed by netuid, or the index must be restructured so entries can be removed with metered prefix iteration.
| if weight.all_lt(limit) { | ||
| weight.saturating_accrue(Self::process_network_registration_queue()); |
There was a problem hiding this comment.
[HIGH] Queued network registration runs outside on_idle weight accounting
This starts process_network_registration_queue() whenever the dissolve cleanup consumed less than limit, but it does not pass a remaining-weight meter or check that the registration's worst-case weight fits in the remaining budget. If cleanup leaves only a small remainder, set_new_network_state can still execute and make on_idle return/consume more than the block allows. Gate registration on a real remaining-weight budget or move it to its own metered phase.
| // reserve the weight for the add_balance_to_coldkey_account function call later | ||
| if !weight_meter.can_consume(need_to_consume_weight) { | ||
| inner_read_all = false; | ||
| last_hot = Some(hot.clone()); |
There was a problem hiding this comment.
[HIGH] Coldkey-prefix checkpointing is not resumable
When the inner coldkey scan runs out of weight, the checkpoint records only the outer hotkey. A hotkey with more coldkeys than fit in one idle budget will restart at the same hotkey without any coldkey cursor, so this phase can repeat the same prefix and never progress. Persist a checkpoint for the inner Alpha prefix, or make the phase process a bounded coldkey cursor directly.
| @@ -568,31 +842,18 @@ impl<T: Config> Pallet<T> { | |||
| distributed = distributed.saturating_add(u128::from(share_u64)); | |||
|
|
|||
| let rem: u128 = prod.checked_rem(total_alpha_value_u128).unwrap_or_default(); | |||
| total_rem = total_rem.saturating_add(rem); | |||
| portions.push(Portion { | |||
| _hot: hot.clone(), | |||
| cold: cold.clone(), | |||
| is_protocol: false, | |||
| share: share_u64, | |||
| rem, | |||
| }); | |||
| } | |||
|
|
|||
| if protocol_alpha_value_u128 > 0 { | |||
| let prod: u128 = pot_u128.saturating_mul(protocol_alpha_value_u128); | |||
| let share_u128: u128 = prod.checked_div(total_alpha_value_u128).unwrap_or_default(); | |||
| let share_u64: u64 = share_u128.min(u128::from(u64::MAX)) as u64; | |||
| distributed = distributed.saturating_add(u128::from(share_u64)); | |||
| let rem: u128 = prod.checked_rem(total_alpha_value_u128).unwrap_or_default(); | |||
| portions.push(Portion { | |||
| _hot: owner_coldkey.clone(), | |||
| cold: owner_coldkey.clone(), | |||
| is_protocol: true, | |||
| share: share_u64, | |||
| rem, | |||
| }); | |||
| } | |||
|
|
|||
| let leftover: u128 = pot_u128.saturating_sub(distributed); | |||
| let leftover: u128 = total_rem | |||
There was a problem hiding this comment.
[HIGH] Remainder payouts are batch-local
The pro-rata payout is computed over only the stakers collected in this idle batch, while total_alpha_value_u128 is global for the subnet. Largest-remainder ordering is therefore local to each batch, and undistributed pot can later be swept into the owner refund/recycle path instead of going to the globally largest remainders. Settlement needs either a global remainder pass or persisted per-staker remainder state before finalizing the pot.
| # core2 was yanked from crates.io; keep the last published sources available via git. | ||
| core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" } |
There was a problem hiding this comment.
[MEDIUM] Lockfile switches core2 to a git source
This patches core2 from crates.io to a GitHub revision for all workspace builds. Even with a pinned rev, this is a supply-chain change unrelated to dissolve-network cleanup and moves a transitive dependency off the registry checksum path. Keep the registry source, vendor the exact source with review, or split this into a dedicated dependency PR with explicit vetting.
| // Check if there is enought weight to complete all the operations in this function | ||
| // It is the maximum weight that can be consumed by the function. including all potential reads and writes. | ||
| let max_weight = T::DbWeight::get().reads_writes(20, 12); |
There was a problem hiding this comment.
[MEDIUM] Owner refund and final recycling bypass precise weight metering
This finalization phase consumes a coarse reads_writes(20, 12) allowance before conditional mint/spend/transfer/recycle/storage operations. Those paths include balance mutations and optional storage writes whose actual cost depends on account/state conditions, so the cleanup can under-account block weight. Meter each conditional operation as it is selected, or prove the upfront bound covers the worst case including all token/account side effects.
|
🔄 AI review updated — Skeptic: VULNERABLE |
| } | ||
|
|
||
| if weight_meter.can_consume(write_weight) { | ||
| TimelockedIndex::<T>::mutate(|index| { |
There was a problem hiding this comment.
[HIGH] Timelocked commitment index cleanup is unbounded and unmetered
TimelockedIndex::<T>::mutate(... retain ...) decodes, scans, and rewrites the entire global index while charging only one write. Because this runs inside dissolve cleanup under the on-idle meter, a large timelock index can exceed the available block weight and stall or overweight cleanup. Store this index by netuid, or make the global cleanup resumable and charge per decoded/removed entry.
| let mut weight = Self::remove_data_for_dissolved_networks(limit); | ||
|
|
||
| if weight.all_lt(limit) { | ||
| weight.saturating_accrue(Self::process_network_registration_queue()); |
There was a problem hiding this comment.
[HIGH] Queued network registration runs outside on_idle weight accounting
This calls process_network_registration_queue() whenever dissolve cleanup consumed less than limit, but it does not pass the remaining budget or check that a queued set_new_network_state fits before executing it. If cleanup leaves only a small remainder, on-idle can still perform a full network registration and only add the actual weight afterward. Gate this helper with a remaining WeightMeter or a conservative worst-case fit check before doing the registration.
| // reserve the weight for the add_balance_to_coldkey_account function call later | ||
| if !weight_meter.can_consume(need_to_consume_weight) { | ||
| inner_read_all = false; | ||
| last_hot = Some(hot.clone()); |
There was a problem hiding this comment.
[HIGH] Coldkey-prefix checkpointing is not resumable
When weight runs out inside alpha_iter_single_prefix(&hot), the checkpoint records only the outer hotkey. Cleanup then has to process all coldkeys for that hotkey in one future idle slice. A hotkey with enough coldkeys can therefore keep this phase from making progress, leaving dissolved-network cleanup stuck. Persist the inner coldkey key as part of the checkpoint, or split the storage so each inner scan is independently resumable.
| } | ||
|
|
||
| let leftover: u128 = pot_u128.saturating_sub(distributed); | ||
| let leftover: u128 = total_rem |
There was a problem hiding this comment.
[HIGH] Remainder payouts are batch-local
leftover and the largest-remainder sort are computed only over the current metered batch of stakers, while total_alpha_value_u128 is global and subnet_distributed_tao is carried across batches. This changes who receives rounding units based on cleanup chunking rather than the global ordering of remainders, and can misallocate or leave TAO to be handled by the final refund/recycle path. Compute the largest-remainder allocation over a stable global ordering, or persist enough remainder state to make chunked settlement equivalent to a single pass.
| [patch.crates-io] | ||
| w3f-bls = { git = "https://github.com/opentensor/bls", branch = "fix-no-std" } | ||
| # core2 was yanked from crates.io; keep the last published sources available via git. | ||
| core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" } |
There was a problem hiding this comment.
[MEDIUM] Lockfile switches core2 to a git source
This patches core2 from crates.io to a pinned GitHub revision, which bypasses the crates.io checksum and adds an unrelated supply-chain change to a runtime cleanup PR. The comment explains the yank, but this still needs explicit dependency vetting and should be separated unless it is required for this change.
|
|
||
| // Check if there is enought weight to complete all the operations in this function | ||
| // It is the maximum weight that can be consumed by the function. including all potential reads and writes. | ||
| let max_weight = T::DbWeight::get().reads_writes(20, 12); |
There was a problem hiding this comment.
[MEDIUM] Owner refund and final recycling bypass precise weight metering
This phase consumes a coarse reads_writes(20, 12) allowance, then conditionally performs balance shortfall crediting, owner transfer, keep-alive balance lookup, recycle, RAORecycledForRegistration insertion, and multiple storage removals. If the estimate is low, on-idle can exceed its advertised weight; if it is high, cleanup can stall despite enough weight for a smaller branch. Meter the actual branch operations before performing each storage or balance operation.
|
🔄 AI review updated — Skeptic: VULNERABLE |
Description
The PR will fix the #2411, and optimize the process of dissolve a network.
Instead of remove all storages related to the netuid once, it will use the on_idle hook to remove these step by step.
and take the weights into consideration.
Related Issue(s)
Type of Change
Breaking Change
If this PR introduces a breaking change, please provide a detailed description of the impact and the migration path for existing applications.
Checklist
./scripts/fix_rust.shto ensure my code is formatted and linted correctlyScreenshots (if applicable)
Please include any relevant screenshots or GIFs that demonstrate the changes made.
Additional Notes
Please provide any additional information or context that may be helpful for reviewers.