Skip to content

Improve dissolve network process#2412

Open
open-junius wants to merge 248 commits into
devnet-readyfrom
benchmark-dissolve-network
Open

Improve dissolve network process#2412
open-junius wants to merge 248 commits into
devnet-readyfrom
benchmark-dissolve-network

Conversation

@open-junius

@open-junius open-junius commented Feb 10, 2026

Copy link
Copy Markdown
Collaborator

Description

The PR will fix the #2411, and optimize the process of dissolve a network.
Instead of remove all storages related to the netuid once, it will use the on_idle hook to remove these step by step.
and take the weights into consideration.

  • Too avoid repeated code, introduce 2 macro to handle the weight for single value and double storage.
  • calculate weight for each iteration, if weight is not enough to continue, clean up will stop there. The checkpoint is recorded as prefix, then cleanup can continue in on_idle next block.
  • Now, the cleanup can handle any size of data. No risk of hanging at dissolve network extrinsic.
  • It includes some indent fix, not relevant to the functionality. but worth to fix them.

Related Issue(s)

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Other (please describe):

Breaking Change

If this PR introduces a breaking change, please provide a detailed description of the impact and the migration path for existing applications.

Checklist

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have run ./scripts/fix_rust.sh to ensure my code is formatted and linted correctly
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

Screenshots (if applicable)

Please include any relevant screenshots or GIFs that demonstrate the changes made.

Additional Notes

Please provide any additional information or context that may be helpful for reviewers.

@open-junius open-junius self-assigned this Feb 10, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI review — see the sticky summary comment for the verdict and the inline comments below for specific findings.

}

if weight_meter.can_consume(write_weight) {
TimelockedIndex::<T>::mutate(|index| {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Timelocked commitment index cleanup is unbounded and unmetered

TimelockedIndex is a global vector, and retain() scans and rewrites the whole vector while this path only reserves one write. A large global timelock index can make dissolve cleanup exceed the on_idle budget and stall or overweight a block. This needs the same bounded/resumable cleanup model as the prefix maps, or a storage layout that can clear by netuid under a meter.

let mut weight = Self::remove_data_for_dissolved_networks(limit);

if weight.all_lt(limit) {
weight.saturating_accrue(Self::process_network_registration_queue());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Queued network registration runs outside on_idle weight accounting

weight.all_lt(limit) only proves the dissolve cleanup did not consume the full limit; it does not reserve enough remaining weight for process_network_registration_queue(). That call can execute set_new_network_state() and only add the actual weight after the fact, so a block with a tiny amount of remaining weight can still perform a full network registration outside the on_idle budget. Pass a remaining WeightMeter into the registration queue or require a worst-case fit before executing it.


(
read_all,
last_hot.map(|hot| TotalHotkeyAlpha::<T>::hashed_key_for(&hot, netuid)),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Coldkey-prefix checkpointing is not resumable

The checkpoint stores only the outer TotalHotkeyAlpha key. If a single hotkey has more coldkey alpha entries than fit in one on_idle budget, the inner alpha_iter_single_prefix(&hot) scan exits before completion but cannot resume from the last coldkey. The next block restarts the same hotkey from the beginning, so cleanup can make no progress for that subnet. Persist an inner coldkey cursor, or bound one hotkey's inner work so it is guaranteed to fit.

}

let leftover: u128 = pot_u128.saturating_sub(distributed);
let leftover: u128 = total_rem

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Remainder payouts are batch-local

Largest-remainder distribution is computed over only the current stakers batch, while total_alpha_value_u128 and pot_u64 are global. When settlement spans multiple on_idle calls, each batch gets its own leftover = total_rem / total_alpha_value_u128, so the extra units are not assigned by global remainder order and can over-credit earlier batches or under-credit later stakers. The remainder pass needs to be global across all stakers, or the algorithm must persist enough state to distribute leftovers once after the full scan.

Comment thread Cargo.toml
[patch.crates-io]
w3f-bls = { git = "https://github.com/opentensor/bls", branch = "fix-no-std" }
# core2 was yanked from crates.io; keep the last published sources available via git.
core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Lockfile switches core2 to a git source

This PR changes core2 from a crates.io checksum-pinned package to a GitHub git source. Even though the rev is pinned, it bypasses the crates.io checksum trust path and is unrelated to the dissolve-network runtime cleanup. Keep the crates.io source if possible, vendor the exact source through the repo's approved dependency process, or split this supply-chain change into a separate explicitly reviewed PR.


// Check if there is enought weight to complete all the operations in this function
// It is the maximum weight that can be consumed by the function. including all potential reads and writes.
let max_weight = T::DbWeight::get().reads_writes(20, 12);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Owner refund and final recycling bypass precise weight metering

This phase charges a coarse reads_writes(20, 12) upfront and then performs conditional mint/spend/transfer/recycle/storage operations whose real cost depends on branch outcomes. In runtime cleanup code, undercharging on any path can make on_idle exceed its advertised budget. Meter each conditional balance/storage operation as it is performed, or reserve a defensible worst-case that covers every branch before mutating state.

@github-actions

Copy link
Copy Markdown
Contributor

🔄 AI review updated — Skeptic: VULNERABLE

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI review — see the sticky summary comment for the verdict and the inline comments below for specific findings.

}

if weight_meter.can_consume(write_weight) {
TimelockedIndex::<T>::mutate(|index| {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Timelocked commitment index cleanup is unbounded and unmetered

TimelockedIndex is a single StorageValue<BTreeSet<...>>, so this mutate(... retain ...) decodes, scans, rewrites, and re-encodes the entire global index. Charging only writes(1) does not bound the work by the number of timelocked commitments. A dissolved subnet with many timelocked entries can make the on-idle cleanup exceed the block's remaining weight and risk block production/DoS. This needs a bounded/indexed cleanup path or a separate resumable cursor over the timelocked index.

let mut weight = Self::remove_data_for_dissolved_networks(limit);

if weight.all_lt(limit) {
weight.saturating_accrue(Self::process_network_registration_queue());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Queued network registration runs outside on_idle weight accounting

process_network_registration_queue() is called whenever cleanup returns any weight below limit, but the remaining weight is not passed in and the queue processor does not refuse work that cannot fit. It can then call set_new_network_state and return a weight larger than the on-idle budget. This reintroduces the same class of overweight runtime work the dissolve cleanup is trying to avoid. Pass a WeightMeter or require a worst-case fit check before processing a queued registration.

}
}

status.subnet_total_alpha_value = Some(total_alpha_value_u128);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Coldkey-prefix checkpointing is not resumable

This persists subnet_total_alpha_value even when the inner alpha_iter_single_prefix(&hot) loop stopped partway through a hotkey, but the checkpoint only records an outer TotalHotkeyAlpha key. On the next block there is no coldkey-level cursor telling the cleanup which (hot, cold) entries were already counted. Depending on iter_from positioning, the partial hotkey's coldkeys are either counted again or skipped, corrupting the total alpha denominator used for payouts. Store a nested coldkey checkpoint or only persist the aggregate after a hotkey is fully processed.

}

let leftover: u128 = pot_u128.saturating_sub(distributed);
let leftover: u128 = total_rem

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Remainder payouts are batch-local

Largest-remainder distribution is now computed only over the stakers collected in this on-idle batch, while total_alpha_value_u128 is global for the subnet. Because later batches are not sorted together with earlier batches, the extra RAO from rounding goes to the largest remainders in each batch instead of the largest remainders globally. Batch boundaries are weight-dependent, so stakers can be underpaid or overpaid relative to the deterministic whole-subnet formula. Keep global remainder state or defer remainder assignment until all stakers are known.

Comment thread Cargo.toml
[patch.crates-io]
w3f-bls = { git = "https://github.com/opentensor/bls", branch = "fix-no-std" }
# core2 was yanked from crates.io; keep the last published sources available via git.
core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Lockfile switches core2 to a git source

This patches core2 from crates.io to a GitHub revision for the whole workspace. Even with a pinned rev, it moves a dependency from the registry checksum model to a git source and is unrelated to the dissolve-network runtime change. This is a supply-chain expansion that should either be split into a dedicated dependency PR with review/justification or replaced by a vendored/audited source path.


// Check if there is enought weight to complete all the operations in this function
// It is the maximum weight that can be consumed by the function. including all potential reads and writes.
let max_weight = T::DbWeight::get().reads_writes(20, 12);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Owner refund and final recycling bypass precise weight metering

The final destroy phase consumes a coarse reads_writes(20, 12) allowance up front and then executes conditional balance mint/spend/transfer, recycling, and storage updates without metering each operation. If the actual branches exceed this estimate, on-idle can perform more work than the recorded weight; if the estimate is too high, cleanup may stall unnecessarily. Meter the same way as the earlier phases: check and consume before each conditional balance/storage operation or use benchmarked worst-case weight that covers every branch actually executed here.

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

🔄 AI review updated — Skeptic: VULNERABLE

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI review — see the sticky summary comment for the verdict and the inline comments below for specific findings.

}

if weight_meter.can_consume(write_weight) {
TimelockedIndex::<T>::mutate(|index| {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Timelocked commitment index cleanup is unbounded and unmetered

TimelockedIndex::<T>::mutate(... retain ...) still scans and rewrites the entire global index while charging only one write. A large timelocked index makes dissolve cleanup exceed the on_idle weight budget, so network cleanup can become a block-production DoS path. This needs bounded/chunked cleanup or a storage layout keyed by netuid that can be cleared with the same meter discipline as the other prefix maps.

let mut weight = Self::remove_data_for_dissolved_networks(limit);

if weight.all_lt(limit) {
weight.saturating_accrue(Self::process_network_registration_queue());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Queued network registration runs outside on_idle weight accounting

process_network_registration_queue() is called after dissolve cleanup whenever weight.all_lt(limit), but it receives no remaining-weight meter and no worst-case fit check. A block can spend nearly all idle weight on cleanup and then run a full set_new_network_state registration anyway, exceeding the hook's advertised budget. Gate this on the exact remaining weight and pass a meter through the registration path, or defer queue processing to a separately weighted call.

// reserve the weight for the add_balance_to_coldkey_account function call later
if !weight_meter.can_consume(need_to_consume_weight) {
inner_read_all = false;
last_hot = Some(hot.clone());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Coldkey-prefix checkpointing is not resumable

When the inner coldkey scan runs out of weight, the code checkpoints only the outer hotkey (last_hot = Some(hot.clone())) and resumes with TotalHotkeyAlpha::iter_from(hashed_key_for(hot, netuid)). iter_from resumes after that hotkey, so the partially scanned hotkey is skipped on the next pass. That undercounts subnet_total_alpha_value, skips payouts for boundary stakers, and leaves alpha entries behind. Persist an inner coldkey checkpoint or only checkpoint a hotkey after all of its coldkeys are fully processed.

}

let leftover: u128 = pot_u128.saturating_sub(distributed);
let leftover: u128 = total_rem

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Remainder payouts are batch-local

Largest-remainder distribution is computed only over the stakers collected in this weight-limited batch. Across multiple on_idle batches, each batch sorts only its local remainders and gives out its local leftover, so the final TAO distribution is not the global pro-rata largest-remainder result. An attacker can benefit or lose based on storage ordering and batch boundaries. Persist all remainders for a final global pass, or use a deterministic accumulator that is independent of batch boundaries.

Comment thread Cargo.toml
[patch.crates-io]
w3f-bls = { git = "https://github.com/opentensor/bls", branch = "fix-no-std" }
# core2 was yanked from crates.io; keep the last published sources available via git.
core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Lockfile switches core2 to a git source

This PR changes core2 from a crates.io checksum-pinned package to a GitHub git source. The rev is pinned, but this is still a supply-chain expansion unrelated to dissolve-network cleanup and removes the registry checksum path for a transitive dependency. Keep the crates.io version if possible, vendor the exact source through an established repository-controlled process, or split this into a separate dependency-remediation PR with justification.


// Check if there is enought weight to complete all the operations in this function
// It is the maximum weight that can be consumed by the function. including all potential reads and writes.
let max_weight = T::DbWeight::get().reads_writes(20, 12);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Owner refund and final recycling bypass precise weight metering

The finalization path consumes a coarse reads_writes(20, 12) budget, then performs conditional mint/spend/transfer/recycle/storage operations whose actual paths depend on balances and refund state. In runtime cleanup code this should be metered at the operation boundary, not via a broad allowance that can drift from real DB costs. Use the same incremental can_consume/consume pattern around each conditional balance/storage operation, or prove and encode a generated benchmarked bound for this exact path.

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

🔄 AI review updated — Skeptic: VULNERABLE

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI review — see the sticky summary comment for the verdict and the inline comments below for specific findings.

}

if weight_meter.can_consume(write_weight) {
TimelockedIndex::<T>::mutate(|index| {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Timelocked commitment index cleanup is unbounded and unmetered

TimelockedIndex::<T>::mutate(... retain ...) scans and rewrites the entire global index while charging only writes(1). A large global timelock index can make dissolved-network cleanup exceed the on_idle budget and stall or overweight the block. This needs bounded, resumable cleanup or storage shaped so cleanup is prefix-clearable by netuid.

let mut weight = Self::remove_data_for_dissolved_networks(limit);

if weight.all_lt(limit) {
weight.saturating_accrue(Self::process_network_registration_queue());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Queued network registration runs outside on_idle weight accounting

process_network_registration_queue() is called after dissolve cleanup whenever weight.all_lt(limit), but it receives no remaining-weight meter and no worst-case fit check. The hook can consume the full dissolve budget and then execute a registration path on top, so the returned on_idle weight can exceed the limit. Gate this call on a bounded remaining budget or make the registration path use the same meter before it mutates state.

// reserve the weight for the add_balance_to_coldkey_account function call later
if !weight_meter.can_consume(need_to_consume_weight) {
inner_read_all = false;
last_hot = Some(hot.clone());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Coldkey-prefix checkpointing is not resumable

When the inner coldkey scan runs out of weight, the checkpoint records only last_hot = Some(hot.clone()). On the next pass, iteration resumes after that hotkey and skips any remaining coldkeys under the same hotkey, leaving stake/value entries undistributed or uncleared. The checkpoint must include the inner coldkey position, or the code must avoid advancing the hotkey checkpoint until every coldkey for that hotkey has been processed.

}

let leftover: u128 = pot_u128.saturating_sub(distributed);
let leftover: u128 = total_rem

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Remainder payouts are batch-local

The largest-remainder distribution is computed only over the stakers collected in the current weight-limited batch. If settlement spans multiple idle passes, each batch allocates leftovers independently instead of ranking remainders globally across all stakers, so TAO distribution depends on iteration boundaries and weight availability. Persist enough state to perform a global remainder pass, or defer all payouts until the full staker set has been accumulated.

Comment thread Cargo.toml
[patch.crates-io]
w3f-bls = { git = "https://github.com/opentensor/bls", branch = "fix-no-std" }
# core2 was yanked from crates.io; keep the last published sources available via git.
core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Lockfile switches core2 to a git source

This patch redirects core2 from crates.io to a GitHub revision. Even with a pinned rev, this is a supply-chain change unrelated to dissolve-network cleanup and it removes the normal crates.io checksum path. Keep the published crate source in the lockfile, vendor the exact crate source, or split this into a separately reviewed dependency PR with explicit provenance.


// Check if there is enought weight to complete all the operations in this function
// It is the maximum weight that can be consumed by the function. including all potential reads and writes.
let max_weight = T::DbWeight::get().reads_writes(20, 12);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Owner refund and final recycling bypass precise weight metering

This consumes a coarse reads_writes(20, 12) allowance up front and then executes branch-dependent balance transfers, mint/spend issuance adjustments, final recycling, and cleanup writes without per-operation metering. If the estimate is wrong or future branches add work, the dissolve cleanup can run overweight while still reporting the precharged value. Meter the actual operations or use a proven worst-case bound tied to the called helpers' weights.

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

🔄 AI review updated — Skeptic: VULNERABLE

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI review — see the sticky summary comment for the verdict and the inline comments below for specific findings.

Comment on lines +607 to +608
if weight_meter.can_consume(write_weight) {
TimelockedIndex::<T>::mutate(|index| {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Timelocked commitment index cleanup is unbounded and unmetered

TimelockedIndex is a global index, but this cleanup charges only one write before retain scans and rewrites the whole vector. A large timelock index can make dissolve cleanup exceed the idle weight budget even after the prefix clears were metered. Make this index netuid-keyed/prefix-clearable, or prune it incrementally with its own cursor and per-entry weight accounting.

let mut weight = Self::remove_data_for_dissolved_networks(limit);

if weight.all_lt(limit) {
weight.saturating_accrue(Self::process_network_registration_queue());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Queued network registration runs outside on_idle weight accounting

process_network_registration_queue() is executed whenever dissolve cleanup returns less than limit, but it receives no remaining-weight meter and performs set_new_network_state before accruing the actual weight. If only a small amount of idle budget remains, a queued registration can still execute and push the block over its idle limit. Gate this on the remaining weight or make registration processing consume from the same WeightMeter before doing state changes.

Comment on lines +791 to +794
// reserve the weight for the add_balance_to_coldkey_account function call later
if !weight_meter.can_consume(need_to_consume_weight) {
inner_read_all = false;
last_hot = Some(hot.clone());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Coldkey-prefix checkpointing is not resumable

When the inner coldkey scan for a hotkey runs out of weight, the function records only the outer hotkey as last_hot. The next pass resumes the TotalHotkeyAlpha iterator at that same hotkey and restarts alpha_iter_single_prefix(&hot) from the beginning, so a hotkey with more coldkeys than fit in one idle slice can repeatedly discard partial progress and never settle. Persist a nested coldkey checkpoint, or preflight the full hotkey cost before consuming any of its inner work.

Comment on lines 838 to +854
@@ -568,31 +842,18 @@ impl<T: Config> Pallet<T> {
distributed = distributed.saturating_add(u128::from(share_u64));

let rem: u128 = prod.checked_rem(total_alpha_value_u128).unwrap_or_default();
total_rem = total_rem.saturating_add(rem);
portions.push(Portion {
_hot: hot.clone(),
cold: cold.clone(),
is_protocol: false,
share: share_u64,
rem,
});
}

if protocol_alpha_value_u128 > 0 {
let prod: u128 = pot_u128.saturating_mul(protocol_alpha_value_u128);
let share_u128: u128 = prod.checked_div(total_alpha_value_u128).unwrap_or_default();
let share_u64: u64 = share_u128.min(u128::from(u64::MAX)) as u64;
distributed = distributed.saturating_add(u128::from(share_u64));
let rem: u128 = prod.checked_rem(total_alpha_value_u128).unwrap_or_default();
portions.push(Portion {
_hot: owner_coldkey.clone(),
cold: owner_coldkey.clone(),
is_protocol: true,
share: share_u64,
rem,
});
}

let leftover: u128 = pot_u128.saturating_sub(distributed);
let leftover: u128 = total_rem

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Remainder payouts are batch-local

Settlement can run in multiple idle batches, but each batch computes largest-remainder bonuses only over the local stakers vector while using the global total_alpha_value_u128. That gives each batch its own rounding competition instead of ranking remainders globally across all stakers, so payout distribution depends on batching and iterator order. Persist global remainder candidates or separate floor-share payment from a final global largest-remainder pass.

Comment thread Cargo.toml
Comment on lines +324 to +325
# core2 was yanked from crates.io; keep the last published sources available via git.
core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Lockfile switches core2 to a git source

This patches core2 from crates.io to a GitHub revision for the whole workspace. The rev is pinned, but this still changes the supply-chain trust source and drops the registry checksum for a dependency unrelated to dissolve-network runtime cleanup. Avoid mixing this into the runtime PR, or vendor/verify the replacement source through the repository's normal dependency-review process.

Comment on lines +487 to +489
// Check if there is enought weight to complete all the operations in this function
// It is the maximum weight that can be consumed by the function. including all potential reads and writes.
let max_weight = T::DbWeight::get().reads_writes(20, 12);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Owner refund and final recycling bypass precise weight metering

This phase reserves a coarse reads_writes(20, 12) budget and then performs conditional balance crediting, transfer, recycling, storage removals, and event-relevant state changes without metering each operation. In on_idle, underestimating this path can exceed the remaining block weight. Split these operations into metered steps or charge a benchmarked worst-case WeightInfo that includes the currency and storage paths actually executed.

@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

🔄 AI review updated — Skeptic: VULNERABLE

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI review — see the sticky summary comment for the verdict and the inline comments below for specific findings.

Comment on lines +607 to +608
if weight_meter.can_consume(write_weight) {
TimelockedIndex::<T>::mutate(|index| {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Timelocked commitment index cleanup is unbounded and unmetered

TimelockedIndex is a single global Vec, so mutate(... retain ...) decodes, scans, filters, and rewrites the entire vector regardless of how many entries it contains. The cleanup charges only writes(1) before doing that work, so a large timelocked index can make on_idle exceed its declared weight and threaten block production. This needs to be converted to bounded/resumable storage cleanup, or the index must be modeled so per-netuid entries can be removed with metered prefix iteration.

Comment on lines +191 to +195
fn on_idle(_block: BlockNumberFor<T>, limit: Weight) -> Weight {
let mut weight = Self::remove_data_for_dissolved_networks(limit);

if weight.all_lt(limit) {
weight.saturating_accrue(Self::process_network_registration_queue());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Queued network registration runs outside on_idle weight accounting

process_network_registration_queue() is invoked after dissolve cleanup whenever the consumed weight is less than the original limit, but it receives no remaining-weight meter and there is no check that the full registration worst case fits in the leftover budget. set_new_network_state performs multiple storage scans and writes before returning its actual weight, so on_idle can do substantially more work than the block's remaining idle budget. Gate this path with an explicit remaining-weight check/meter or defer it to a normal extrinsic/transactional queue with bounded per-block work.

Comment on lines +791 to +794
// reserve the weight for the add_balance_to_coldkey_account function call later
if !weight_meter.can_consume(need_to_consume_weight) {
inner_read_all = false;
last_hot = Some(hot.clone());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Coldkey-prefix checkpointing is not resumable

When this nested coldkey scan runs out of weight, the persisted checkpoint is only the outer TotalHotkeyAlpha hotkey. Any partially processed coldkeys under that hotkey are either replayed or skipped on the next iter_from batch, depending on which branch set last_hot. That makes the async dissolve settlement non-deterministic under realistic weight limits: totals can be double-counted/undercounted, stakers can miss payouts, and cleanup can leave orphaned alpha entries. Persist a checkpoint that includes the inner coldkey position, or make each batch process an entire hotkey only after first proving the full inner scan and payout work fits.

Comment on lines 838 to +854
@@ -568,31 +842,18 @@ impl<T: Config> Pallet<T> {
distributed = distributed.saturating_add(u128::from(share_u64));

let rem: u128 = prod.checked_rem(total_alpha_value_u128).unwrap_or_default();
total_rem = total_rem.saturating_add(rem);
portions.push(Portion {
_hot: hot.clone(),
cold: cold.clone(),
is_protocol: false,
share: share_u64,
rem,
});
}

if protocol_alpha_value_u128 > 0 {
let prod: u128 = pot_u128.saturating_mul(protocol_alpha_value_u128);
let share_u128: u128 = prod.checked_div(total_alpha_value_u128).unwrap_or_default();
let share_u64: u64 = share_u128.min(u128::from(u64::MAX)) as u64;
distributed = distributed.saturating_add(u128::from(share_u64));
let rem: u128 = prod.checked_rem(total_alpha_value_u128).unwrap_or_default();
portions.push(Portion {
_hot: owner_coldkey.clone(),
cold: owner_coldkey.clone(),
is_protocol: true,
share: share_u64,
rem,
});
}

let leftover: u128 = pot_u128.saturating_sub(distributed);
let leftover: u128 = total_rem

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Remainder payouts are batch-local

Largest-remainder distribution is computed only over the stakers collected in the current weight-limited batch, while total_alpha_value_u128 is global for the whole subnet. Each batch calculates leftover from its local total_rem and then advances status.subnet_distributed_tao, so the final payout depends on batch boundaries rather than the global ordering of all staker remainders. That can misallocate dissolved subnet TAO when the settlement spans multiple on_idle calls. Persist all candidate remainders until a global pass is possible, or use a deterministic streaming allocation that is mathematically equivalent across batches.

Comment thread Cargo.toml
Comment on lines +324 to +325
# core2 was yanked from crates.io; keep the last published sources available via git.
core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Lockfile switches core2 to a git source

This runtime PR changes core2 from a crates.io package with a checksum to a GitHub patch revision. Even pinned git dependencies remove the registry checksum/release provenance and add a supply-chain dependency unrelated to dissolve-network cleanup. If the yank must be handled here, vendor the exact crate source or use an internal audited fork with repository ownership and review rationale; otherwise keep this dependency change out of the runtime PR.

Comment on lines +487 to +489
// Check if there is enought weight to complete all the operations in this function
// It is the maximum weight that can be consumed by the function. including all potential reads and writes.
let max_weight = T::DbWeight::get().reads_writes(20, 12);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Owner refund and final recycling bypass precise weight metering

This phase reserves a coarse reads_writes(20, 12) allowance, then conditionally performs multiple balance operations, storage removals, account lookups, and recycling writes. Because the work is not charged at each branch, changes to any called helper can silently invalidate the bound and make dissolve cleanup exceed the on_idle budget. Runtime cleanup code should meter each concrete operation, or use a benchmarked worst-case bound that is tied to the exact helper behavior and enforced before entering the phase.

@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

🔄 AI review updated — Skeptic: VULNERABLE

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI review — see the sticky summary comment for the verdict and the inline comments below for specific findings.

}

if weight_meter.can_consume(write_weight) {
TimelockedIndex::<T>::mutate(|index| {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Timelocked commitment index cleanup is unbounded and unmetered

TimelockedIndex::<T>::mutate(... retain ...) scans and rewrites the entire global timelock index, but the meter only reserves one write. If the index is large, dissolve cleanup can consume far more than the on_idle budget and produce an overweight block. Make this index cleanup chunked/resumable, or store it by netuid so clear_prefix_with_meter can meter progress like the other commitment maps.

let mut weight = Self::remove_data_for_dissolved_networks(limit);

if weight.all_lt(limit) {
weight.saturating_accrue(Self::process_network_registration_queue());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Queued network registration runs outside on_idle weight accounting

process_network_registration_queue() runs whenever dissolve cleanup reports any weight below limit, but it does not receive a remaining-weight meter or check that its worst-case work fits. set_new_network_state can perform substantial runtime work, so this can exceed the block's on_idle limit after cleanup has already consumed most of the budget. Pass the remaining budget into the queue processor and skip/defer registration unless the full bounded cost fits.

// reserve the weight for the add_balance_to_coldkey_account function call later
if !weight_meter.can_consume(need_to_consume_weight) {
inner_read_all = false;
last_hot = Some(hot.clone());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Coldkey-prefix checkpointing is not resumable

When the inner coldkey scan runs out of weight, the checkpoint records only the outer hotkey key. The next call resumes with TotalHotkeyAlpha::<T>::iter_from(last_hot_key), which restarts that hotkey from the first coldkey, repeats already-charged work, and can fail forever for a hotkey with many coldkeys. Persist an inner coldkey checkpoint, or require enough budget to finish all coldkeys for a hotkey before starting it.

}

let leftover: u128 = pot_u128.saturating_sub(distributed);
let leftover: u128 = total_rem

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Remainder payouts are batch-local

The largest-remainder calculation is performed only over the current stakers batch, while status.subnet_distributed_tao persists across batches. Splitting settlement across on_idle calls changes who receives the remainder atoms compared with a single global pass, so payout fairness depends on block weight and iteration batching. Track remainders globally across all stakers before awarding leftovers, or defer the remainder pass until every staker has been scanned.

Comment thread Cargo.toml
[patch.crates-io]
w3f-bls = { git = "https://github.com/opentensor/bls", branch = "fix-no-std" }
# core2 was yanked from crates.io; keep the last published sources available via git.
core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Lockfile switches core2 to a git source

This patches core2 from crates.io to a GitHub revision, and Cargo.lock now resolves it from that git source. Even pinned git dependencies are a supply-chain expansion that executes in every build, and this change is unrelated to the dissolve-network runtime logic. Keep the registry crate if possible, vendor the exact source through the repository's normal process, or split this into a dedicated dependency PR with explicit review.


// Check if there is enought weight to complete all the operations in this function
// It is the maximum weight that can be consumed by the function. including all potential reads and writes.
let max_weight = T::DbWeight::get().reads_writes(20, 12);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Owner refund and final recycling bypass precise weight metering

This function reserves a coarse reads_writes(20, 12) budget up front, then executes conditional balance transfers, shortfall crediting, recycling, storage removals, and final status updates regardless of the actual path taken. For runtime cleanup driven by on_idle, the weight meter should charge each storage and balance operation at the point it is performed, or the cleanup can silently exceed the declared budget on paths with refunds and recycling.

@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

🔄 AI review updated — Skeptic: VULNERABLE

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI review — see the sticky summary comment for the verdict and the inline comments below for specific findings.

Comment on lines +607 to +608
if weight_meter.can_consume(write_weight) {
TimelockedIndex::<T>::mutate(|index| {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Timelocked commitment index cleanup is unbounded and unmetered

TimelockedIndex::<T>::mutate(... retain ...) scans and rewrites the entire global timelocked index while charging only writes(1). A large index can make this dissolve-cleanup phase exceed the on_idle weight budget and stall or overweight the block. This needs a bounded/resumable cleanup path keyed by netuid, or the index must be restructured so entries can be removed with metered prefix iteration.

Comment on lines +194 to +195
if weight.all_lt(limit) {
weight.saturating_accrue(Self::process_network_registration_queue());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Queued network registration runs outside on_idle weight accounting

This starts process_network_registration_queue() whenever the dissolve cleanup consumed less than limit, but it does not pass a remaining-weight meter or check that the registration's worst-case weight fits in the remaining budget. If cleanup leaves only a small remainder, set_new_network_state can still execute and make on_idle return/consume more than the block allows. Gate registration on a real remaining-weight budget or move it to its own metered phase.

Comment on lines +791 to +794
// reserve the weight for the add_balance_to_coldkey_account function call later
if !weight_meter.can_consume(need_to_consume_weight) {
inner_read_all = false;
last_hot = Some(hot.clone());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Coldkey-prefix checkpointing is not resumable

When the inner coldkey scan runs out of weight, the checkpoint records only the outer hotkey. A hotkey with more coldkeys than fit in one idle budget will restart at the same hotkey without any coldkey cursor, so this phase can repeat the same prefix and never progress. Persist a checkpoint for the inner Alpha prefix, or make the phase process a bounded coldkey cursor directly.

Comment on lines 838 to +854
@@ -568,31 +842,18 @@ impl<T: Config> Pallet<T> {
distributed = distributed.saturating_add(u128::from(share_u64));

let rem: u128 = prod.checked_rem(total_alpha_value_u128).unwrap_or_default();
total_rem = total_rem.saturating_add(rem);
portions.push(Portion {
_hot: hot.clone(),
cold: cold.clone(),
is_protocol: false,
share: share_u64,
rem,
});
}

if protocol_alpha_value_u128 > 0 {
let prod: u128 = pot_u128.saturating_mul(protocol_alpha_value_u128);
let share_u128: u128 = prod.checked_div(total_alpha_value_u128).unwrap_or_default();
let share_u64: u64 = share_u128.min(u128::from(u64::MAX)) as u64;
distributed = distributed.saturating_add(u128::from(share_u64));
let rem: u128 = prod.checked_rem(total_alpha_value_u128).unwrap_or_default();
portions.push(Portion {
_hot: owner_coldkey.clone(),
cold: owner_coldkey.clone(),
is_protocol: true,
share: share_u64,
rem,
});
}

let leftover: u128 = pot_u128.saturating_sub(distributed);
let leftover: u128 = total_rem

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Remainder payouts are batch-local

The pro-rata payout is computed over only the stakers collected in this idle batch, while total_alpha_value_u128 is global for the subnet. Largest-remainder ordering is therefore local to each batch, and undistributed pot can later be swept into the owner refund/recycle path instead of going to the globally largest remainders. Settlement needs either a global remainder pass or persisted per-staker remainder state before finalizing the pot.

Comment thread Cargo.toml
Comment on lines +324 to +325
# core2 was yanked from crates.io; keep the last published sources available via git.
core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Lockfile switches core2 to a git source

This patches core2 from crates.io to a GitHub revision for all workspace builds. Even with a pinned rev, this is a supply-chain change unrelated to dissolve-network cleanup and moves a transitive dependency off the registry checksum path. Keep the registry source, vendor the exact source with review, or split this into a dedicated dependency PR with explicit vetting.

Comment on lines +487 to +489
// Check if there is enought weight to complete all the operations in this function
// It is the maximum weight that can be consumed by the function. including all potential reads and writes.
let max_weight = T::DbWeight::get().reads_writes(20, 12);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Owner refund and final recycling bypass precise weight metering

This finalization phase consumes a coarse reads_writes(20, 12) allowance before conditional mint/spend/transfer/recycle/storage operations. Those paths include balance mutations and optional storage writes whose actual cost depends on account/state conditions, so the cleanup can under-account block weight. Meter each conditional operation as it is selected, or prove the upfront bound covers the worst case including all token/account side effects.

@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

🔄 AI review updated — Skeptic: VULNERABLE

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI review — see the sticky summary comment for the verdict and the inline comments below for specific findings.

}

if weight_meter.can_consume(write_weight) {
TimelockedIndex::<T>::mutate(|index| {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Timelocked commitment index cleanup is unbounded and unmetered

TimelockedIndex::<T>::mutate(... retain ...) decodes, scans, and rewrites the entire global index while charging only one write. Because this runs inside dissolve cleanup under the on-idle meter, a large timelock index can exceed the available block weight and stall or overweight cleanup. Store this index by netuid, or make the global cleanup resumable and charge per decoded/removed entry.

let mut weight = Self::remove_data_for_dissolved_networks(limit);

if weight.all_lt(limit) {
weight.saturating_accrue(Self::process_network_registration_queue());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Queued network registration runs outside on_idle weight accounting

This calls process_network_registration_queue() whenever dissolve cleanup consumed less than limit, but it does not pass the remaining budget or check that a queued set_new_network_state fits before executing it. If cleanup leaves only a small remainder, on-idle can still perform a full network registration and only add the actual weight afterward. Gate this helper with a remaining WeightMeter or a conservative worst-case fit check before doing the registration.

// reserve the weight for the add_balance_to_coldkey_account function call later
if !weight_meter.can_consume(need_to_consume_weight) {
inner_read_all = false;
last_hot = Some(hot.clone());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Coldkey-prefix checkpointing is not resumable

When weight runs out inside alpha_iter_single_prefix(&hot), the checkpoint records only the outer hotkey. Cleanup then has to process all coldkeys for that hotkey in one future idle slice. A hotkey with enough coldkeys can therefore keep this phase from making progress, leaving dissolved-network cleanup stuck. Persist the inner coldkey key as part of the checkpoint, or split the storage so each inner scan is independently resumable.

}

let leftover: u128 = pot_u128.saturating_sub(distributed);
let leftover: u128 = total_rem

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Remainder payouts are batch-local

leftover and the largest-remainder sort are computed only over the current metered batch of stakers, while total_alpha_value_u128 is global and subnet_distributed_tao is carried across batches. This changes who receives rounding units based on cleanup chunking rather than the global ordering of remainders, and can misallocate or leave TAO to be handled by the final refund/recycle path. Compute the largest-remainder allocation over a stable global ordering, or persist enough remainder state to make chunked settlement equivalent to a single pass.

Comment thread Cargo.toml
[patch.crates-io]
w3f-bls = { git = "https://github.com/opentensor/bls", branch = "fix-no-std" }
# core2 was yanked from crates.io; keep the last published sources available via git.
core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Lockfile switches core2 to a git source

This patches core2 from crates.io to a pinned GitHub revision, which bypasses the crates.io checksum and adds an unrelated supply-chain change to a runtime cleanup PR. The comment explains the yank, but this still needs explicit dependency vetting and should be separated unless it is required for this change.


// Check if there is enought weight to complete all the operations in this function
// It is the maximum weight that can be consumed by the function. including all potential reads and writes.
let max_weight = T::DbWeight::get().reads_writes(20, 12);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Owner refund and final recycling bypass precise weight metering

This phase consumes a coarse reads_writes(20, 12) allowance, then conditionally performs balance shortfall crediting, owner transfer, keep-alive balance lookup, recycle, RAORecycledForRegistration insertion, and multiple storage removals. If the estimate is low, on-idle can exceed its advertised weight; if it is high, cleanup can stall despite enough weight for a smaller branch. Meter the actual branch operations before performing each storage or balance operation.

@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

🔄 AI review updated — Skeptic: VULNERABLE

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip-cargo-audit This PR fails cargo audit but needs to be merged anyway

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants