Skip to content

fix: R-059 deadlock in with_wallets_from_seeds#1471

Open
m2ux wants to merge 8 commits intomainfrom
fix/114-r059-deadlock-with-wallets-from-seeds
Open

fix: R-059 deadlock in with_wallets_from_seeds#1471
m2ux wants to merge 8 commits intomainfrom
fix/114-r059-deadlock-with-wallets-from-seeds

Conversation

@m2ux
Copy link
Copy Markdown
Contributor

@m2ux m2ux commented May 7, 2026

Summary

Fix the guaranteed deadlock in with_wallets_from_seeds (R-059) by restructuring mutex acquisition so the same non-reentrant std::sync::Mutex is no longer locked twice while a guard is still alive. The function now acquires self.wallets exactly once and produces two disjoint &mut Wallet<D> references from one guard via HashMap::get_disjoint_mut.

🎫 Issue 📐 Engineering


Motivation

with_wallets_from_seeds in ledger/helpers/src/versions/common/context.rs called self.wallets.lock() and took a &mut Wallet reference from the resulting guard, then attempted to lock the same non-reentrant std::sync::Mutex a second time. The first guard could not drop because origin_wallet borrowed from it, so the second .lock() deadlocked unconditionally on every invocation. The function is called from transient.rs:49, making any code path that triggers it a guaranteed hang.

Eliminating the deadlock is required before any feature work that depends on with_wallets_from_seeds can run reliably, and before tests covering this code path can be exercised end-to-end.


Changes

ledger/helpers/src/versions/common/context.rs — body rewrite of with_wallets_from_seeds:

  • Acquire self.wallets.lock() exactly once. The two disjoint &mut Wallet<D> references are obtained from a single guard via HashMap::get_disjoint_mut([&origin_seed, &destination_seed]) (stable since Rust 1.86; workspace pin is 1.95).
  • Add an up-front assert!(origin_seed != destination_seed, ...) so aliased-seed misuse panics with a specific message ("origin_seed and destination_seed must differ") before the lock is taken, rather than producing the opaque None that get_disjoint_mut returns for aliased keys.
  • Map missing-seed None to a panic whose message style matches the existing wallet_for_seed panic ("Wallet with seed ...").
  • Add a rustdoc comment above the function naming the two preconditions (single-lock invariant; distinct seeds required).
  • Public signature unchangedFnOnce(&mut Wallet<D>, &mut Wallet<D>) -> R is preserved; the single live caller (transient.rs:49) compiles unchanged.

Regression tests (added to the existing #[cfg(test)] mod tests block):

  • with_wallets_from_seeds_does_not_deadlock — runs the fixed call on a worker thread and joins via mpsc::recv_timeout(Duration::from_secs(5)). The 5s wall-clock deadline is several orders of magnitude above the expected return time but well below any plausible CI hard timeout, so it cleanly discriminates "fixed" from "still hung". Also asserts the closure side-effect (an AtomicU64 counter) was observed.
  • with_wallets_from_seeds_panics_on_aliased_seed#[should_panic(expected = "origin_seed and destination_seed must differ")] pins the new aliased-seed panic message.
  • with_wallets_from_seeds_panics_on_missing_seed#[should_panic(expected = "Wallet with seed")] pins the missing-seed panic message.

changes/node/changed/PM-21800-r059-deadlock-with-wallets-from-seeds.md — new change file with #ledger #helpers tags, summary of bug + fix shape, and PR / JIRA links.


📌 Submission Checklist

  • Changes are backward-compatible (or flagged if breaking)
  • Pull request description explains why the change is needed
  • Self-reviewed the diff
  • I have included a change file, or skipped for this reason: [reason]
  • If the changes introduce a new feature, I have bumped the node minor version
  • Update documentation (if relevant)
  • No new todos introduced

🔱 Fork Strategy

  • Node Runtime Update
  • Node Client Update
  • Other
  • N/A

🗹 TODO before merging

  • Ready for review

Initial empty commit to open draft PR for tracking.
Closes shieldedtech/shielded-security-engineering#114

Signed-off-by: Mike Clay <mike.clay@shielded.io>
@m2ux m2ux self-assigned this May 7, 2026
m2ux and others added 4 commits May 7, 2026 16:18
…-059)

`LedgerContext::with_wallets_from_seeds` previously acquired the wallets
mutex twice in sequence, deadlocking on the second `lock()` because
`std::sync::Mutex` is not reentrant. The function now acquires the lock
exactly once and uses `HashMap::get_disjoint_mut` to obtain two disjoint
`&mut Wallet<D>` references from a single `MutexGuard`, mirroring the
locking shape of the sibling `with_wallet_from_seed`.

The same-seed-twice case is rejected up front with a clear `assert!`
panic (cannot produce two disjoint `&mut` to the same wallet); a missing
seed panics with a message matching the existing `wallet_for_seed` style.

Adds three regression tests in `mod tests` of `context.rs`:
- `with_wallets_from_seeds_does_not_deadlock` — runs the call on a
  worker thread and joins via `recv_timeout(5s)` on an `mpsc` channel
  to discriminate "fixed" from "still hung".
- `with_wallets_from_seeds_panics_on_aliased_seed` — `#[should_panic]`
  on `"origin_seed and destination_seed must differ"`.
- `with_wallets_from_seeds_panics_on_missing_seed` — `#[should_panic]`
  on `"Wallet with seed"`.

Includes a `changes/node/changed/PM-21800-...` change file referencing
PR #1471 and JIRA PM-21800.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mike Clay <mike.clay@shielded.io>
…m_seeds panic

Addresses M-1 from post-impl-review: the missing-seed panic in
with_wallets_from_seeds previously listed both origin and destination
seeds, conflating two failure modes. Now matches the existing
wallet_for_seed style and names only the absent seed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mike Clay <mike.clay@shielded.io>
…tests

Resolves clippy::cloned_ref_to_slice_refs at the regression-test
call sites for with_wallets_from_seeds. The closure does not need
ownership of the seed values, so a one-element slice via
std::slice::from_ref avoids the clone and satisfies the lint
under -D warnings at workspace scope.

Signed-off-by: Mike Clay <mike.clay@shielded.io>
Add an Issue: line pointing at shieldedtech/shielded-security-engineering#114
so the change fragment links back to the upstream tracker. Sibling
fragments use full URLs for PR and JIRA links; the cross-repo issue is
included in the same form (a Closes: #114 line would refer to the wrong
repository).

Signed-off-by: Mike Clay <mike.clay@shielded.io>
@m2ux m2ux marked this pull request as ready for review May 8, 2026 09:28
@m2ux m2ux requested a review from a team as a code owner May 8, 2026 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant