Merged
43 changes: 35 additions & 8 deletions crates/sentrix-core/src/storage.rs
@@ -365,8 +365,11 @@ impl Storage {
     /// premine / genesis accounts that pre-date the first trie touch.
     /// We must not zero those out.
     ///
-    /// Trie-lookup errors (`Err(_)`) propagate: a corrupted trie is a
-    /// hard-fail, not silent fallback.
+    /// Trie-lookup errors (`Err(_)`) are logged and skipped: a missing
+    /// or corrupted trie node for one address no longer aborts boot.
+    /// The address is treated as having no trie leaf (blob value
+    /// preserved); the next block touching that account rewrites the
+    /// trie entry and closes the gap.
     fn reconcile_accounts_from_trie(bc: &mut Blockchain) -> SentrixResult<(usize, usize)> {
         // Build the candidate address set first — sorted + deduped so
         // the reconcile order is deterministic across runs (helps debug
@@ -409,18 +412,42 @@ impl Storage {
 
         // Phase 1: read all trie leaves into a buffer. This avoids
         // holding the trie borrow while we mutate accounts in phase 2.
+        //
+        // 2026-05-20: a missing trie node here used to crash boot. Testnet
+        // hit this with one address having a dangling reference to node
+        // 314e57bd... at h=5003961; the other 99.99% of the trie was
+        // healthy and the chain had been producing for 5 hours. Refusing
+        // to boot turned one stale leaf into an unrecoverable validator.
+        // Fail-soft now: log the gap, skip the address (existing phase-2
+        // logic treats a `None` leaf as "trie has no opinion, keep the
+        // blob"), and let the next block apply rewrite the entry.
         let mut trie_values: Vec<(String, Option<(u64, u64)>)> = Vec::with_capacity(addrs.len());
+        let mut trie_gaps: usize = 0;
         for addr in &addrs {
             let key = address_to_key(addr);
-            let leaf = trie.get(&key).map_err(|e| {
-                SentrixError::Internal(format!(
-                    "B3 reconcile: trie lookup for {addr} failed at h={}: {e}",
-                    bc.chain.last().map(|b| b.index).unwrap_or(0)
-                ))
-            })?;
+            let leaf = match trie.get(&key) {
+                Ok(leaf) => leaf,
+                Err(e) => {
+                    tracing::warn!(
+                        "B3 reconcile: trie lookup for {addr} failed at h={}: {e} — \
+                         skipping reconcile for this address (will rewrite on next touch)",
+                        bc.chain.last().map(|b| b.index).unwrap_or(0)
+                    );
+                    trie_gaps += 1;
+                    None
+                }
+            };
             let decoded = leaf.and_then(|bytes| account_value_decode(&bytes));
             trie_values.push((addr.clone(), decoded));
         }
+        if trie_gaps > 0 {
+            tracing::warn!(
+                "B3 reconcile: skipped {trie_gaps}/{} addresses due to missing trie nodes \
+                 at h={}; chain will continue producing — touched accounts repair themselves",
+                addrs.len(),
+                bc.chain.last().map(|b| b.index).unwrap_or(0)
+            );
Comment on lines +425 to +449

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

TRIE_FILES=$(fd -i 'trie|error' crates --extension rs)

echo "=== trie.get definitions ==="
rg -n -C4 '\bfn\s+get\s*\(' $TRIE_FILES

echo
echo "=== trie-related error enums / aliases / variants ==="
rg -n -C4 '\b(enum|type)\s+\w*Error\b|Missing|Dangling|Corrupt|Decode|Storage|Io' $TRIE_FILES

echo
echo "=== existing trie.get call sites ==="
rg -n -C4 'trie\.get\s*\(' crates/sentrix-core/src/storage.rs $TRIE_FILES

Repository: sentrix-labs/sentrix

Length of output: 93


🏁 Script executed:

# Check repository structure and find trie-related files
find . -type f -name "*.rs" | grep -i trie | head -20

Repository: sentrix-labs/sentrix

Length of output: 454


🏁 Script executed:

# Also check Cargo.toml files to understand dependencies
find . -name "Cargo.toml" | xargs grep -l "trie\|sentrix" | head -10

Repository: sentrix-labs/sentrix

Length of output: 374


🏁 Script executed:

# Check the storage.rs file context around the error handling
sed -n '420,455p' crates/sentrix-core/src/storage.rs

Repository: sentrix-labs/sentrix

Length of output: 1744


🏁 Script executed:

# Look for trie-related modules in the codebase
rg "mod trie|use.*trie" crates --type rs | head -20

Repository: sentrix-labs/sentrix

Length of output: 91


🏁 Script executed:

# Correct the previous command
rg "mod trie|use.*trie" crates/sentrix-core/src/storage.rs -A2 -B2

Repository: sentrix-labs/sentrix

Length of output: 280


🏁 Script executed:

# Check the trie module's error type definition
cat crates/sentrix-trie/src/lib.rs

Repository: sentrix-labs/sentrix

Length of output: 777


🏁 Script executed:

# Look for get function and error types in tree.rs
rg -n "fn get|enum.*Error|type.*Error" crates/sentrix-trie/src/tree.rs -A3

Repository: sentrix-labs/sentrix

Length of output: 228


🏁 Script executed:

# Check tree.rs for the get implementation
head -200 crates/sentrix-trie/src/tree.rs | tail -150

Repository: sentrix-labs/sentrix

Length of output: 7105


🏁 Script executed:

# Get the full get() implementation
sed -n '200,280p' crates/sentrix-trie/src/tree.rs

Repository: sentrix-labs/sentrix

Length of output: 3312


🏁 Script executed:

# Find SentrixResult and SentrixError definitions
rg "type SentrixResult|enum SentrixError" crates/sentrix-trie/src -A20

Repository: sentrix-labs/sentrix

Length of output: 46


🏁 Script executed:

# Look for where SentrixError is defined in the entire repo
rg "enum SentrixError" crates -A30

Repository: sentrix-labs/sentrix

Length of output: 2299


🏁 Script executed:

# Check sentrix-primitives or common error handling
fd -i error crates -name "*.rs" -type f | head -10

Repository: sentrix-labs/sentrix

Length of output: 295


🏁 Script executed:

# Get the full SentrixError enum
cat crates/sentrix-primitives/src/error.rs | head -100

Repository: sentrix-labs/sentrix

Length of output: 2063


🏁 Script executed:

# Also check if there are more variants
rg "SentrixError::" crates/sentrix-primitives/src/error.rs

Repository: sentrix-labs/sentrix

Length of output: 170


🏁 Script executed:

# Search for all uses of SentrixError in the trie module to understand all variants
rg "SentrixError::" crates/sentrix-trie/src -B1 -A1

Repository: sentrix-labs/sentrix

Length of output: 12687


🏁 Script executed:

# Verify what cache.get_node returns and how it handles errors
sed -n '1,150p' crates/sentrix-trie/src/cache.rs | grep -A30 "fn get_node"

Repository: sentrix-labs/sentrix

Length of output: 1434


🏁 Script executed:

# Check if there are patterns elsewhere in storage.rs for selective error handling
rg "match.*Err|SentrixError::" crates/sentrix-core/src/storage.rs -B2 -A3 | head -80

Repository: sentrix-labs/sentrix

Length of output: 3173


🏁 Script executed:

# Check if there's a specific error message pattern for "missing node" vs other Internal errors
rg "trie: missing node" crates/sentrix-trie/src -B2 -A2

Repository: sentrix-labs/sentrix

Length of output: 2842


🏁 Script executed:

# Look for any pattern matching on error messages elsewhere in the codebase
rg "SentrixError::Internal.*missing|if.*contains.*missing" crates/sentrix-core/src -B2 -A2

Repository: sentrix-labs/sentrix

Length of output: 46


🏁 Script executed:

# Check the full context of the reconcile function to understand the intent
sed -n '360,380p' crates/sentrix-core/src/storage.rs

Repository: sentrix-labs/sentrix

Length of output: 1169


🏁 Script executed:

# Check if cache.get_node can return StorageError by looking at the full chain
sed -n '1,50p' crates/sentrix-trie/src/cache.rs

Repository: sentrix-labs/sentrix

Length of output: 2244


🏁 Script executed:

# Verify load_node implementation which is called by cache.get_node
rg "fn load_node" crates/sentrix-trie/src/storage.rs -A20

Repository: sentrix-labs/sentrix

Length of output: 895


🏁 Script executed:

# Check what other errors could be returned from trie operations
rg "map_err.*SentrixError" crates/sentrix-trie/src/storage.rs | head -20

Repository: sentrix-labs/sentrix

Length of output: 1510


Distinguish missing-node errors from storage/serialization failures in error handling.

Lines 428-439 handle every trie.get() error the same way. However, trie.get() can fail with three distinct SentrixError variants: SentrixError::Internal("trie: missing node ...") (node not found), SentrixError::StorageError(...) (I/O failure), and SentrixError::SerializationError(...) (deserialization failure). The current code treats all three as recoverable gaps, but storage and serialization errors indicate a fundamentally broken trie and should not be suppressed.

Match specifically on SentrixError::Internal with a message pattern check for "missing node", and let StorageError and SerializationError propagate. This preserves the fail-soft behavior for dangling trie nodes while preventing boot with a corrupted or unreadable trie.
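
A minimal sketch of that selective handling, under stated assumptions: the `SentrixError` enum below is a stand-in that carries only the three variants named above (their `String` payloads are assumed, not taken from the repository), and `leaf_or_gap` is a hypothetical helper name, not an existing function. It takes the result of a trie lookup, counts a missing-node error as a gap, and passes every other error back to the caller.

```rust
// Stand-in for the real error enum. Only the three variants discussed in
// this comment are modelled, and the String payloads are an assumption.
#[derive(Debug, Clone, PartialEq)]
enum SentrixError {
    Internal(String),
    StorageError(String),
    SerializationError(String),
}

type SentrixResult<T> = Result<T, SentrixError>;

/// Hypothetical helper: turns a trie lookup result into "leaf or gap".
///
/// - `Ok(leaf)` is passed through unchanged.
/// - `Internal` errors whose message contains "missing node" are counted
///   in `trie_gaps` and mapped to `Ok(None)` (fail-soft).
/// - Every other error is returned to the caller (fail-hard).
fn leaf_or_gap(
    lookup: SentrixResult<Option<Vec<u8>>>,
    trie_gaps: &mut usize,
) -> SentrixResult<Option<Vec<u8>>> {
    match lookup {
        Ok(leaf) => Ok(leaf),
        Err(SentrixError::Internal(msg)) if msg.contains("missing node") => {
            // Dangling reference: treat the address as having no trie leaf.
            *trie_gaps += 1;
            Ok(None)
        }
        // Storage, serialization, and unrelated internal errors propagate.
        Err(e) => Err(e),
    }
}
```

In the reconcile loop this would be called as something like `let leaf = leaf_or_gap(trie.get(&key), &mut trie_gaps)?;`, with the existing warning log kept in the missing-node arm. Matching on message text is brittle if the wording in the trie crate ever changes, so the substring check is best read as a stopgap.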

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/sentrix-core/src/storage.rs` around lines 425 - 449, The current
catch-all for trie.get(&key) treats all errors as recoverable gaps; change the
Err branch to match on the SentrixError enum: if
Err(SentrixError::Internal(msg)) and msg.contains("missing node") then log the
missing-node warning, increment trie_gaps and continue (as you do now), but if
Err(SentrixError::StorageError(_)) or Err(SentrixError::SerializationError(_))
then propagate the error (return Err(e) or use ?), since those indicate broken
storage/serialization; keep the same handling for successful Ok(leaf) and
subsequent account_value_decode usage and push into trie_values.

+        }
 
         // Phase 2: apply repairs.
         let height = bc.chain.last().map(|b| b.index).unwrap_or(0);