feat: fast sync#111
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a three-phase "Fast Sync" strategy to optimize wallet synchronization. The implementation includes a fast UTXO scan (Phase 1), a recent full scan (Phase 2), and a history backfill (Phase 3). Key technical changes include the addition of a SpentUnconfirmed output status, idempotent database insertion logic for backfill operations, and new integration tests for benchmarking performance and verifying balance correctness. Review feedback identifies a critical issue where confirming spent outputs during verification fails to record corresponding debits, potentially leading to incorrect balances. Additionally, suggestions were made to chunk large UTXO status requests to prevent payload size issues and to refactor duplicated scanning loop logic to improve maintainability.
brianp
left a comment
There was a problem hiding this comment.
Looks mostly good. Some room for TOCTOU issues is open and you may want to review them.
| }; | ||
|
|
||
| if utxo_info.spent_in_header.is_some() { | ||
| crate::db::update_output_status(conn, output_id, OutputStatus::Spent).map_err(ScanError::DbError)?; |
There was a problem hiding this comment.
update_output_status here doesn't guard against the output having already been resolved by something else. If another process updated the status between the query and this write, you'd overwrite it. Adding WHERE status = 'SPENT_UNCONFIRMED' to the update would prevent that.
| let mut marked_unspent = 0u64; | ||
| let mut confirmed_spent = 0u64; | ||
|
|
||
| for utxo_info in &response.utxos { |
There was a problem hiding this comment.
If get_utxos_deleted_info returns a partial response (some hashes missing from response.utxos), those outputs stay SpentUnconfirmed forever with no retry path. Might want to log/track which ones weren't returned so they can be re-verified later.
eede19d to
d9674b5
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements a three-phase fast synchronization strategy for the Minotari wallet, consisting of a fast UTXO scan, a recent full scan, and a history backfill phase. The changes introduce a SpentUnconfirmed output status and idempotent database operations to support efficient syncing and history reconstruction. Review feedback suggests optimizing database connection management during backfill, ensuring idempotency for balance change records, and improving the accuracy of historical timestamps and dates when exact block data is unavailable.
| async fn verify_spent_unconfirmed_outputs( | ||
| &self, | ||
| conn: &rusqlite::Connection, | ||
| target: &AccountSyncTarget, | ||
| unresolved: &[crate::db::UnresolvedSpentOutput], | ||
| ) -> Result<(), ScanError> { | ||
| use std::collections::HashMap; | ||
|
|
||
| let hash_to_output: HashMap<Vec<u8>, &crate::db::UnresolvedSpentOutput> = | ||
| unresolved.iter().map(|o| (o.output_hash.clone(), o)).collect(); | ||
|
|
||
| let all_output_hashes: Vec<Vec<u8>> = unresolved.iter().map(|o| o.output_hash.clone()).collect(); | ||
|
|
||
| let mut marked_unspent = 0u64; | ||
| let mut confirmed_spent = 0u64; | ||
|
|
||
| for output_hashes in all_output_hashes.chunks(50) { | ||
| let response = self | ||
| .client | ||
| .get_utxos_deleted_info(output_hashes.to_vec()) | ||
| .await | ||
| .map_err(|e| ScanError::Intermittent(format!("Failed to query UTXO deleted info: {}", e)))?; | ||
|
|
||
| for utxo_info in &response.utxos { | ||
| let Some(output) = hash_to_output.get(&utxo_info.utxo_hash) else { | ||
| continue; | ||
| }; | ||
|
|
||
| if let Some((spent_height, spent_block_hash)) = &utxo_info.spent_in_header { | ||
| // Create input record so the spend is tracked in the DB | ||
| let input_id = crate::db::insert_input( | ||
| conn, | ||
| target.account.id, | ||
| output.output_id, | ||
| *spent_height, | ||
| spent_block_hash, | ||
| 0, // timestamp not available from deleted info | ||
| ) | ||
| .map_err(ScanError::DbError)?; | ||
|
|
||
| // Create debit balance change so credits - debits nets correctly | ||
| let change = crate::models::BalanceChange { | ||
| account_id: target.account.id, | ||
| caused_by_output_id: None, | ||
| caused_by_input_id: Some(input_id), | ||
| description: "Output spent (verified by base node)".to_string(), | ||
| balance_credit: 0.into(), | ||
| balance_debit: output.value.into(), | ||
| effective_date: chrono::Utc::now().naive_utc(), | ||
| effective_height: *spent_height, | ||
| claimed_recipient_address: None, | ||
| claimed_sender_address: None, | ||
| memo_hex: None, | ||
| memo_parsed: None, | ||
| claimed_fee: None, | ||
| claimed_amount: None, | ||
| is_reversal: false, | ||
| reversal_of_balance_change_id: None, | ||
| is_reversed: false, | ||
| }; | ||
| crate::db::insert_balance_change(conn, &change).map_err(ScanError::DbError)?; | ||
|
|
||
| crate::db::update_output_status(conn, output.output_id, OutputStatus::Spent) | ||
| .map_err(ScanError::DbError)?; | ||
| confirmed_spent += 1; | ||
| } else if utxo_info.found_in_header.is_some() { | ||
| crate::db::update_output_status(conn, output.output_id, OutputStatus::Unspent) | ||
| .map_err(ScanError::DbError)?; | ||
| marked_unspent += 1; | ||
| } else { | ||
| debug!( | ||
| account = &*target.account.friendly_name, | ||
| output_hash = &*utxo_info.utxo_hash.to_hex(); | ||
| "Ouput unkown by base node, status kept as SpentUnconfirmed" | ||
| ); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| info!( | ||
| account_id = target.account.id, | ||
| marked_unspent = marked_unspent, | ||
| confirmed_spent = confirmed_spent; | ||
| "Verified SpentUnconfirmed outputs against base node" | ||
| ); | ||
|
|
||
| Ok(()) | ||
| } |
There was a problem hiding this comment.
The verify_spent_unconfirmed_outputs function holds a database connection across multiple asynchronous network calls (self.client.get_utxos_deleted_info(...).await). This is inefficient as it keeps a connection from the pool busy while waiting for network I/O. While performance optimizations can be deferred for functions not on a critical path (like wallet rebuilds), this trade-off should be noted as it could lead to pool exhaustion under load.
References
- Performance optimizations can be deferred for functions that are not on a critical path, like a wallet rebuild function, provided the trade-off is noted.
| reversal_of_balance_change_id: None, | ||
| is_reversed: false, | ||
| }; | ||
| crate::db::insert_balance_change(conn, &change).map_err(ScanError::DbError)?; |
There was a problem hiding this comment.
To ensure idempotency and prevent duplicate balance entries if the verification process is interrupted and retried, use insert_balance_change_if_not_exists instead of insert_balance_change.
| crate::db::insert_balance_change(conn, &change).map_err(ScanError::DbError)?; | |
| crate::db::insert_balance_change_if_not_exists(conn, &change).map_err(ScanError::DbError)?; |
| description: "Output spent (verified by base node)".to_string(), | ||
| balance_credit: 0.into(), | ||
| balance_debit: output.value.into(), | ||
| effective_date: chrono::Utc::now().naive_utc(), |
There was a problem hiding this comment.
Using chrono::Utc::now().naive_utc() for the effective_date of a historical balance change will cause the transaction to appear as recent in the wallet's history, regardless of when it actually occurred on-chain. While the exact timestamp is not provided by the current API, this behavior should be noted as it affects the accuracy of the transaction history view.
References
- Performance optimizations or simplifications in non-critical paths should have their trade-offs noted.
Description
implements fast sync for wallet to sync to a faster useable state without tx history