fix: handle and propagate errors in event channel monitors by xdustinface · Pull Request #573 · dashpay/rust-dashcore

xdustinface · 2026-03-22T07:23:04Z

Add on_failure cancellation token to spawn_broadcast_monitor and spawn_progress_monitor to signal the run loop on fatal monitor errors
Treat Lagged broadcast errors as fatal, lost events are unrecoverable at the moment
Distinguish expected vs unexpected channel close by checking the shutdown token
Run loop watches monitor_failure.cancelled() and exits with ChannelFailure error

Summary by CodeRabbit

Bug Fixes
- Improved error handling in event monitoring systems to properly detect and report channel failures instead of silently continuing or hanging indefinitely.

Introduce an `EventHandler` trait in `dash-spv` with default no-op methods (`on_sync_event`, `on_network_event`, `on_progress`, `on_wallet_event`, `on_error`). The `run()` method now accepts an `Arc<H: EventHandler>`, subscribes to internal channels, and dispatches events to the handler via monitoring tasks. Add `TestEventHandler` in `test_utils` that bridges events back to tokio channels for ergonomic `select!`-based integration tests.

Implement the `EventHandler` trait with tracing-based logging for sync events, wallet events, progress updates, and errors in `main.rs`.

Pass all event callbacks as a single `FFIEventCallbacks` struct directly to `run()` instead of requiring separate set_*_callbacks() calls. This removes 5 Mutex fields from `FFIDashSpvClient` and 10 FFI functions.

coderabbitai · 2026-03-22T07:23:11Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f22c279b-dc5c-420b-b66f-4982fb513658

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

The changes implement a failure detection mechanism for event and progress monitoring tasks. A dedicated CancellationToken is introduced to propagate monitor failures (channel lag or unexpected closure) from spawned monitor tasks back to the coordinator's main loop, enabling graceful error handling and shutdown.

Changes

Cohort / File(s)	Summary
Monitor Function Updates `dash-spv/src/client/event_handler.rs`	Added `on_failure: CancellationToken` parameter to `spawn_broadcast_monitor` and `spawn_progress_monitor`. Non-shutdown channel termination (Closed, Lagged) now triggers `handler.on_error`, cancels `on_failure` token, and breaks; previously lagged errors were ignored. Test assertions updated to verify error counts and token cancellation.
Coordinator Monitor Integration `dash-spv/src/client/sync_coordinator.rs`	Introduced `monitor_failure` CancellationToken that is cloned into spawned monitor tasks. Main loop's `tokio::select!` now includes a branch to break with `SpvError::ChannelFailure` when `monitor_failure` is cancelled, enabling failure propagation from monitors to coordinator.

Sequence Diagram

sequenceDiagram
    participant Coordinator as Coordinator<br/>(run loop)
    participant Monitor as Monitor Task<br/>(broadcast)
    participant Handler as Event<br/>Handler
    participant Token as on_failure<br/>Token

    Monitor->>Monitor: Detect channel lag/<br/>closure
    Monitor->>Handler: Call on_error()
    Monitor->>Handler: Log error
    Monitor->>Token: Cancel on_failure token
    Monitor->>Monitor: Break from loop

    Coordinator->>Coordinator: tokio::select! loop
    Coordinator->>Token: Poll on_failure
    Token-->>Coordinator: Cancellation detected
    Coordinator->>Coordinator: Return SpvError::<br/>ChannelFailure

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A token hops through channels wide,
Watching monitors side by side,
When laggers stumble, closure calls,
The token sings and heartbeat falls,
Coordination finds its way,
With graceful failure, come what may!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main change: adding error handling and propagation in event channel monitors, which aligns with the core modifications to spawn_broadcast_monitor and spawn_progress_monitor.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/fatal-monitor-errors

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

xdustinface · 2026-03-22T07:23:51Z

@CodeRabbit review

coderabbitai · 2026-03-22T07:23:56Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

dash-spv/src/client/sync_coordinator.rs (1)
53-85: ⚠️ Potential issue | 🟠 Major

Return the real monitor failure instead of synthesizing "broadcast receiver lagged".

All four monitors cancel the same token, but this branch always constructs ChannelFailure("event monitor", "broadcast receiver lagged"). That misreports progress-channel failures and unexpected broadcast closes, and it drops the more specific failure message already produced by the monitor task. Please propagate the concrete failure back into run() and return that here. As per coding guidelines, "Use proper error types (thiserror) and propagate errors appropriately in Rust code".

Also applies to: 106-111
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dash-spv/src/client/sync_coordinator.rs` around lines 53 - 85, The code in
run() currently maps any monitor cancellation to a synthesized
ChannelFailure("event monitor", "broadcast receiver lagged") which hides the
real cause; change the logic that handles monitor_shutdown/monitor_failure so it
returns the concrete failure produced by the monitor tasks (propagate the actual
error from monitor_failure) instead of constructing the static "broadcast
receiver lagged" message. Locate the monitor spawns (spawn_broadcast_monitor and
spawn_progress_monitor) and the branch that creates ChannelFailure("event
monitor", "broadcast receiver lagged") and replace it so run() awaits or
inspects monitor_failure (the propagated error type) and returns that error
(preserving its thiserror type) when cancelling the token; ensure the same fix
is applied for the second occurrence referenced (lines ~106-111).

🧹 Nitpick comments (1)

dash-spv/src/client/event_handler.rs (1)
242-263: Add a teardown test that closes the channel after shutdown.cancel().

The new guards on Line 55 and Line 98 only matter when the channel closes after shutdown has been requested. The added tests cover unexpected drops and explicit shutdown separately, but they still don't exercise that path, so a regression there would go unnoticed. As per coding guidelines, "Write unit tests for new functionality in Rust code" and "Place unit tests alongside code with #[cfg(test)] attribute".

Also applies to: 319-337
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dash-spv/src/client/event_handler.rs` around lines 242 - 263, Add a new unit
test placed alongside the existing tests (under #[cfg(test)]) that exercises the
path where the broadcast channel is closed after shutdown has been requested:
create a broadcast::channel::<SyncEvent>, spawn the same spawn_broadcast_monitor
using the RecordingHandler, call shutdown.cancel() first, then drop the sender
to close the channel, await the task, and assert the handler/error and
on_failure behavior matches expectations; reference the existing test
broadcast_monitor_fails_on_unexpected_channel_close and reuse its setup
(RecordingHandler, shutdown CancellationToken, on_failure CancellationToken, and
spawn_broadcast_monitor invocation) but invert the order of shutdown.cancel()
and drop(tx) so the guards on lines 55 and 98 are exercised.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@dash-spv/src/client/sync_coordinator.rs`:
- Around line 53-85: The code in run() currently maps any monitor cancellation
to a synthesized ChannelFailure("event monitor", "broadcast receiver lagged")
which hides the real cause; change the logic that handles
monitor_shutdown/monitor_failure so it returns the concrete failure produced by
the monitor tasks (propagate the actual error from monitor_failure) instead of
constructing the static "broadcast receiver lagged" message. Locate the monitor
spawns (spawn_broadcast_monitor and spawn_progress_monitor) and the branch that
creates ChannelFailure("event monitor", "broadcast receiver lagged") and replace
it so run() awaits or inspects monitor_failure (the propagated error type) and
returns that error (preserving its thiserror type) when cancelling the token;
ensure the same fix is applied for the second occurrence referenced (lines
~106-111).

---

Nitpick comments:
In `@dash-spv/src/client/event_handler.rs`:
- Around line 242-263: Add a new unit test placed alongside the existing tests
(under #[cfg(test)]) that exercises the path where the broadcast channel is
closed after shutdown has been requested: create a
broadcast::channel::<SyncEvent>, spawn the same spawn_broadcast_monitor using
the RecordingHandler, call shutdown.cancel() first, then drop the sender to
close the channel, await the task, and assert the handler/error and on_failure
behavior matches expectations; reference the existing test
broadcast_monitor_fails_on_unexpected_channel_close and reuse its setup
(RecordingHandler, shutdown CancellationToken, on_failure CancellationToken, and
spawn_broadcast_monitor invocation) but invert the order of shutdown.cancel()
and drop(tx) so the guards on lines 55 and 98 are exercised.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e4a090ec-4d14-4e67-b83c-1db7a7f43090

📥 Commits

Reviewing files that changed from the base of the PR and between becff66 and 31549f0.

📒 Files selected for processing (2)

dash-spv/src/client/event_handler.rs
dash-spv/src/client/sync_coordinator.rs

codecov · 2026-03-22T07:39:42Z

Codecov Report

❌ Patch coverage is 82.77635% with 67 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.55%. Comparing base (19fb152) to head (1f3f4c0).
⚠️ Report is 3 commits behind head on refactor/move-callbacks-to-crates.

Files with missing lines	Patch %	Lines
dash-spv-ffi/src/bin/ffi_cli.rs	0.00%	38 Missing ⚠️
dash-spv/src/main.rs	0.00%	13 Missing ⚠️
dash-spv/src/client/event_handler.rs	97.04%	7 Missing ⚠️
dash-spv/src/client/sync_coordinator.rs	85.71%	7 Missing ⚠️
dash-spv-ffi/src/client.rs	92.00%	2 Missing ⚠️

Additional details and impacted files

@@                          Coverage Diff                          @@
##           refactor/move-callbacks-to-crates     #573      +/-   ##
=====================================================================
+ Coverage                              66.32%   66.55%   +0.22%     
=====================================================================
  Files                                    311      312       +1     
  Lines                                  64976    65058      +82     
=====================================================================
+ Hits                                   43097    43298     +201     
+ Misses                                 21879    21760     -119

Flag	Coverage Δ
core	`75.13% <ø> (ø)`
ffi	`36.92% <54.02%> (+0.12%)`	⬆️
rpc	`28.28% <ø> (ø)`
spv	`81.35% <91.05%> (+0.28%)`	⬆️
wallet	`66.27% <ø> (ø)`

Files with missing lines	Coverage Δ
dash-spv-ffi/src/callbacks.rs	`80.70% <100.00%> (+17.23%)`	⬆️
dash-spv/src/client/events.rs	`100.00% <100.00%> (ø)`
dash-spv/src/client/mod.rs	`98.64% <ø> (ø)`
dash-spv-ffi/src/client.rs	`58.85% <92.00%> (-0.53%)`	⬇️
dash-spv/src/client/event_handler.rs	`97.04% <97.04%> (ø)`
dash-spv/src/client/sync_coordinator.rs	`80.55% <85.71%> (+7.82%)`	⬆️
dash-spv/src/main.rs	`0.00% <0.00%> (ø)`
dash-spv-ffi/src/bin/ffi_cli.rs	`0.00% <0.00%> (ø)`

... and 5 files with indirect coverage changes

- Add `on_failure` cancellation token to `spawn_broadcast_monitor` and `spawn_progress_monitor` to signal the run loop on fatal monitor errors - Treat Lagged broadcast errors as fatal, lost events are unrecoverable at the moment - Distinguish expected vs unexpected channel close by checking the shutdown token - Run loop watches `monitor_failure.cancelled()` and exits with `ChannelFailure` error

github-actions · 2026-03-23T02:48:57Z

This PR has merge conflicts with the base branch. Please rebase or merge the base branch into your branch to resolve them.

xdustinface added 3 commits March 22, 2026 14:05

add LoggingEventHandler to CLI binary

2eab339

Implement the `EventHandler` trait with tracing-based logging for sync events, wallet events, progress updates, and errors in `main.rs`.

merge FFI callback setup into dash_spv_ffi_client_run

becff66

Pass all event callbacks as a single `FFIEventCallbacks` struct directly to `run()` instead of requiring separate set_*_callbacks() calls. This removes 5 Mutex fields from `FFIDashSpvClient` and 10 FFI functions.

coderabbitai bot reviewed Mar 22, 2026

View reviewed changes

coderabbitai bot approved these changes Mar 22, 2026

View reviewed changes

xdustinface marked this pull request as draft March 22, 2026 07:55

xdustinface force-pushed the fix/fatal-monitor-errors branch from 31549f0 to 1f3f4c0 Compare March 22, 2026 07:56

xdustinface force-pushed the refactor/move-callbacks-to-crates branch from becff66 to 9ec7541 Compare March 23, 2026 02:46

github-actions bot added the merge-conflict The PR conflicts with the target branch. label Mar 23, 2026

xdustinface force-pushed the refactor/move-callbacks-to-crates branch 5 times, most recently from aaddece to 23accc0 Compare March 23, 2026 06:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: handle and propagate errors in event channel monitors#573

fix: handle and propagate errors in event channel monitors#573
xdustinface wants to merge 4 commits intorefactor/move-callbacks-to-cratesfrom
fix/fatal-monitor-errors

xdustinface commented Mar 22, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Mar 22, 2026 •

edited

Loading

Review skipped

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

Uh oh!

xdustinface commented Mar 22, 2026

Uh oh!

coderabbitai bot commented Mar 22, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

codecov bot commented Mar 22, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

xdustinface commented Mar 22, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

Uh oh!

xdustinface commented Mar 22, 2026

Uh oh!

coderabbitai bot commented Mar 22, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Mar 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

github-actions bot commented Mar 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

xdustinface commented Mar 22, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 22, 2026 •

edited

Loading

codecov bot commented Mar 22, 2026 •

edited

Loading