Skip to content

fix: handle and propagate errors in event channel monitors#573

Draft
xdustinface wants to merge 4 commits intorefactor/move-callbacks-to-cratesfrom
fix/fatal-monitor-errors
Draft

fix: handle and propagate errors in event channel monitors#573
xdustinface wants to merge 4 commits intorefactor/move-callbacks-to-cratesfrom
fix/fatal-monitor-errors

Conversation

@xdustinface
Copy link
Collaborator

@xdustinface xdustinface commented Mar 22, 2026

  • Add on_failure cancellation token to spawn_broadcast_monitor and spawn_progress_monitor to signal the run loop on fatal monitor errors
  • Treat Lagged broadcast errors as fatal, lost events are unrecoverable at the moment
  • Distinguish expected vs unexpected channel close by checking the shutdown token
  • Run loop watches monitor_failure.cancelled() and exits with ChannelFailure error

Summary by CodeRabbit

  • Bug Fixes
    • Improved error handling in event monitoring systems to properly detect and report channel failures instead of silently continuing or hanging indefinitely.

Introduce an `EventHandler` trait in `dash-spv` with default no-op
methods (`on_sync_event`, `on_network_event`, `on_progress`,
`on_wallet_event`, `on_error`). The `run()` method now accepts an
`Arc<H: EventHandler>`, subscribes to internal channels, and dispatches
events to the handler via monitoring tasks.

Add `TestEventHandler` in `test_utils` that bridges events back to
tokio channels for ergonomic `select!`-based integration tests.
Implement the `EventHandler` trait with tracing-based logging for sync
events, wallet events, progress updates, and errors in `main.rs`.
Pass all event callbacks as a single `FFIEventCallbacks` struct directly
to `run()` instead of requiring separate set_*_callbacks() calls. This
removes 5 Mutex fields from `FFIDashSpvClient` and 10 FFI functions.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 22, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f22c279b-dc5c-420b-b66f-4982fb513658

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The changes implement a failure detection mechanism for event and progress monitoring tasks. A dedicated CancellationToken is introduced to propagate monitor failures (channel lag or unexpected closure) from spawned monitor tasks back to the coordinator's main loop, enabling graceful error handling and shutdown.

Changes

Cohort / File(s) Summary
Monitor Function Updates
dash-spv/src/client/event_handler.rs
Added on_failure: CancellationToken parameter to spawn_broadcast_monitor and spawn_progress_monitor. Non-shutdown channel termination (Closed, Lagged) now triggers handler.on_error, cancels on_failure token, and breaks; previously lagged errors were ignored. Test assertions updated to verify error counts and token cancellation.
Coordinator Monitor Integration
dash-spv/src/client/sync_coordinator.rs
Introduced monitor_failure CancellationToken that is cloned into spawned monitor tasks. Main loop's tokio::select! now includes a branch to break with SpvError::ChannelFailure when monitor_failure is cancelled, enabling failure propagation from monitors to coordinator.

Sequence Diagram

sequenceDiagram
    participant Coordinator as Coordinator<br/>(run loop)
    participant Monitor as Monitor Task<br/>(broadcast)
    participant Handler as Event<br/>Handler
    participant Token as on_failure<br/>Token

    Monitor->>Monitor: Detect channel lag/<br/>closure
    Monitor->>Handler: Call on_error()
    Monitor->>Handler: Log error
    Monitor->>Token: Cancel on_failure token
    Monitor->>Monitor: Break from loop

    Coordinator->>Coordinator: tokio::select! loop
    Coordinator->>Token: Poll on_failure
    Token-->>Coordinator: Cancellation detected
    Coordinator->>Coordinator: Return SpvError::<br/>ChannelFailure
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A token hops through channels wide,
Watching monitors side by side,
When laggers stumble, closure calls,
The token sings and heartbeat falls,
Coordination finds its way,
With graceful failure, come what may!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding error handling and propagation in event channel monitors, which aligns with the core modifications to spawn_broadcast_monitor and spawn_progress_monitor.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/fatal-monitor-errors

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@xdustinface
Copy link
Collaborator Author

@CodeRabbit review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 22, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dash-spv/src/client/sync_coordinator.rs (1)

53-85: ⚠️ Potential issue | 🟠 Major

Return the real monitor failure instead of synthesizing "broadcast receiver lagged".

All four monitors cancel the same token, but this branch always constructs ChannelFailure("event monitor", "broadcast receiver lagged"). That misreports progress-channel failures and unexpected broadcast closes, and it drops the more specific failure message already produced by the monitor task. Please propagate the concrete failure back into run() and return that here. As per coding guidelines, "Use proper error types (thiserror) and propagate errors appropriately in Rust code".

Also applies to: 106-111

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dash-spv/src/client/sync_coordinator.rs` around lines 53 - 85, The code in
run() currently maps any monitor cancellation to a synthesized
ChannelFailure("event monitor", "broadcast receiver lagged") which hides the
real cause; change the logic that handles monitor_shutdown/monitor_failure so it
returns the concrete failure produced by the monitor tasks (propagate the actual
error from monitor_failure) instead of constructing the static "broadcast
receiver lagged" message. Locate the monitor spawns (spawn_broadcast_monitor and
spawn_progress_monitor) and the branch that creates ChannelFailure("event
monitor", "broadcast receiver lagged") and replace it so run() awaits or
inspects monitor_failure (the propagated error type) and returns that error
(preserving its thiserror type) when cancelling the token; ensure the same fix
is applied for the second occurrence referenced (lines ~106-111).
🧹 Nitpick comments (1)
dash-spv/src/client/event_handler.rs (1)

242-263: Add a teardown test that closes the channel after shutdown.cancel().

The new guards on Line 55 and Line 98 only matter when the channel closes after shutdown has been requested. The added tests cover unexpected drops and explicit shutdown separately, but they still don't exercise that path, so a regression there would go unnoticed. As per coding guidelines, "Write unit tests for new functionality in Rust code" and "Place unit tests alongside code with #[cfg(test)] attribute".

Also applies to: 319-337

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dash-spv/src/client/event_handler.rs` around lines 242 - 263, Add a new unit
test placed alongside the existing tests (under #[cfg(test)]) that exercises the
path where the broadcast channel is closed after shutdown has been requested:
create a broadcast::channel::<SyncEvent>, spawn the same spawn_broadcast_monitor
using the RecordingHandler, call shutdown.cancel() first, then drop the sender
to close the channel, await the task, and assert the handler/error and
on_failure behavior matches expectations; reference the existing test
broadcast_monitor_fails_on_unexpected_channel_close and reuse its setup
(RecordingHandler, shutdown CancellationToken, on_failure CancellationToken, and
spawn_broadcast_monitor invocation) but invert the order of shutdown.cancel()
and drop(tx) so the guards on lines 55 and 98 are exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@dash-spv/src/client/sync_coordinator.rs`:
- Around line 53-85: The code in run() currently maps any monitor cancellation
to a synthesized ChannelFailure("event monitor", "broadcast receiver lagged")
which hides the real cause; change the logic that handles
monitor_shutdown/monitor_failure so it returns the concrete failure produced by
the monitor tasks (propagate the actual error from monitor_failure) instead of
constructing the static "broadcast receiver lagged" message. Locate the monitor
spawns (spawn_broadcast_monitor and spawn_progress_monitor) and the branch that
creates ChannelFailure("event monitor", "broadcast receiver lagged") and replace
it so run() awaits or inspects monitor_failure (the propagated error type) and
returns that error (preserving its thiserror type) when cancelling the token;
ensure the same fix is applied for the second occurrence referenced (lines
~106-111).

---

Nitpick comments:
In `@dash-spv/src/client/event_handler.rs`:
- Around line 242-263: Add a new unit test placed alongside the existing tests
(under #[cfg(test)]) that exercises the path where the broadcast channel is
closed after shutdown has been requested: create a
broadcast::channel::<SyncEvent>, spawn the same spawn_broadcast_monitor using
the RecordingHandler, call shutdown.cancel() first, then drop the sender to
close the channel, await the task, and assert the handler/error and on_failure
behavior matches expectations; reference the existing test
broadcast_monitor_fails_on_unexpected_channel_close and reuse its setup
(RecordingHandler, shutdown CancellationToken, on_failure CancellationToken, and
spawn_broadcast_monitor invocation) but invert the order of shutdown.cancel()
and drop(tx) so the guards on lines 55 and 98 are exercised.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e4a090ec-4d14-4e67-b83c-1db7a7f43090

📥 Commits

Reviewing files that changed from the base of the PR and between becff66 and 31549f0.

📒 Files selected for processing (2)
  • dash-spv/src/client/event_handler.rs
  • dash-spv/src/client/sync_coordinator.rs

@codecov
Copy link

codecov bot commented Mar 22, 2026

Codecov Report

❌ Patch coverage is 82.77635% with 67 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.55%. Comparing base (19fb152) to head (1f3f4c0).
⚠️ Report is 3 commits behind head on refactor/move-callbacks-to-crates.

Files with missing lines Patch % Lines
dash-spv-ffi/src/bin/ffi_cli.rs 0.00% 38 Missing ⚠️
dash-spv/src/main.rs 0.00% 13 Missing ⚠️
dash-spv/src/client/event_handler.rs 97.04% 7 Missing ⚠️
dash-spv/src/client/sync_coordinator.rs 85.71% 7 Missing ⚠️
dash-spv-ffi/src/client.rs 92.00% 2 Missing ⚠️
Additional details and impacted files
@@                          Coverage Diff                          @@
##           refactor/move-callbacks-to-crates     #573      +/-   ##
=====================================================================
+ Coverage                              66.32%   66.55%   +0.22%     
=====================================================================
  Files                                    311      312       +1     
  Lines                                  64976    65058      +82     
=====================================================================
+ Hits                                   43097    43298     +201     
+ Misses                                 21879    21760     -119     
Flag Coverage Δ
core 75.13% <ø> (ø)
ffi 36.92% <54.02%> (+0.12%) ⬆️
rpc 28.28% <ø> (ø)
spv 81.35% <91.05%> (+0.28%) ⬆️
wallet 66.27% <ø> (ø)
Files with missing lines Coverage Δ
dash-spv-ffi/src/callbacks.rs 80.70% <100.00%> (+17.23%) ⬆️
dash-spv/src/client/events.rs 100.00% <100.00%> (ø)
dash-spv/src/client/mod.rs 98.64% <ø> (ø)
dash-spv-ffi/src/client.rs 58.85% <92.00%> (-0.53%) ⬇️
dash-spv/src/client/event_handler.rs 97.04% <97.04%> (ø)
dash-spv/src/client/sync_coordinator.rs 80.55% <85.71%> (+7.82%) ⬆️
dash-spv/src/main.rs 0.00% <0.00%> (ø)
dash-spv-ffi/src/bin/ffi_cli.rs 0.00% <0.00%> (ø)

... and 5 files with indirect coverage changes

@xdustinface xdustinface marked this pull request as draft March 22, 2026 07:55
- Add `on_failure` cancellation token to `spawn_broadcast_monitor` and `spawn_progress_monitor` to signal the run loop on fatal monitor errors
- Treat Lagged broadcast errors as fatal, lost events are unrecoverable at the moment
- Distinguish expected vs unexpected channel close by checking the shutdown token
- Run loop watches `monitor_failure.cancelled()` and exits with `ChannelFailure` error
@xdustinface xdustinface force-pushed the fix/fatal-monitor-errors branch from 31549f0 to 1f3f4c0 Compare March 22, 2026 07:56
@xdustinface xdustinface force-pushed the refactor/move-callbacks-to-crates branch from becff66 to 9ec7541 Compare March 23, 2026 02:46
@github-actions github-actions bot added the merge-conflict The PR conflicts with the target branch. label Mar 23, 2026
@github-actions
Copy link

This PR has merge conflicts with the base branch. Please rebase or merge the base branch into your branch to resolve them.

@xdustinface xdustinface force-pushed the refactor/move-callbacks-to-crates branch 5 times, most recently from aaddece to 23accc0 Compare March 23, 2026 06:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-conflict The PR conflicts with the target branch.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant