Fix silent event loss in DomainParticipant status channel #401
Merged
jhelovuo merged 1 commit into Atostek:master on Mar 8, 2026
Conversation
The DomainParticipantStatusEvent channel had a capacity of 16 and silently dropped events when full, causing downstream consumers to miss endpoint discoveries entirely with no indication of data loss. A single participant can expose many endpoints (e.g. ~16 in a typical ROS 2 node), so even two participants overwhelm a 16-slot channel during the initial SEDP burst.

Changes:
- Increase status channel capacity from 16 to 2048
- Upgrade log level from trace! to warn! on channel overflow
Contributor (Author):
@jhelovuo would you mind reviewing this PR? It's a fix for proper interoperability with ROS 2 DDS.

Member:
Nice debugging work. Thank you for the contribution.
Summary
The DomainParticipantStatusEvent channel silently drops events when its capacity is exceeded. The try_send implementation converts a Full error into Ok(()), so neither the sender nor the consumer can detect the loss. The only indication is a trace!-level log that is invisible at normal log levels.

This PR:
- Increases the status channel capacity from 16 to 2048
- Upgrades the log level from trace! to warn! on channel overflow
How was this discovered
I discovered this bug while building a ROS 2 application that uses RustDDS. When using the status_listener() API to introspect the DDS graph, some nodes' services and topics would not show up at all. Which nodes were visible varied between runs. After ruling out consumer-side issues and confirming SEDP was delivering all data correctly, we traced the problem to the bounded status channel silently dropping events.
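For readers who want to reproduce the failure mode in miniature, here is a minimal sketch of a bounded channel whose send wrapper reports a full channel as success. It is not the RustDDS code: it uses std::sync::mpsc instead of the crate's own channel type, and StatusEvent, StatusSender, try_send_lossy, and burst are hypothetical names chosen for illustration. The eprintln! call stands in for a warn!-level log, and the dropped counter is an extra added here only to make the loss observable.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

/// Hypothetical stand-in for a status event; not the RustDDS type.
#[allow(dead_code)] // payloads are only printed via Debug
#[derive(Debug)]
enum StatusEvent {
    WriterDetected(u32),
    ReaderDetected(u32),
}

/// Hypothetical sender wrapper around a bounded std channel.
struct StatusSender {
    tx: SyncSender<StatusEvent>,
    dropped: usize, // added for illustration so the loss is observable
}

impl StatusSender {
    /// Reports a full channel as success, so the caller never sees the drop.
    fn try_send_lossy(&mut self, ev: StatusEvent) -> Result<(), TrySendError<StatusEvent>> {
        match self.tx.try_send(ev) {
            Ok(()) => Ok(()),
            Err(TrySendError::Full(ev)) => {
                self.dropped += 1;
                // Stand-in for a warn!-level log on overflow.
                eprintln!("warning: status channel full, dropping {:?}", ev);
                Ok(())
            }
            Err(e) => Err(e), // receiver disconnected
        }
    }
}

/// Sends `n` events with no consumer draining; returns (delivered, dropped).
fn burst(capacity: usize, n: u32) -> (usize, usize) {
    let (tx, rx) = sync_channel(capacity);
    let mut sender = StatusSender { tx, dropped: 0 };
    for i in 0..n {
        let ev = if i % 2 == 0 {
            StatusEvent::WriterDetected(i)
        } else {
            StatusEvent::ReaderDetected(i)
        };
        // Always Ok while the receiver is alive, even when events are dropped.
        sender.try_send_lossy(ev).expect("receiver alive");
    }
    // Count what actually reached the consumer side.
    let delivered = rx.try_iter().count();
    (delivered, sender.dropped)
}

fn main() {
    let (delivered, dropped) = burst(16, 32);
    println!("delivered={delivered} dropped={dropped}"); // prints "delivered=16 dropped=16"

    let (delivered, dropped) = burst(2048, 32);
    println!("delivered={delivered} dropped={dropped}"); // prints "delivered=32 dropped=0"
}
```

Every call returns Ok(()), so the sender has no way to tell that half of the 32 events never arrived in the 16-slot case. Only the overflow message and the illustrative counter expose the loss.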
Why 16 is not enough
A single DDS participant can expose many endpoints. When SEDP discovers a remote participant, it generates WriterDetected and ReaderDetected events for every endpoint in a burst, faster than the consumer can drain them.

For example, a typical ROS 2 node creates ~9 writers and ~7 readers (~16 status events per node). Two nodes produce ~32 events, already exceeding the channel capacity. At 10 nodes (~160 events), up to 90% of events are silently lost if the consumer drains nothing during the burst. This is not specific to ROS 2: any DDS application with multiple endpoints per participant will hit this at modest scale.
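The arithmetic above can be checked with a short worst-case calculation. This sketch assumes the consumer drains nothing while the burst arrives, so every event beyond the channel capacity is lost. The function names and the EVENTS_PER_NODE constant are illustrative and come from the ~16 events per node estimate, not from RustDDS.

```rust
/// Events lost when `burst` events hit a channel with `capacity` slots,
/// assuming the consumer drains nothing during the burst (worst case).
fn dropped_in_burst(capacity: u32, burst: u32) -> u32 {
    burst.saturating_sub(capacity)
}

/// Share of the burst that is lost, as a percentage.
fn loss_percent(capacity: u32, burst: u32) -> f64 {
    if burst == 0 {
        return 0.0;
    }
    100.0 * f64::from(dropped_in_burst(capacity, burst)) / f64::from(burst)
}

fn main() {
    // Assumption taken from the estimate above: ~9 writers + ~7 readers per node.
    const EVENTS_PER_NODE: u32 = 16;

    for nodes in [1u32, 2, 10] {
        let burst = nodes * EVENTS_PER_NODE;
        for capacity in [16u32, 2048] {
            println!(
                "nodes={nodes} burst={burst} capacity={capacity} dropped={} loss={:.0}%",
                dropped_in_burst(capacity, burst),
                loss_percent(capacity, burst)
            );
        }
    }
}
```

With a capacity of 16, two nodes lose 16 of 32 events and ten nodes lose 144 of 160 (90%). With a capacity of 2048, none of these bursts overflow.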
Why 2048