fix: tighten oneshot memory ordering#111
Conversation
There was a problem hiding this comment.
Pull request overview
Tightens the oneshot channel’s state-transition memory orderings to better synchronize message/waker visibility and final channel deallocation ownership, and adds regression tests covering concurrent send/drop vs try_recv/poll.
Changes:
- Adjusted atomic orderings and added dedicated acquire fences around MESSAGE-handling paths to explicitly synchronize message visibility and final deallocation handoff.
- Updated drop paths (Sender/Receiver/Recv) to better coordinate which side frees the channel under races.
- Added concurrent send/drop completion tests for both
try_recvandFuture::pollpaths.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
mea/src/oneshot/mod.rs |
Updates atomic orderings/fences and drop-path state transitions to strengthen synchronization and ownership handoff. |
mea/src/oneshot/tests.rs |
Adds multi-threaded regression tests for concurrent send/drop vs try_recv/poll, plus a small thread helper. |
Comments suppressed due to low confidence (3)
mea/src/oneshot/tests.rs:363
- This test’s receiver thread loops with
spin_loop()untilDisconnectedis observed; without any timeout/iteration cap, failures can turn into an infinite hang duringjoin(). Consider adding a timeout/budget and failing fast if the expected state isn’t reached.
let receiver_thread = spawn_named("receiver", move || {
loop {
match receiver.try_recv() {
Ok(value) => panic!("unexpected value: {value}"),
Err(TryRecvError::Empty) => spin_loop(),
Err(TryRecvError::Disconnected) => break,
}
}
mea/src/oneshot/tests.rs:389
- This test polls in a tight unbounded
spin_loop()until completion. If the wake/state transition breaks, the test will hang indefinitely atjoin(). Add a bounded timeout/iteration budget (and panic on timeout) to keep CI reliable.
loop {
match Pin::new(&mut receiver).poll(&mut context) {
Poll::Ready(Ok(999)) => break,
Poll::Ready(result) => panic!("unexpected result: {result:?}"),
Poll::Pending => spin_loop(),
}
}
mea/src/oneshot/tests.rs:415
- Similar to the other concurrent tests, this thread spins in an unbounded loop until it observes the disconnect. A regression could cause an infinite hang. Add a timeout/iteration cap and fail fast rather than spinning forever.
loop {
match Pin::new(&mut receiver).poll(&mut context) {
Poll::Ready(Err(oneshot::RecvError::Disconnected)) => break,
Poll::Ready(result) => panic!("unexpected result: {result:?}"),
Poll::Pending => spin_loop(),
}
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Memory ordering here is tough to think through and guarantee correctness, so my review may not be very comprehensive. First and foremost, following Copilot's review, Then, I found an issue that existed before this PR. I am concerned that the As Copilot noted, the new spin-loop tests should also be bounded with a timeout or iteration budget. |
Use acquire ordering before dropping stored wakers in Recv::poll and Recv::drop. When Recv::drop observes AWAKING, wait for the sender to finish the transition instead of writing DISCONNECTED first. Also bound the spin-loop tests and add coverage for pending receiver drop and cross-thread waker replacement.
|
I pushed a follow-up commit to address it. |
Signed-off-by: tison <wander4096@gmail.com>
|
I'm reviewing the comments and order. From agent review, it says: Blockers:
Non-blocking: the new 5s spin/concurrency tests are useful as smoke/stress tests, but they should not be presented as proof of the memory-ordering argument. A loom/model test would be a good follow-up if this code is staying subtle. |
The agent review makes sense. I focused on the state-machine change and missed that some existing comments became stale or misleading. I added a small follow-up to fix the MESSAGE / RECEIVING -> EMPTY comments and spell out the Recv::drop case a bit more. For the temporary EMPTY window, after Recv::drop successfully CAS RECEIVING -> EMPTY with Acquire, it has retracted the published waker and owns dropping that waker. The Acquire CAS synchronizes with write_waker's Release CAS, so the waker is initialized before drop_waker().If a sender's send transition wins during this temporary EMPTY window, it observes EMPTY, publishes MESSAGE, and returns Ok. Because it did not observe RECEIVING, it never takes or drops the waker, and because send succeeded from EMPTY, it does not own deallocating the channel. Recv::drop still owns the retracted waker, drops it, and loops. Once the sender has published MESSAGE, Recv::drop will observe MESSAGE with Acquire, drop the message, and deallocate the channel.If no sender wins that temporary EMPTY window, Recv::drop's next loop iteration can complete EMPTY -> DISCONNECTED. In that case, any later cleanup is owned by the sender side: either Sender::send observes DISCONNECTED and the SendError path owns the message/allocation cleanup, or Sender::drop observes DISCONNECTED and deallocates the channel. |
Summary
Verification