Fix ContinuationStore desynchronization#8741
Conversation
When a primary module execution traps or suspends to the host, its continuation store is cleared. Previously, this was done by reassigning the shared_ptr to a new ContinuationStore instance. However, secondary (linked) modules that were instantiated prior to this still hold the original shared_ptr to the old ContinuationStore. This led to desynchronization, where the secondary module would run with stale continuation state (including leaked continuations and resuming flags), eventually causing crashes like assertion failures in visitSuspend. This fix changes clearContinuationStore to clear the ContinuationStore in-place (clearing the continuations vector and resetting resuming flag) instead of reassigning the shared_ptr, ensuring all linked modules continue to share the same cleared state. Added a lit test to verify the fix and prevent regression.
| #endif | ||
| continuationStore = std::make_shared<ContinuationStore>(); | ||
| continuationStore->continuations.clear(); | ||
| continuationStore->resuming = false; |
There was a problem hiding this comment.
Please put this in a clear() method on the class, and add a comment here why we call clear to clear it in-place as opposed to wiping it from orbit.
There was a problem hiding this comment.
(putting in the class makes it less likely we forget to add something to clear() if we add something to the class)
There was a problem hiding this comment.
and add a comment here why we call clear to clear it in-place as opposed to wiping it from orbit.
Refactored and added a comment on the new clear() method, but I think explaining why we don't just reset the shared pointer is too in-the-weeds to be helpful. Given the previous comments explaining why the continuation store needs to be shared, I don't think readers of this code would be thinking about clearing the pointer.
kripken
left a comment
There was a problem hiding this comment.
Fair enough, yeah, in the class, the context is clearer.
When a primary module execution traps or suspends to the host, its
continuation store is cleared. Previously, this was done by reassigning
the shared_ptr to a new ContinuationStore instance. However, secondary
(linked) modules that were instantiated prior to this still hold the
original shared_ptr to the old ContinuationStore. This led to
desynchronization, where the secondary module would run with stale
continuation state (including leaked continuations and resuming flags),
eventually causing crashes like assertion failures in visitSuspend.
This fix changes clearContinuationStore to clear the ContinuationStore
in-place (clearing the continuations vector and resetting resuming flag)
instead of reassigning the shared_ptr, ensuring all linked modules
continue to share the same cleared state.
Added a lit test to verify the fix and prevent regression.