Skip to content

Fix ContinuationStore desynchronization#8741

Merged
tlively merged 3 commits into
mainfrom
interpreter-suspend-fix
May 21, 2026
Merged

Fix ContinuationStore desynchronization#8741
tlively merged 3 commits into
mainfrom
interpreter-suspend-fix

Conversation

@tlively
Copy link
Copy Markdown
Member

@tlively tlively commented May 21, 2026

When a primary module execution traps or suspends to the host, its
continuation store is cleared. Previously, this was done by reassigning
the shared_ptr to a new ContinuationStore instance. However, secondary
(linked) modules that were instantiated prior to this still hold the
original shared_ptr to the old ContinuationStore. This led to
desynchronization, where the secondary module would run with stale
continuation state (including leaked continuations and resuming flags),
eventually causing crashes like assertion failures in visitSuspend.

This fix changes clearContinuationStore to clear the ContinuationStore
in-place (clearing the continuations vector and resetting resuming flag)
instead of reassigning the shared_ptr, ensuring all linked modules
continue to share the same cleared state.

Added a lit test to verify the fix and prevent regression.

When a primary module execution traps or suspends to the host, its
continuation store is cleared. Previously, this was done by reassigning
the shared_ptr to a new ContinuationStore instance. However, secondary
(linked) modules that were instantiated prior to this still hold the
original shared_ptr to the old ContinuationStore. This led to
desynchronization, where the secondary module would run with stale
continuation state (including leaked continuations and resuming flags),
eventually causing crashes like assertion failures in visitSuspend.

This fix changes clearContinuationStore to clear the ContinuationStore
in-place (clearing the continuations vector and resetting resuming flag)
instead of reassigning the shared_ptr, ensuring all linked modules
continue to share the same cleared state.

Added a lit test to verify the fix and prevent regression.
@tlively tlively requested a review from kripken May 21, 2026 01:05
@tlively tlively requested a review from a team as a code owner May 21, 2026 01:05
Copy link
Copy Markdown
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with that

Comment thread src/wasm-interpreter.h Outdated
#endif
continuationStore = std::make_shared<ContinuationStore>();
continuationStore->continuations.clear();
continuationStore->resuming = false;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put this in a clear() method on the class, and add a comment here why we call clear to clear it in-place as opposed to wiping it from orbit.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(putting in the class makes it less likely we forget to add something to clear() if we add something to the class)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and add a comment here why we call clear to clear it in-place as opposed to wiping it from orbit.

Refactored and added a comment on the new clear() method, but I think explaining why we don't just reset the shared pointer is too in-the-weeds to be helpful. Given the previous comments explaining why the continuation store needs to be shared, I don't think readers of this code would be thinking about clearing the pointer.

Copy link
Copy Markdown
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, yeah, in the class, the context is clearer.

@tlively tlively merged commit d215f03 into main May 21, 2026
16 checks passed
@tlively tlively deleted the interpreter-suspend-fix branch May 21, 2026 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants