Reject rec groups that will collide after writing#8144
Merged
Conversation
We allow the IR to contain types that are more refined than the enabled feature set would normally allow. For example, the IR can contain exact references even when custom descriptors are not enabled or typed function references even when GC is not enabled. Using these more refined types in the IR can make the optimizer more powerful. When we write binaries, we generalize these refined types according to what is allowed by the enabled feature set. Generalizing the types in the binary writer makes it possible that two rec groups that have different structures in the IR will end up having the same structure once the binary is written out. This makes types that were meant to be separate identical, possibly changing the observable behavior of casts inadvertently. To prevent that from happening, we must ensure that types that are meant to be separate in the IR will also be separate after binary writing. We already handle this when globally rewriting types (as of #8139), but that is not enough to prevent the problem from ocurring when the original input already has rec groups that would collide after writing. In general we have to allow overly-refined types to appear in the input so we can test optimizations that take advantage of them. Since we generally allow refined types in the input, there's nothing stopping the fuzzer from randomly generating inputs and feature sets that produce colliding rec groups. In such a case even a simple round trip would change program behavior. Avoid this problem by failing to build types when the TypeBuilder contains distinct rec groups that would collide after binary writing. Check for this condition by maintaining a UniqueRecGroups set while types are being built. Add an `insertOrGet` method to UniqueRecGroups to better support the use case where conflicts need to be detected but not necessarily resolved. Add `asWrittenWithFeatures` methods to Type and HeapType, and use them both from the binary writer and from wasm-type-shape to ensure that the shape comparisons actually reflect the behavior of the binary writer.
tlively
commented
Dec 18, 2025
| } | ||
|
|
||
| bool Type::isCastable() { return isRef() && getHeapType().isCastable(); } | ||
| bool Type::isCastable() const { return isRef() && getHeapType().isCastable(); } |
kripken
approved these changes
Dec 18, 2025
| @@ -173,6 +174,10 @@ struct UniqueRecGroups { | |||
| // Otherwise rebuild the group make it unique and return the rebuilt types, | |||
Member
There was a problem hiding this comment.
Suggested change
| // Otherwise rebuild the group make it unique and return the rebuilt types, | |
| // Otherwise rebuild the group to make it unique and return the rebuilt types, |
(pre-existing, but just noticed this now)
| // Returns the feature set required to use this type. | ||
| FeatureSet getFeatures() const; | ||
|
|
||
| inline HeapType asWrittenWithFeatures(FeatureSet feats) const; |
Member
There was a problem hiding this comment.
Perhaps with => given?
Also please add a comment.
| // and typed function references without GC. Allowing these more-refined types | ||
| // in the IR helps the optimizer be more powerful. However, these disallowed | ||
| // refinements will be erased when a module is written out as a binary, which | ||
| // could cause distinct rec groups becoming identical and potentially change |
Member
There was a problem hiding this comment.
Suggested change
| // could cause distinct rec groups becoming identical and potentially change | |
| // could cause distinct rec groups to become identical and potentially change |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
We allow the IR to contain types that are more refined than the enabled
feature set would normally allow. For example, the IR can contain exact
references even when custom descriptors are not enabled or typed
function references even when GC is not enabled. Using these more
refined types in the IR can make the optimizer more powerful. When we
write binaries, we generalize these refined types according to what is
allowed by the enabled feature set.
Generalizing the types in the binary writer makes it possible that two
rec groups that have different structures in the IR will end up having
the same structure once the binary is written out. This makes types that
were meant to be separate identical, possibly changing the observable
behavior of casts inadvertently.
To prevent that from happening, we must ensure that types that are meant
to be separate in the IR will also be separate after binary writing. We
already handle this when globally rewriting types (as of #8139), but
that is not enough to prevent the problem from ocurring when the
original input already has rec groups that would collide after writing.
In general we have to allow overly-refined types to appear in the input
so we can test optimizations that take advantage of them.
Since we generally allow refined types in the input, there's nothing
stopping the fuzzer from randomly generating inputs and feature sets
that produce colliding rec groups. In such a case even a simple round
trip would change program behavior.
Avoid this problem by failing to build types when the TypeBuilder
contains distinct rec groups that would collide after binary writing.
Check for this condition by maintaining a UniqueRecGroups set while
types are being built.
Add an
insertOrGetmethod to UniqueRecGroups to better support the usecase where conflicts need to be detected but not necessarily resolved.
Add
asWrittenWithFeaturesmethods to Type and HeapType, and use themboth from the binary writer and from wasm-type-shape to ensure that the
shape comparisons actually reflect the behavior of the binary writer.