codegen: fix VU0 dest-mask lane-bit ordering and VMR32 shuffle pattern typos#112
Closed
jlsandri wants to merge 1 commit intoran-j:mainfrom
Closed
codegen: fix VU0 dest-mask lane-bit ordering and VMR32 shuffle pattern typos#112jlsandri wants to merge 1 commit intoran-j:mainfrom
jlsandri wants to merge 1 commit intoran-j:mainfrom
Conversation
Copy-paste errors in the VU0 codegen path across ~36 instruction sites:
1. dest_mask lane-bit ordering was inverted: many emitters passed
{0x8, 0x4, 0x2, 0x1} to _mm_set_epi32() instead of the correct
{0x1, 0x2, 0x4, 0x8}, and the VSQD emitter had the further typo
{0x1, 0x2, 0x2, 0x1}. Because _mm_set_epi32() takes arguments in
reverse lane order, the X/Y mask bits were gating the Z/W lanes
and vice versa, causing masked writes to silently misroute. The
affected instructions include VSQD, VADD/VSUB/VMUL/VMADD/VMSUB
family _Field variants, VMINI/VMAX _Field, VFTOI/VITOF family,
VADD_bc/VSUB_bc/VMUL_bc/VMADD_bc/VMSUB_bc, VADDq/VSUBq/VMULq/
VMADDq/VMSUBq, VADDi/VSUBi/VMULi/VMADDi/VMSUBi, and a number of
ADDAq/VMULA variants.
2. VU0 VMR32 was emitting _MM_SHUFFLE(0,0,0,1) instead of
_MM_SHUFFLE(0,3,2,1), so the rotated lane produced a broadcast
of lane 1 across all output lanes instead of a true 1-lane rotate.
Both classes of bug are pure copy-paste / typo fixes with no runtime
behavioural change for dest_mask == 0xF (the most common case), but
silently wrong for any masked dest_mask.
ef9434b to
3722597
Compare
Author
|
Closing as part of a batch cleanup after #107 landed. The runtime ecosystem refactor in #107 substantially reworked the files this PR touched, and I would like to re-audit the underlying fix against the new code structure before putting it back in front of you. If the fix is still needed after that re-audit, I will re-open as a focused PR rebased onto current main. Thanks for your patience. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Two copy-paste typos in the VU0 macro codegen, both producing silently-wrong output for any instruction that uses a non-default dest-mask or the VMR32 rotate.
1.
dest_masklane-bit ordering inverted across ~36 VU0 emittersMany emitters were passing the dest-mask bits to
_mm_set_epi32()asand the
VSQDemitter additionally had{0x1, 0x2, 0x2, 0x1}as a clearly-typo'd variant of the same pattern.
The correct ordering — matching the rest of the VU0 dest-mask code and the PS2 convention where bit 0 is the X lane — is
Because
_mm_set_epi32()takes its arguments in reverse lane order (arg 0 → lane 3, arg 3 → lane 0), the inverted bit order caused the X/Y mask bits to gate the Z/W lanes and vice versa. Any masked write with a non-uniform dest_mask silently misrouted.Affected emitters include:
VSQD,VADD_Field/VSUB_Field/VMUL_Field/VMADD_Field/VMSUB_Field,VMINI_Field/VMAX_Field,VFTOI*/VITOF*, the_bcvariants, the_qvariants, the_ivariants, and assortedVMULA/ADDAqsites. ~36 call sites in total.2.
VU0_S2_VMR32shuffle patternVMR32is a 1-lane rotate, not a broadcast of lane 1. The old pattern produced{lane1, lane1, lane1, lane1}instead of the correct{lane1, lane2, lane3, lane0}.Scope
ps2xRecomp/src/lib/code_generator.cpp_mm_set_epi32call)dest_mask == 0xF(the most common case); every other mask was silently wrong before.Rationale
Both classes of bug are pure copy-paste errors. The fix restores the stated invariant (bit 0 ↔ X lane) across every emitter, and corrects the VMR32 shuffle to perform the rotate the mnemonic actually describes.