Skip to content

codegen: fix VU0 dest-mask lane-bit ordering and VMR32 shuffle pattern typos#112

Closed
jlsandri wants to merge 1 commit intoran-j:mainfrom
jlsandri:pr/codegen-vu0-dest-mask-lane-bits
Closed

codegen: fix VU0 dest-mask lane-bit ordering and VMR32 shuffle pattern typos#112
jlsandri wants to merge 1 commit intoran-j:mainfrom
jlsandri:pr/codegen-vu0-dest-mask-lane-bits

Conversation

@jlsandri
Copy link
Copy Markdown

@jlsandri jlsandri commented Apr 6, 2026

Problem

Two copy-paste typos in the VU0 macro codegen, both producing silently-wrong output for any instruction that uses a non-default dest-mask or the VMR32 rotate.

1. dest_mask lane-bit ordering inverted across ~36 VU0 emitters

Many emitters were passing the dest-mask bits to _mm_set_epi32() as

(dest_mask & 0x8) ? -1 : 0, (dest_mask & 0x4) ? -1 : 0,
(dest_mask & 0x2) ? -1 : 0, (dest_mask & 0x1) ? -1 : 0

and the VSQD emitter additionally had

{0x1, 0x2, 0x2, 0x1}

as a clearly-typo'd variant of the same pattern.

The correct ordering — matching the rest of the VU0 dest-mask code and the PS2 convention where bit 0 is the X lane — is

(dest_mask & 0x1) ? -1 : 0, (dest_mask & 0x2) ? -1 : 0,
(dest_mask & 0x4) ? -1 : 0, (dest_mask & 0x8) ? -1 : 0

Because _mm_set_epi32() takes its arguments in reverse lane order (arg 0 → lane 3, arg 3 → lane 0), the inverted bit order caused the X/Y mask bits to gate the Z/W lanes and vice versa. Any masked write with a non-uniform dest_mask silently misrouted.

Affected emitters include: VSQD, VADD_Field / VSUB_Field / VMUL_Field / VMADD_Field / VMSUB_Field, VMINI_Field / VMAX_Field, VFTOI* / VITOF*, the _bc variants, the _q variants, the _i variants, and assorted VMULA / ADDAq sites. ~36 call sites in total.

2. VU0_S2_VMR32 shuffle pattern

// before
_MM_SHUFFLE(0, 0, 0, 1)
// after
_MM_SHUFFLE(0, 3, 2, 1)

VMR32 is a 1-lane rotate, not a broadcast of lane 1. The old pattern produced {lane1, lane1, lane1, lane1} instead of the correct {lane1, lane2, lane3, lane0}.

Scope

  • One file: ps2xRecomp/src/lib/code_generator.cpp
  • +68 / -68 (almost all sites are pure re-orderings of an _mm_set_epi32 call)
  • Zero behavioural change for dest_mask == 0xF (the most common case); every other mask was silently wrong before.

Rationale

Both classes of bug are pure copy-paste errors. The fix restores the stated invariant (bit 0 ↔ X lane) across every emitter, and corrects the VMR32 shuffle to perform the rotate the mnemonic actually describes.

Copy-paste errors in the VU0 codegen path across ~36 instruction sites:

1. dest_mask lane-bit ordering was inverted: many emitters passed
   {0x8, 0x4, 0x2, 0x1} to _mm_set_epi32() instead of the correct
   {0x1, 0x2, 0x4, 0x8}, and the VSQD emitter had the further typo
   {0x1, 0x2, 0x2, 0x1}. Because _mm_set_epi32() takes arguments in
   reverse lane order, the X/Y mask bits were gating the Z/W lanes
   and vice versa, causing masked writes to silently misroute. The
   affected instructions include VSQD, VADD/VSUB/VMUL/VMADD/VMSUB
   family _Field variants, VMINI/VMAX _Field, VFTOI/VITOF family,
   VADD_bc/VSUB_bc/VMUL_bc/VMADD_bc/VMSUB_bc, VADDq/VSUBq/VMULq/
   VMADDq/VMSUBq, VADDi/VSUBi/VMULi/VMADDi/VMSUBi, and a number of
   ADDAq/VMULA variants.

2. VU0 VMR32 was emitting _MM_SHUFFLE(0,0,0,1) instead of
   _MM_SHUFFLE(0,3,2,1), so the rotated lane produced a broadcast
   of lane 1 across all output lanes instead of a true 1-lane rotate.

Both classes of bug are pure copy-paste / typo fixes with no runtime
behavioural change for dest_mask == 0xF (the most common case), but
silently wrong for any masked dest_mask.
@jlsandri jlsandri force-pushed the pr/codegen-vu0-dest-mask-lane-bits branch from ef9434b to 3722597 Compare April 6, 2026 08:56
@jlsandri
Copy link
Copy Markdown
Author

jlsandri commented Apr 6, 2026

Closing as part of a batch cleanup after #107 landed. The runtime ecosystem refactor in #107 substantially reworked the files this PR touched, and I would like to re-audit the underlying fix against the new code structure before putting it back in front of you. If the fix is still needed after that re-audit, I will re-open as a focused PR rebased onto current main. Thanks for your patience.

@jlsandri jlsandri closed this Apr 6, 2026
@jlsandri jlsandri deleted the pr/codegen-vu0-dest-mask-lane-bits branch April 6, 2026 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant