codegen: add dest-mask support for VU0 VMOVE and VMR32 #116

Closed
jlsandri wants to merge 2 commits into ran-j:main from jlsandri:pr/codegen-vu0-vmove-vmr32-dest-mask

jlsandri commented Apr 6, 2026

Stacked on #112 (codegen: fix VU0 dest-mask lane-bit ordering and VMR32 shuffle pattern typos). Please merge #112 first — once it lands, this PR's diff shrinks to just the single commit below.

Problem

VU0_S2_VMOVE and VU0_S2_VMR32 were emitting unconditional moves/shuffles that ignored the instruction's dest_mask field. Any masked VMOVE/VMR32 therefore overwrote lanes the mask was meant to preserve, silently corrupting state for every masked write.

Fix

Both emitters now:

  1. Fast-path dest_mask == 0xF to the existing unconditional codegen (identical output for the common case — zero overhead).
  2. For any other mask, emit an _mm_blendv_ps guarded by a per-lane selector built from the dest_mask bits, using the same {0x1, 0x2, 0x4, 0x8} lane-bit convention as the rest of the VU0 dest-mask emitters.
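
As a rough illustration of the two paths above, here is a minimal runtime model of what the emitted code might compute for a masked VMOVE. It is a sketch, not the emitter in code_generator.cpp: the names `vu0_lane_selector`, `vu0_vmove_masked`, and `VU0_SSE41` are hypothetical, and it assumes that X lives in SSE lane 0 and that the X bit of dest_mask is 0x8.

```cpp
#include <cassert>
#include <cstdint>
#include <immintrin.h>  // SSE2 plus SSE4.1 (_mm_blendv_ps)

// Lets GCC/Clang compile the SSE4.1 intrinsic without a global -msse4.1 flag.
#if defined(__GNUC__) || defined(__clang__)
#define VU0_SSE41 __attribute__((target("sse4.1")))
#else
#define VU0_SSE41
#endif

// Hypothetical helper: expand a 4-bit dest_mask into an all-ones/all-zeros
// value per lane. _mm_set_epi32 takes its arguments from lane 3 down to
// lane 0, so {0x1, 0x2, 0x4, 0x8} puts 0x8 (assumed X bit) in lane 0.
static inline __m128 vu0_lane_selector(std::uint32_t dest_mask) {
    const __m128i lane_bits = _mm_set_epi32(0x1, 0x2, 0x4, 0x8);
    const __m128i hit =
        _mm_and_si128(_mm_set1_epi32(static_cast<int>(dest_mask)), lane_bits);
    return _mm_castsi128_ps(_mm_cmpeq_epi32(hit, lane_bits));
}

// Hypothetical model of a masked VMOVE: lanes named in dest_mask come from
// fs, every other lane keeps its previous value from old_ft.
VU0_SSE41 static inline __m128 vu0_vmove_masked(__m128 old_ft, __m128 fs,
                                                std::uint32_t dest_mask) {
    assert(dest_mask <= 0xF);
    if (dest_mask == 0xF) {
        return fs;  // fast path: plain move, no blend needed
    }
    // _mm_blendv_ps picks the second operand where the selector's sign bit is set.
    return _mm_blendv_ps(old_ft, fs, vu0_lane_selector(dest_mask));
}
```

With `old_ft = {10, 20, 30, 40}` and `fs = {1, 2, 3, 4}` (lane 0 first), a dest_mask of 0x8 would give `{1, 20, 30, 40}` under these assumptions: only X is overwritten.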

Why this is stacked on #112

VMR32's masked path emits a shuffle followed by a blend; the shuffle uses _MM_SHUFFLE(0,3,2,1), which is the corrected form from #112. Without that prerequisite merged, the masked VMR32 would rotate to the wrong lane — the same typo #112 fixes in the unmasked path.
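
The shuffle-then-blend sequence described here can be sketched as follows. This is an illustrative runtime model under stated assumptions, not the code the emitter produces: `vu0_vmr32_masked` and `VU0_SSE41` are hypothetical names, X is assumed to sit in SSE lane 0, and the X bit of dest_mask is assumed to be 0x8.

```cpp
#include <cassert>
#include <cstdint>
#include <immintrin.h>  // SSE2 plus SSE4.1 (_mm_blendv_ps)

// Lets GCC/Clang compile the SSE4.1 intrinsic without a global -msse4.1 flag.
#if defined(__GNUC__) || defined(__clang__)
#define VU0_SSE41 __attribute__((target("sse4.1")))
#else
#define VU0_SSE41
#endif

// Hypothetical model of a masked VMR32: rotate fs by one lane, then write
// only the lanes named in dest_mask into the destination.
VU0_SSE41 static inline __m128 vu0_vmr32_masked(__m128 old_ft, __m128 fs,
                                                std::uint32_t dest_mask) {
    assert(dest_mask <= 0xF);
    // Result lanes 0..3 = fs[1], fs[2], fs[3], fs[0]
    // (ft.x = fs.y, ft.y = fs.z, ft.z = fs.w, ft.w = fs.x).
    const __m128 rotated = _mm_shuffle_ps(fs, fs, _MM_SHUFFLE(0, 3, 2, 1));
    if (dest_mask == 0xF) {
        return rotated;  // fast path: unconditional rotate
    }
    // Per-lane selector: lane 0 tests 0x8, lane 3 tests 0x1.
    const __m128i lane_bits = _mm_set_epi32(0x1, 0x2, 0x4, 0x8);
    const __m128i hit =
        _mm_and_si128(_mm_set1_epi32(static_cast<int>(dest_mask)), lane_bits);
    const __m128 selector = _mm_castsi128_ps(_mm_cmpeq_epi32(hit, lane_bits));
    // Keep old_ft where the selector is clear, take the rotated value where set.
    return _mm_blendv_ps(old_ft, rotated, selector);
}
```

The blend has to run on the rotated value rather than on `fs` itself, which is why a wrong shuffle immediate would corrupt the masked path just as it does the unmasked one.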

Scope

  • One file: ps2xRecomp/src/lib/code_generator.cpp
  • +25 / -2

Rationale

Parity with the rest of the VU0 macro codegen. Every other vector instruction already respects dest_mask; VMOVE/VMR32 were outliers.

jlsandri added 2 commits April 6, 2026 18:55
Copy-paste errors in the VU0 codegen path across ~36 instruction sites:

1. dest_mask lane-bit ordering was inverted: many emitters passed
   {0x8, 0x4, 0x2, 0x1} to _mm_set_epi32() instead of the correct
   {0x1, 0x2, 0x4, 0x8}, and the VSQD emitter had the further typo
   {0x1, 0x2, 0x2, 0x1}. Because _mm_set_epi32() takes arguments in
   reverse lane order, the X/Y mask bits were gating the Z/W lanes
   and vice versa, causing masked writes to silently misroute. The
   affected instructions include VSQD, VADD/VSUB/VMUL/VMADD/VMSUB
   family _Field variants, VMINI/VMAX _Field, VFTOI/VITOF family,
   VADD_bc/VSUB_bc/VMUL_bc/VMADD_bc/VMSUB_bc, VADDq/VSUBq/VMULq/
   VMADDq/VMSUBq, VADDi/VSUBi/VMULi/VMADDi/VMSUBi, and a number of
   ADDAq/VMULA variants.

2. VU0 VMR32 was emitting _MM_SHUFFLE(0,0,0,1) instead of
   _MM_SHUFFLE(0,3,2,1), so the result had source lane 1 in output
   lane 0 and source lane 0 copied into the other three lanes instead
   of a true 1-lane rotate.
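
For reference, the two intrinsic behaviours behind items 1 and 2 can be checked in isolation. The snippet below is a small sketch using only standard SSE2 intrinsics; the helper names (`lanes_i32`, `lanes_f32`, `inverted_lane_bits`, `corrected_lane_bits`, `shuffle_typo`, `shuffle_fixed`) are hypothetical and do not come from the codebase.

```cpp
#include <array>
#include <cstdint>
#include <emmintrin.h>  // SSE2; also provides _mm_shuffle_ps and _MM_SHUFFLE

// Hypothetical helpers: copy the four lanes out in memory order (lane 0 first).
static inline std::array<std::int32_t, 4> lanes_i32(__m128i v) {
    std::array<std::int32_t, 4> out{};
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), v);
    return out;
}
static inline std::array<float, 4> lanes_f32(__m128 v) {
    std::array<float, 4> out{};
    _mm_storeu_ps(out.data(), v);
    return out;
}

// _mm_set_epi32(e3, e2, e1, e0): the FIRST argument lands in lane 3.
static inline std::array<std::int32_t, 4> inverted_lane_bits() {
    return lanes_i32(_mm_set_epi32(0x8, 0x4, 0x2, 0x1));  // lanes 0..3 = 1, 2, 4, 8
}
static inline std::array<std::int32_t, 4> corrected_lane_bits() {
    return lanes_i32(_mm_set_epi32(0x1, 0x2, 0x4, 0x8));  // lanes 0..3 = 8, 4, 2, 1
}

// Typo pattern: result lanes 0..3 = v[1], v[0], v[0], v[0].
static inline std::array<float, 4> shuffle_typo(__m128 v) {
    return lanes_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 1)));
}
// Corrected pattern: result lanes 0..3 = v[1], v[2], v[3], v[0] (one-lane rotate).
static inline std::array<float, 4> shuffle_fixed(__m128 v) {
    return lanes_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 3, 2, 1)));
}
```

Reading the lanes back in memory order makes the argument reversal of _mm_set_epi32 visible directly, which is the detail that makes the inverted ordering easy to miss in review.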

Both classes of bug are pure copy-paste / typo fixes. The lane-bit
fix causes no runtime behavioural change for dest_mask == 0xF (the
most common case) but corrects silently wrong results for any partial
dest_mask; the VMR32 shuffle fix applies regardless of mask.

Previously VMOVE and VMR32 emitted unconditional moves/shuffles that
ignored the instruction's dest_mask field, so any masked VMOVE/VMR32
would incorrectly overwrite lanes the mask was meant to preserve.

This patch:
- Fast-paths dest_mask == 0xF to the existing unconditional codegen.
- For any other mask, emits an _mm_blendv_ps guarded by a per-lane
  selector built from the dest_mask bits, matching the lane-selection
  convention used by the rest of the VU0 dest-mask emitters.

Stacks on the VMR32 shuffle pattern fix — without that prerequisite,
the masked VMR32 path would rotate to the wrong lane.
jlsandri force-pushed the pr/codegen-vu0-vmove-vmr32-dest-mask branch from c589c40 to 0b88fb9 April 6, 2026 08:56

jlsandri commented Apr 6, 2026

Closing as part of a batch cleanup after #107 landed. The runtime ecosystem refactor in #107 substantially reworked the files this PR touched, and I would like to re-audit the underlying fix against the new code structure before putting it back in front of you. If the fix is still needed after that re-audit, I will re-open as a focused PR rebased onto current main. Thanks for your patience.

jlsandri closed this Apr 6, 2026
jlsandri deleted the pr/codegen-vu0-vmove-vmr32-dest-mask branch April 6, 2026 09:17