codegen: add dest-mask support for VU0 VMOVE and VMR32 #116
Closed
jlsandri wants to merge 2 commits into ran-j:main from
Conversation
Copy-paste errors in the VU0 codegen path across ~36 instruction sites:
1. dest_mask lane-bit ordering was inverted: many emitters passed
{0x8, 0x4, 0x2, 0x1} to _mm_set_epi32() instead of the correct
{0x1, 0x2, 0x4, 0x8}, and the VSQD emitter had the further typo
{0x1, 0x2, 0x2, 0x1}. Because _mm_set_epi32() takes its arguments in
reverse lane order (highest lane first), the X bit was gating the W
lane and the Y bit the Z lane, and vice versa, causing masked writes
to silently misroute. The
affected instructions include VSQD, VADD/VSUB/VMUL/VMADD/VMSUB
family _Field variants, VMINI/VMAX _Field, VFTOI/VITOF family,
VADD_bc/VSUB_bc/VMUL_bc/VMADD_bc/VMSUB_bc, VADDq/VSUBq/VMULq/
VMADDq/VMSUBq, VADDi/VSUBi/VMULi/VMADDi/VMSUBi, and a number of
ADDAq/VMULA variants.
2. VU0 VMR32 was emitting _MM_SHUFFLE(0,0,0,1) instead of
_MM_SHUFFLE(0,3,2,1): rather than a true one-lane rotate, it moved
lane 1 into lane 0 and broadcast lane 0 into the remaining output lanes.
Both classes of bug are pure copy-paste / typo fixes with no runtime
behavioural change when dest_mask == 0xF (the most common case), but
they silently produce wrong results for any partial dest_mask.
Previously VMOVE and VMR32 emitted unconditional moves/shuffles that ignored the instruction's dest_mask field, so any masked VMOVE/VMR32 would incorrectly overwrite lanes the mask was meant to preserve. This patch:
- Fast-paths dest_mask == 0xF to the existing unconditional codegen.
- For any other mask, emits an _mm_blendv_ps guarded by a per-lane selector built from the dest_mask bits, matching the lane-selection convention used by the rest of the VU0 dest-mask emitters.

Stacks on the VMR32 shuffle pattern fix: without that prerequisite, the masked VMR32 path would rotate to the wrong lane.
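The masked path described above can be sketched as a standalone function. This is a hypothetical rendering under the same assumed lane convention (0x8 = X in SSE lane 0): the function name is illustrative, and the real emitter produces this sequence at JIT time rather than calling a helper.

```cpp
#include <immintrin.h>
#include <cstdint>

// Illustrative sketch of the masked VMR32 path, not the PR's actual code.
__attribute__((target("sse4.1")))  // _mm_blendv_ps requires SSE4.1
static __m128 vmr32_masked(__m128 dst, __m128 src, uint32_t dest_mask) {
    // Corrected one-lane rotate: x<-y, y<-z, z<-w, w<-x.
    __m128 rotated = _mm_shuffle_ps(src, src, _MM_SHUFFLE(0, 3, 2, 1));
    if (dest_mask == 0xF)
        return rotated;  // fast path: full write, same as the old codegen
    // Per-lane selector: all-ones in each lane whose dest_mask bit is set.
    const __m128i lane_bits = _mm_set_epi32(0x1, 0x2, 0x4, 0x8);
    const __m128i m = _mm_set1_epi32(static_cast<int>(dest_mask));
    __m128 sel = _mm_castsi128_ps(
        _mm_cmpeq_epi32(_mm_and_si128(m, lane_bits), lane_bits));
    // Selected lanes take the rotated value; the rest keep dst.
    return _mm_blendv_ps(dst, rotated, sel);
}
```

VMOVE's masked path would be the same sketch without the shuffle step.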
Force-pushed from c589c40 to 0b88fb9
Author
Closing as part of a batch cleanup after #107 landed. The runtime ecosystem refactor in #107 substantially reworked the files this PR touched, and I would like to re-audit the underlying fix against the new code structure before putting it back in front of you. If the fix is still needed after that re-audit, I will re-open as a focused PR rebased onto current main. Thanks for your patience.
Problem
VU0_S2_VMOVE and VU0_S2_VMR32 were emitting unconditional moves/shuffles that ignored the instruction's dest_mask field. Any masked VMOVE/VMR32 therefore overwrote lanes the mask was meant to preserve, silently corrupting state for every masked write.

Fix
Both emitters now:
- Fast-path dest_mask == 0xF to the existing unconditional codegen (identical output for the common case; zero overhead).
- For any other mask, emit an _mm_blendv_ps guarded by a per-lane selector built from the dest_mask bits, using the same {0x1, 0x2, 0x4, 0x8} lane-bit convention as the rest of the VU0 dest-mask emitters.

Why this is stacked on #112
VMR32's masked path emits a shuffle followed by a blend; the shuffle uses _MM_SHUFFLE(0,3,2,1), which is the corrected form from #112. Without that prerequisite merged, the masked VMR32 would rotate to the wrong lane: the same typo #112 fixes in the unmasked path.

Scope
ps2xRecomp/src/lib/code_generator.cpp

Rationale
Parity with the rest of the VU0 macro codegen. Every other vector instruction already respects dest_mask; VMOVE/VMR32 were outliers.