diff --git a/.dev/checklist.md b/.dev/checklist.md
index e40a8a74e..5981ee41a 100644
--- a/.dev/checklist.md
+++ b/.dev/checklist.md
@@ -89,57 +89,71 @@ Prefix: W## (to distinguish from CW's F## items).
   `@./.dev/w47-investigation.md`. Low priority since 20 other
   benchmarks improved >10% (GC paths 40–76% faster).
 
-- [ ] W54: `tgo_strops` (and other div-heavy TinyGo workloads) is
-  ~2.1× slower than wasmtime/cranelift on M4 Pro per
-  `bench/runtime_comparison.yaml` (2026-03-25, runs=1/warmup=0):
-  zwasm cached 63.2 ms vs wasmtime cached 30.0 ms. The original
-  framing — "constant-divisor folding is missing" — was
-  **disproven** during the 2026-04-29 evening investigation. Both
-  ARM64 (`src/jit.zig:3582-3666` `tryEmitDivByConstU32`) and
-  x86_64 (`src/x86.zig`) already emit the Hacker's Delight magic
-  multiply for `i32.div_u K`; the JIT dump for
-  `bench/wasm/tgo_string_ops.wasm` shows three MOVZ+MOVK+UMULL+LSR
-  sequences for the three `i32.div_u 10` sites, with zero `UDIV`
-  instructions.
-
-  The remaining 2.1× lives in two places:
-
-  1. The 2-instruction magic-constant load (`MOVZ + MOVK` for
-     `0xCCCCCCCD`) is re-emitted inside the loop body on every
-     iteration; cranelift's SSA + GVN hoist it once so only
-     `UMULH/UMULL + LSR` stay hot. Three div sites in
-     `tgo_strops` cost ~6 ARM64 instructions per iteration that a
-     preheader hoist would eliminate.
-  2. TinyGo emits a `mov rd = rs1` per `local.set`; cranelift
-     collapses those into register renames whereas zwasm's
-     linear-scan regalloc still spills them to LDR/STR against
-     `regs[]`.
-
-  Single-pass-compatible levers, ranked:
-
-  1. **Loop-preheader magic hoist.** Extend `emitLoopPreHeader`
-     (today SIMD-only, `src/jit.zig:4604`) to scan for
-     `OP_CONST32 K → OP_DIV_U` patterns, allocate a callee-saved
-     register, and pre-load the magic. `tryEmitDivByConstU32`
-     short-circuits when the magic is already live. Risk:
-     medium — needs to coexist with the existing physical-
-     register layout (functions like `string_ops` (func#24, 13
-     vregs) saturate the callee-saved set; the prologue would
-     have to reserve a free slot up front).
-  2. **`OP_CONST32` reuse across loop back-edges.** Today
-     `known_consts` is wiped at every header. Skip unless (1)
-     lands — saves the 1-instr const itself but not the 2-instr
-     magic that hangs off it.
-  3. **`OP_MOV` coalescing in linear-scan regalloc.** Substantial
-     surgery; deserves a separate W## entry.
-
-  Out of scope (would break single-pass): SSA + dataflow,
-  global register allocation, automatic loop unroll /
-  vectorise. Re-record `runtime_comparison.yaml` at 5 runs / 3
-  warmup before claiming any number — the current values are
-  single-sample.
-
-  Full investigation log: `@./.dev/w54-investigation.md`.
+- [x] W54 (substrate): structural cleanup landed via PR #91 from
+  `develop/w54-loop-info`. Single change: `src/loop_info.zig` is
+  the single source of truth for branch / loop / vreg liveness.
+  The two JIT backends used to maintain byte-identical
+  `scanBranchTargets` implementations; both now consume
+  `LoopInfo.analyse(...)`. `vreg_first_def[]` / `vreg_last_use[]`
+  are computed in the same forward sweep, ready for future
+  consumers (Phase 5 / Phase 4). Behaviour byte-identical to
+  main (verified via `--dump-jit` diff on tgo_string_ops func#24,
+  fib func#2, and the realworld suite on Mac aarch64 + Ubuntu
+  x86_64). Architecture: D138 in `.dev/decisions.md`. Session arc:
+  `.dev/w54-redesign-postmortem.md`.
+
+- [ ] W54-coalescer: liveness-driven mov coalescing extension to
+  `regalloc.copyPropagate`. Built and proven on Mac aarch64 (50/50
+  realworld), reverted from the substrate PR after Linux x86_64
+  CI flagged a `go_math_big` divergence — BigInt subtraction
+  result `864197532086419753208641975320` (wasmtime) vs
+  `864197532160206729503480181784` (zwasm). The regalloc itself
+  is arch-agnostic, so the same `RegFunc` flows through both
+  backends; the divergence implies an x86_64-specific assumption
+  in `src/x86.zig`'s codegen that the new IR layout (fewer MOVs,
+  shifted PCs) violates. Reproducible on OrbStack's
+  `my-ubuntu-amd64` with `develop/w54-loop-pass-redesign`
+  checked out and `zig build -Doptimize=ReleaseSafe`. The
+  coalescer commit (`ec8182f` on the archive branch) cherry-picks
+  cleanly; debugging is in the x86 backend's interaction with the
+  redef-stop pattern — likely a getOrLoad / SCRATCH contention
+  triggered by a specific opcode sequence that the coalesced IR
+  produces. Recommended bisect: dump `--dump-regir` for the
+  failing function, identify the first MOV the new coalescer
+  folds that the old version kept, then dump `--dump-jit` on x86
+  to find the codegen mismatch.
+
+- [ ] W54-hoist-revisit: magic-constant loop-invariant hoist was
+  built and proven on `develop/w54-loop-pass-redesign` (archived
+  as `archive/w54-magic-hoist-2026-04-30`). digitCount JIT goes
+  196 → 192 with hoist alone. **Held back** from the substrate PR
+  pending three prerequisites:
+  1. **W47** — bench harness with σ < 5% on `tgo_strops`. The
+     hoist's effect is below the current 10% σ floor.
+  2. **W54-x86** — x86_64 `pickHoistPhys` parity. ARM64-only land
+     forces a reconciling follow-up; bundling makes one coherent
+     change.
+  3. `runtime_comparison.yaml` re-recorded at 5/3 hyperfine on a
+     thermally-stable rig.
+  Re-attempt: cherry-pick `1600397` + `c4b806e` from the archive
+  branch onto a fresh redesign branch once 1+2+3 land.
+
+- [ ] W54-x86: x86_64 magic-constant hoist parity for
+  W54-hoist-revisit. The archive branch's `src/x86.zig` already
+  has the `hoist_phys` / `hoist_displaced_infra` field
+  scaffolding; `pickHoistPhys` body needs implementing for x86's
+  reg layout (RBX/RBP/R15 vreg-bound for any reg_count >= 1; no
+  `inst_ptr_cached` slot to displace; R13/R14 free only when
+  `!has_memory`). Bench-driven decision on whether the win is
+  worth the displacement cost on functions with `has_memory`.
+
+- [ ] W54-libm: real-world `rw_c_math` (5.0× wasmtime cold,
+  8.7× cached) is dominated by BLR-heavy libm dispatch (`sin`,
+  `cos`, `pow`, `sqrt` per iteration). Out of scope for the
+  loop-pass redesign. Single-pass-compatible candidates: intrinsic
+  recognition for imported function names (`sqrt` → FSQRT inline),
+  software-libm fallback for sin/cos/pow. Needs imported-function
+  name resolution on the predecode side.
 
 - [ ] W48 Phase 2: Linux binary size 1.56 MB → 1.50 MB (~62 KB more).
   W48 Phase 1 shipped (2026-04-25): `pub const panic = std.debug.simple_panic`
diff --git a/.dev/decisions.md b/.dev/decisions.md
index 8f9faa549..c51bb0d49 100644
--- a/.dev/decisions.md
+++ b/.dev/decisions.md
@@ -941,3 +941,131 @@ Mac binary another 60 KB → cap drops 1.30 → 1.25 MB). Loosening a
 ceiling requires a CHANGELOG entry naming the regression source so
 the slack is intentional and visible.
 
+
+## D138: Shared `LoopInfo` analysis layer for the JIT pipeline
+
+**Status**: Accepted — landed via `develop/w54-loop-info` (PR #91).
+
+**Context**: For a long stretch we treated each JIT optimisation as a
+self-contained patch — the SIMD `emitLoopPreHeader`, the
+const-divisor magic-multiply fold, the adjacent-MOV `copyPropagate`,
+the `vm_ptr_cached` / `inst_ptr_cached` slots in `vregToPhys`. Each
+re-derived its own slice of "what's a loop" and "is this vreg dead"
+inline at codegen time. That ad-hoc layout was the proximate cause
+of the W54 magic-hoist abandonment on
+`develop/w54-magic-hoist-attempt` (2026-04-29 evening, see
+`.dev/w54-investigation.md`): x21 was simultaneously the inst_ptr
+cache slot for `reg_count <= 13 && has_self_call` AND the natural
+callee-saved candidate for the magic. Picking a safe boundary was a
+design call, not a tail-end commit.
+
+**Decision**: Ship the substrate first. One structural change that
+stands on its own merit and unblocks future loop-aware work:
+
+`src/loop_info.zig` is the single source of truth for the function's
+control-flow shape and per-vreg liveness. The two JIT backends used
+to maintain byte-for-byte identical `scanBranchTargets`
+implementations; both now consume `LoopInfo.analyse(allocator, ir,
+reg_count)`, which produces:
+
+- `branch_targets[]`, `loop_headers[]`, `loop_end[]` (drives JIT
+  cache eviction and the `known_consts` wipe at merge points).
+- `vreg_first_def[]`, `vreg_last_use[]` (one forward sweep,
+  conservative reads — over-approximation extends last_use later
+  than necessary, which only shrinks a future coalescer's window;
+  never breaks correctness). Phase 5+ consumers will read these.
+
+The opcode classification helpers `opWritesRd`, `opUsesRdAsSource`,
+`opUsesRs1AsSource`, `opUsesRs2AsSource` live in `loop_info.zig`
+for now (private) — they will be promoted to public regalloc API
+once the coalescer extension that needs them is debugged on x86_64
+(see W54-coalescer in checklist.md).
+
+**Effect (ARM64 Mac)**:
+
+- Both backends drop ~60 lines of duplicated `scanBranchTargets`
+  body in favour of a thin `LoopInfo.analyse` call. Behaviour is
+  byte-deterministic identical to main: `--dump-jit=24` of
+  `tgo_string_ops` func#24 (digitCount, 196 ARM64 instrs / 784
+  bytes) matches main bit-for-bit.
+- All other functions emit byte-identical machine code. No
+  performance change is expected or observed (Phase 0 + 1 are pure
+  refactoring; the data is computed but no codegen consumer reads
+  the new `vreg_first_def[]` / `vreg_last_use[]` arrays yet).
+
+**Rejected alternatives** (and why they didn't ride along):
+
+- **Liveness-driven mov coalescing in `regalloc.copyPropagate`**.
+  Implemented and shipped on the develop branch, then **reverted
+  on 2026-04-30** after the green Mac gate but RED Linux x86_64 CI
+  on `go_math_big`: the new "stop at first redef of old_reg" scan
+  with branch-target / forward-branch / multi-source bail-outs
+  passes the Mac aarch64 realworld suite (50/50, including
+  `rust_regex` which the first attempt broke), but produces wrong
+  results on Linux x86_64's `go_math_big` (BigInt subtraction
+  mismatch — wasmtime returns
+  `864197532086419753208641975320`, zwasm returns
+  `864197532160206729503480181784`). The regalloc itself is
+  arch-agnostic, so the same `RegFunc` flows through both
+  backends; the divergence implies an x86_64-specific assumption
+  in `src/x86.zig`'s codegen that the new IR layout violates.
+  Diagnosis is bench/CI-bound and not a tail-end fix; tracked as
+  W54-coalescer for a focused follow-up.
+
+- **Magic-constant loop-invariant hoist** (`OP_CONST32 K →
+  OP_DIV_U` pattern, materialise the magic into a callee-saved
+  register in the prologue, short-circuit
+  `tryEmitDivByConstU32`). Implemented on
+  `develop/w54-loop-pass-redesign` (commits `1600397`, `c4b806e`)
+  and proved out: digitCount JIT 196 → 192 with hoist alone, 192
+  → 185 stacked with the (eventually reverted) coalescer. Held
+  back from this PR for three reasons:
+  1. The runtime gain is below the bench σ floor today; without
+     the W47 harness work the optimisation would land
+     evidence-free and any later regression would be argued as
+     noise rather than measured.
+  2. The hoist requires displacing `inst_ptr_cached` (x21) on
+     functions with reg_count >= 5 + has_self_call — an
+     ARM64-specific behaviour change with no measured benefit
+     today. Pushing it post-harness keeps the trade-off
+     reviewable.
+  3. x86_64 parity has different free-slot mechanics (no
+     `inst_ptr_cached` to displace). Bundling hoist with parity
+     makes one coherent change later.
+
+- **Loop-invariant `known_consts` survival across loop headers**.
+  Sketched as Phase 4 of the original plan; dropped after
+  inspection of digitCount's RegIR showed every `i32.div_u 10` site
+  has its CONST32 emitted *inside* the loop body (TinyGo reuses
+  the same vreg `r8` / `r9` / `r12` per div), so the optimisation
+  would never fire on the W54 target.
+
+**Affected files**: `src/loop_info.zig` (new), `src/jit.zig`
+(replace `scanBranchTargets` with `LoopInfo.analyse`), `src/x86.zig`
+(same).
+
+**Archive**: the magic-hoist + coalescer work is preserved on
+`develop/w54-loop-pass-redesign` (last commit `a56d442`) and tagged
+`archive/w54-magic-hoist-2026-04-30`. Cherry-pick path:
+- `1600397` + `c4b806e` for the ARM64 magic hoist (re-attempt
+  W54-hoist-revisit).
+- `ec8182f` for the redef-aware coalescer (re-attempt W54-coalescer
+  after diagnosing the x86_64 `go_math_big` regression).
+
+**Re-evaluation pre-conditions**:
+1. **W47** — bench harness with σ < 5% on tgo_strops (currently ~10%)
+2. **W54-x86** — symmetric `pickHoistPhys` for x86_64 reg layout
+3. **W54-coalescer** — diagnose and fix the x86_64 `go_math_big`
+   divergence; the diff is in the regalloc-stage IR shape, the
+   backend assumption that breaks is in `src/x86.zig`.
+4. `runtime_comparison.yaml` re-recorded with 5/3 hyperfine on a
+   thermally-stable rig.
+
+**Follow-ups** (open W## items in checklist.md):
+- `W54-coalescer`: diagnose the x86_64 `go_math_big` regression,
+  re-land the coalescer.
+- `W54-hoist-revisit`: revive the magic-hoist work once W47 +
+  W54-x86 are ready.
+- `W54-libm`: `rw_c_math` is dominated by libm `sin`/`cos`/`pow`
+  dispatch; intrinsic recognition + ARM64 FSQRT inline + soft-libm
+  fallback.
diff --git a/.dev/memo.md b/.dev/memo.md
index 0f5a66bb0..e0505dc59 100644
--- a/.dev/memo.md
+++ b/.dev/memo.md
@@ -23,7 +23,41 @@ Session handover document. Read at session start.
 
 ## Current Task
 
-**W53 done. C-g foundation + Mac/Ubuntu baselines done.** Ship-overnight
+**W54 substrate landed via PR #91 from `develop/w54-loop-info`** (2026-04-30).
+Single structural change: `src/loop_info.zig` is the single source of
+truth for branch / loop / vreg liveness. Both backends drop ~60 lines
+of byte-identical `scanBranchTargets` in favour of a thin
+`LoopInfo.analyse(...)` call. `vreg_first_def[]` /
+`vreg_last_use[]` are computed from the same forward sweep, ready
+for future consumers. JIT output is byte-identical to main on every
+function (verified via `--dump-jit` diff for tgo_string_ops func#24
+and fib func#2).
+
+### Held back (archive branch)
+
+`develop/w54-loop-pass-redesign` (tagged
+`archive/w54-magic-hoist-2026-04-30`) preserves two further pieces
+of work that were built and bench-validated, but held back:
+
+1. **Magic-constant loop-invariant hoist** (commits `1600397`,
+   `c4b806e`). digitCount JIT 196 → 192. Re-attempt prerequisites:
+   W47 (bench harness σ < 5%), W54-x86 (parity).
+2. **Liveness-driven mov coalescing** (commit `ec8182f`). digitCount
+   JIT 192 → 185 stacked on hoist; substrate-only branch JIT 196 →
+   189 with just the coalescer. **Reverted from PR #91 after
+   Linux x86_64 CI failed `go_math_big`** (BigInt subtraction
+   divergence — wasmtime returns `864197532086419753208641975320`,
+   zwasm returns `864197532160206729503480181784`). The regalloc
+   is arch-agnostic, so the divergence is in `src/x86.zig`'s
+   codegen interaction with the new IR layout. Reproducible on
+   OrbStack `my-ubuntu-amd64`. Tracked as W54-coalescer.
+
+Architecture rationale: D138 in decisions.md. Detailed session arc
++ branch names: `.dev/w54-redesign-postmortem.md`.
+
+### Previous (still on main)
+
+**C-g foundation + Mac/Ubuntu baselines done.** Ship-overnight
 session 2026-04-29 evening landed two PRs to main on top of the
 afternoon's six (#79..#84):
 
diff --git a/.dev/w54-redesign-postmortem.md b/.dev/w54-redesign-postmortem.md
new file mode 100644
index 000000000..6600835c6
--- /dev/null
+++ b/.dev/w54-redesign-postmortem.md
@@ -0,0 +1,219 @@
+# W54 redesign — what shipped, what archived, what to do next
+
+Captured 2026-04-30 after the deep redesign session, updated after
+PR #91's first CI surfaced an x86_64-only regression on the
+coalescer.
+
+## TL;DR
+
+- **Shipped** (`develop/w54-loop-info` → main, PR #91): the LoopInfo
+  substrate only. `src/loop_info.zig` shared analysis layer with
+  branch_targets / loop_headers / loop_end + per-vreg first_def /
+  last_use. Both backends consume it instead of duplicating
+  `scanBranchTargets`. Behaviour byte-identical to main on every
+  benchmark we dump-jit'd.
+- **Archived** (`develop/w54-loop-pass-redesign`, tagged
+  `archive/w54-magic-hoist-2026-04-30`):
+  - Magic-constant loop-invariant hoist with `inst_ptr_cached`
+    displacement. digitCount JIT 196 → 192. Held back pending
+    W47 + W54-x86.
+  - Liveness-driven mov coalescing extension to
+    `regalloc.copyPropagate`. digitCount JIT 196 → 189. Reverted
+    from the substrate PR after Linux x86_64 CI flagged a
+    `go_math_big` BigInt divergence; tracked as W54-coalescer.
+- **Dropped**: Phase 4 ("loop-invariant `known_consts` survival
+  across loop headers"). The W54 target — digitCount — emits
+  CONST32 *inside* the loop body for every divisor site, so the
+  optimisation never fires on it. Re-evaluate when a benchmark
+  with the defined-outside-loop pattern shows up.
+
+## Session arc
+
+The starting point: PR #90 captured the W54 investigation
+(`.dev/w54-investigation.md`) which disproved the original framing
+("zwasm doesn't fold i32.div_u K"). zwasm already emits the
+Hacker's Delight magic-multiply for constant divisors on both
+arches; the 2.4× wasmtime gap on `tgo_strops` lives in two places
+— magic constants re-loaded every iter, and TinyGo's mov-heavy
+`local.set` chains.
+
+The first attempt at the magic hoist
+(`develop/w54-magic-hoist-attempt`, abandoned the same evening)
+hit a register collision: x21 was simultaneously the inst_ptr cache
+for `reg_count <= 13 && has_self_call` AND the natural callee-saved
+candidate for the magic. The investigation captured in
+`.dev/w54-investigation.md` concluded that picking a safe boundary
+was a design call, not a tail-end commit on a long autonomous run.
+
+The redesign session (this one, 2026-04-29 → 2026-04-30) did the
+design work the abandoned attempt avoided:
+
+1. Built the substrate (LoopInfo + opcode helpers + liveness data).
+2. Built the magic hoist on top, with `pickHoistPhys` that
+   displaces inst_ptr_cached when needed.
+3. Built the liveness-driven coalescer.
+4. Discovered via Mac bench that the runtime gain of (2)+(3) is
+   below the σ ≈ 10% noise floor on tgo_strops (W47).
+5. Reduced scope to just the substrate (1) for the PR. Tested.
+   Mac green. Pushed.
+6. Linux x86_64 CI flagged `go_math_big` regression on the
+   coalescer (3). Reproduced on OrbStack `my-ubuntu-amd64`.
+7. Reverted (3) from the PR. Re-pushed substrate-only.
+8. Mac aarch64 native testing of (3) had passed; the bug is
+   x86-specific. Tracked as W54-coalescer for diagnosis.
+
+## Branches and commits
+
+```
+main (pre-redesign)
+ └── develop/w54-magic-hoist-attempt   abandoned 2026-04-29 evening
+ │   reason: x21 register collision, deferred for daylight design
+ │
+ └── develop/w54-loop-pass-redesign    archived 2026-04-30
+ │   tag: archive/w54-magic-hoist-2026-04-30
+ │   contents (7 commits):
+ │     dd450f5  redesign plan (.dev/w54-redesign-plan.md)
+ │     b65477a  Phase 0  scanBranchTargets → LoopInfo
+ │     98287ae  Phase 1  vreg liveness on LoopInfo
+ │     1600397  Phase 2  hoist_phys / hoist_displaced_inst_ptr scaffold
+ │     c4b806e  Phase 3  ARM64 magic-constant hoist
+ │     ec8182f  Phase 5  liveness-driven mov coalescing
+ │                       (Mac green, x86_64 fails go_math_big)
+ │     a56d442  Phase 6  D138 + checklist + memo + bench record
+ │
+ └── develop/w54-loop-info             shipped 2026-04-30 (PR #91)
+     contents (3 commits, cherry-picked from the archive):
+       ee10661  Phase 0  scanBranchTargets → LoopInfo
+       ac2d446  Phase 1  vreg liveness on LoopInfo
+       <docs>   D138 + checklist + memo + this postmortem
+```
+
+## Why the coalescer was reverted from PR #91
+
+The Phase 5 commit (`ec8182f`) extended `regalloc.copyPropagate` to
+fold a temp-to-local MOV when the temp is killed (redefined) before
+any later read — an O(N) bounded scan that stops at the first
+redefinition of `old_reg`, with bail-outs for branch targets,
+forward branches, and multi-source ops.
+
+On Mac aarch64 this passed:
+- 412/412 unit tests
+- spec / e2e / FFI / minimal builds
+- 50/50 realworld (including `rust_regex` which the first attempt
+  broke — the forward-branch bail caught that case)
+
+On Linux x86_64 CI it failed:
+- realworld 49/50: `go_math_big` DIFF
+- wasmtime: `diff: 864197532086419753208641975320`
+- zwasm:    `diff: 864197532160206729503480181784`
+
+Reproduced on OrbStack `my-ubuntu-amd64` (native x86_64, not
+Rosetta). This means the coalesced `RegFunc` (which is identical
+across both backends — regalloc is arch-agnostic) gets correctly
+emitted on ARM64 but mis-emitted on x86_64. The bug is in
+`src/x86.zig`'s codegen interaction with the new IR layout (fewer
+MOVs, shifted PCs).
+
+This rules out "the coalescer is wrong" — Mac aarch64 passes 50/50
+on the same `RegFunc`. It points at an x86-specific assumption
+the new layout violates: likely a getOrLoad / SCRATCH contention
+or a spill-around-call sequence whose timing depends on a MOV
+that the new coalescer eliminates.
+
+Diagnosis path (W54-coalescer):
+1. `--dump-regir` for `go_math_big`'s offending function on both
+   the coalescer branch and main; identify the first MOV that the
+   new fold removed.
+2. `--dump-jit=...` for that function on x86_64 main vs branch;
+   find the codegen difference.
+3. Check that x86's getOrLoad caching, scratch_vreg invalidation
+   on UMULL, and call-site reload loops correctly handle the new
+   IR shape.
+4. Add a regression test (the failing IR pattern, ideally a
+   minimal wat).
+
+## Why the hoist was held back from PR #91
+
+The Phase 3 commits (`1600397` + `c4b806e`) implement the magic
+hoist. ARM64 dump-jit shows the win: digitCount 196 → 192 with
+hoist alone. Stacked with the (now-reverted) coalescer: 192 →
+185.
+
+Three reasons it didn't ride along with the substrate:
+
+1. **W47**: the bench σ on `tgo_strops` is ~10%. The hoist's
+   wall-clock effect is below the noise floor. Without harness
+   improvement the win is unfalsifiable; landing it now would mean
+   any later regression is argued as noise rather than measured.
+
+2. **`inst_ptr_cached` displacement**: when no callee-saved slot
+   is free (digitCount has reg_count=13 + self_call which
+   saturates), the hoist takes x21 from the inst_ptr cache. Every
+   `emitLoadInstPtr` site becomes a memory load. ARM64-specific
+   behaviour change, no measured benefit today.
+
+3. **W54-x86**: x86_64 has different free-slot mechanics. Land
+   ARM64 alone and the next x86 PR has to reconcile two arches.
+   Bundling makes one coherent change later.
+
+When W47 + W54-x86 + W54-coalescer are all green, the path is:
+checkout `archive/w54-magic-hoist-2026-04-30`, cherry-pick
+`1600397` + `c4b806e` (hoist) + `ec8182f` (coalescer, after
+diagnosing the go_math_big regression).
+
+## Lessons / signals to remember
+
+- **Linux x86_64 CI is irreplaceable for arch-asymmetric
+  regressions.** Mac aarch64 + OrbStack x86_64 (Rosetta) both
+  pass; only the GitHub-hosted native x86_64 runner caught
+  go_math_big. OrbStack's "amd64" via Rosetta is x86-emulated on
+  ARM Mac and somehow doesn't trigger the same codegen path the
+  CI runner does. **The Mac-only "Mac green ⇒ ship" heuristic
+  is unsafe** — Linux CI is non-redundant.
+
+  Update 2026-04-30: confirmed reproducible on OrbStack with a
+  fresh build (`zig build -Doptimize=ReleaseSafe` in the VM, not
+  cross-compiled from Mac). The earlier "OrbStack passes" reading
+  was a stale Mach-O binary that wasn't actually executed —
+  OrbStack Linux can't run aarch64-darwin Mach-O, so the test
+  fell through to wasmtime's output.
+
+- **Regalloc-stage IR changes are arch-agnostic, but JIT
+  consumption isn't.** A new `RegFunc` shape that's correct by
+  construction can still expose existing backend bugs (or
+  undocumented backend assumptions). Both backends need to be
+  exercised before claiming a regalloc-stage refactor is
+  behaviour-neutral.
+
+- **Bench σ ≈ 10% on `tgo_strops`** (W47) is the gating
+  constraint for measuring small JIT optimisations. Until W47,
+  sub-10% wins are unfalsifiable.
+
+- **Forward branches are the safety boundary for redef-stop
+  coalescing.** The `rust_regex` `/h.l+o/ ~ "hallo"` failure was
+  exactly the "branch over a redef" pattern — without dominator
+  info, every forward branch in the scan window has to be a bail.
+  The x86_64 `go_math_big` failure is a different class — same
+  RegFunc, but the x86 backend mis-emits.
+
+- **Phase 4 (invariant `known_consts` across loop headers) does
+  not fire on the W54 target**. digitCount's CONST32 is reborn
+  per iteration. Verify-on-RegIR is cheaper than
+  implement-and-bench.
+
+- **`develop/w54-magic-hoist-attempt` was right to defer**. The
+  collision class (x21 = inst_ptr_cache vs hoist) was a missing
+  layer in the JIT. The substrate added the layer; the
+  consequential optimisations stack on top.
+
+## Pointers
+
+- Architecture: D138 (`.dev/decisions.md`).
+- Investigation log: `.dev/w54-investigation.md` (in main, PR #90).
+- Original plan: `.dev/w54-redesign-plan.md` (on the archive
+  branch only — the shipped scope is much narrower than the plan).
+- Bench harness work: `W47` in `.dev/checklist.md`.
+- Coalescer re-attempt: `W54-coalescer` in `.dev/checklist.md`.
+  Diagnose via `--dump-regir` / `--dump-jit` on x86_64 first.
+- Hoist re-attempt: `W54-hoist-revisit` + `W54-x86`. Cherry-pick
+  `1600397` + `c4b806e` from `archive/w54-magic-hoist-2026-04-30`.
diff --git a/src/jit.zig b/src/jit.zig
index 5be48b801..97a880373 100644
--- a/src/jit.zig
+++ b/src/jit.zig
@@ -34,6 +34,8 @@ const WasmMemory = @import("memory.zig").Memory;
 const trace_mod = @import("trace.zig");
 const predecode_mod = @import("predecode.zig");
 const platform = @import("platform.zig");
+const loop_info_mod = @import("loop_info.zig");
+const LoopInfo = loop_info_mod.LoopInfo;
 
 /// JIT-compiled function pointer type.
 /// Args: regs_ptr, vm_ptr, instance_ptr.
@@ -1154,13 +1156,11 @@ pub const Compiler = struct {
     self_call_entry_idx: u32,
     /// Saved fast-path pattern from emitBaseCaseFastPath for duplication at self-call entry.
     fast_path_info: ?FastPathInfo,
-    /// IR slice and branch targets for peephole fusion (set during compile).
+    /// IR slice for peephole fusion (set during compile).
     ir_slice: []const RegInstr = &.{},
-    branch_targets_slice: []bool = &.{},
-    /// Loop header markers: true for PCs that are targets of backward branches.
-    loop_headers_slice: []bool = &.{},
-    /// For each loop header PC, the max back-edge source PC (defines loop body range).
-    loop_end_map: []u32 = &.{},
+    /// Branch / loop / liveness analysis owned by the compile, freed at end.
+    /// Populated by `LoopInfo.analyse` at the start of `compileMain`.
+    loop_info: LoopInfo = .{},
 
     const FastPathInfo = struct {
         param_offset: u16,
@@ -2431,76 +2431,10 @@ pub const Compiler = struct {
         return false;
     }
 
-    /// Pre-scan IR to find all branch targets (PCs that can be jumped to).
-    /// Also populates loop_headers_slice and loop_end_map for SIMD loop persistence.
-    fn scanBranchTargets(self: *Compiler, ir: []const RegInstr) ?[]bool {
-        const targets = self.alloc.alloc(bool, ir.len) catch return null;
-        @memset(targets, false);
-        const loop_headers = self.alloc.alloc(bool, ir.len) catch {
-            self.alloc.free(targets);
-            return null;
-        };
-        @memset(loop_headers, false);
-        const loop_end = self.alloc.alloc(u32, ir.len) catch {
-            self.alloc.free(loop_headers);
-            self.alloc.free(targets);
-            return null;
-        };
-        @memset(loop_end, 0);
-
-        var scan_pc: u32 = 0;
-        while (scan_pc < ir.len) {
-            const instr = ir[scan_pc];
-            const source_pc = scan_pc;
-            scan_pc += 1;
-            switch (instr.op) {
-                regalloc_mod.OP_BR => {
-                    if (instr.operand < ir.len) {
-                        targets[instr.operand] = true;
-                        // Back-edge: target < source → loop header
-                        if (instr.operand <= source_pc) {
-                            loop_headers[instr.operand] = true;
-                            if (source_pc > loop_end[instr.operand])
-                                loop_end[instr.operand] = source_pc;
-                        }
-                    }
-                },
-                regalloc_mod.OP_BR_IF, regalloc_mod.OP_BR_IF_NOT => {
-                    if (instr.operand < ir.len) {
-                        targets[instr.operand] = true;
-                        if (instr.operand <= source_pc) {
-                            loop_headers[instr.operand] = true;
-                            if (source_pc > loop_end[instr.operand])
-                                loop_end[instr.operand] = source_pc;
-                        }
-                    }
-                },
-                regalloc_mod.OP_BR_TABLE => {
-                    const count = instr.operand;
-                    var i: u32 = 0;
-                    while (i < count + 1 and scan_pc < ir.len) : (i += 1) {
-                        const entry = ir[scan_pc];
-                        scan_pc += 1;
-                        if (entry.operand < ir.len) {
-                            targets[entry.operand] = true;
-                            if (entry.operand <= source_pc) {
-                                loop_headers[entry.operand] = true;
-                                if (source_pc > loop_end[entry.operand])
-                                    loop_end[entry.operand] = source_pc;
-                            }
-                        }
-                    }
-                },
-                regalloc_mod.OP_BLOCK_END => {
-                    targets[scan_pc - 1] = true;
-                },
-                else => {},
-            }
-        }
-        self.loop_headers_slice = loop_headers;
-        self.loop_end_map = loop_end;
-        return targets;
-    }
+    // scanBranchTargets has moved to src/loop_info.zig — both backends now
+    // call self.loop_info.analyse() in compileMain. The body lives there so
+    // the analysis can be enriched (liveness, invariant-const classification)
+    // without diverging the two arches.
 
     fn isControlFlowOp(_: *const Compiler, op: u16) bool {
         return switch (op) {
@@ -2606,15 +2540,13 @@ pub const Compiler = struct {
         // Pre-allocate pc_map indexed by RegInstr PC (not loop iteration)
         self.pc_map.appendNTimes(self.alloc, 0, ir.len + 1) catch return null;
 
-        // Pre-scan: find branch targets for known_consts invalidation
-        const branch_targets = self.scanBranchTargets(ir) orelse return null;
-        defer self.alloc.free(branch_targets);
-        defer self.alloc.free(self.loop_headers_slice);
-        defer self.alloc.free(self.loop_end_map);
+        // Pre-scan: branch targets, loop headers, loop body extents.
+        if (!self.loop_info.analyse(self.alloc, ir, self.reg_count)) return null;
+        defer self.loop_info.deinit(self.alloc);
+        const branch_targets = self.loop_info.branch_targets;
 
-        // Store IR and branch targets for peephole fusion
+        // Store IR for peephole fusion (loop_info is read directly via self).
         self.ir_slice = ir;
-        self.branch_targets_slice = branch_targets;
 
         // Detect SIMD presence for v128 sync in MOV/CONST
         for (ir) |scan_instr| {
@@ -2658,7 +2590,7 @@ pub const Compiler = struct {
             if (pc < branch_targets.len and branch_targets[pc]) {
                 self.known_consts = .{null} ** 128;
                 self.scratch_vreg = null;
-                if (pc < self.loop_headers_slice.len and self.loop_headers_slice[pc]) {
+                if (pc < self.loop_info.loop_headers.len and self.loop_info.loop_headers[pc]) {
                     // Loop header: emit pre-loads BEFORE pc_map (first iteration only),
                     // then keep Q-reg cache alive. Back-edges jump to pc_map (after pre-loads).
                     self.fpCacheEvictAll();
@@ -2774,7 +2706,7 @@ pub const Compiler = struct {
             regalloc_mod.OP_BR => {
                 const target = instr.operand;
                 const is_back_edge = target <= pc.* - 1;
-                if (is_back_edge and target < self.loop_headers_slice.len and self.loop_headers_slice[target]) {
+                if (is_back_edge and target < self.loop_info.loop_headers.len and self.loop_info.loop_headers[target]) {
                     // Back-edge to loop header: flush Q-regs (keep cache) for deopt safety
                     self.fpCacheEvictAll();
                     self.simdQregFlushAll();
@@ -3453,7 +3385,7 @@ pub const Compiler = struct {
         if (next.op != regalloc_mod.OP_BR_IF and next.op != regalloc_mod.OP_BR_IF_NOT) return false;
         if (next.rd != rd) return false;
         // Don't fuse if the BR_IF is a branch target (merge point)
-        if (pc.* < self.branch_targets_slice.len and self.branch_targets_slice[pc.*]) return false;
+        if (pc.* < self.loop_info.branch_targets.len and self.loop_info.branch_targets[pc.*]) return false;
 
         // Fuse: emit B.cond instead of CSET + CBNZ/CBZ
         self.evictAllCaches();
@@ -4602,7 +4534,7 @@ pub const Compiler = struct {
     /// Called BEFORE recording pc_map so back-edges skip pre-loads (only first iteration loads).
     /// Sets up Q-reg cache entries so the loop body finds inputs already cached.
     fn emitLoopPreHeader(self: *Compiler, ir: []const RegInstr, header_pc: u32) void {
-        const end_pc = if (header_pc < self.loop_end_map.len) self.loop_end_map[header_pc] else return;
+        const end_pc = if (header_pc < self.loop_info.loop_end.len) self.loop_info.loop_end[header_pc] else return;
         if (end_pc == 0) return;
 
         // Scan loop body to find v128 vregs that are read (as SIMD op inputs)
diff --git a/src/loop_info.zig b/src/loop_info.zig
new file mode 100644
index 000000000..b15934dc3
--- /dev/null
+++ b/src/loop_info.zig
@@ -0,0 +1,448 @@
+//! Shared loop / branch / liveness analysis for the JIT pipeline.
+//!
+//! Owns the data structures that describe a function's control-flow
+//! shape (where the branch targets are, which PCs are loop headers,
+//! how long each loop body extends) plus per-vreg liveness (first
+//! definition, last use). Phase 4+ extends this with classification of
+//! loop-invariant constants.
+//!
+//! Both JIT backends consume the same `LoopInfo` instead of running
+//! their own pre-scans. Cost: one forward sweep over the RegInstr
+//! stream per compile (control-flow sweep + liveness sweep are fused).
+
+const std = @import("std");
+const regalloc = @import("regalloc.zig");
+
+const RegInstr = regalloc.RegInstr;
+
+/// Sentinel used by `vreg_first_def[v]` to mean "vreg v is never written
+/// inside the function body". Callers wishing to ask "is v defined before
+/// PC X?" should treat NEVER_DEFINED as "no, not defined here". Params
+/// and locals are conceptually defined at function entry but their
+/// definition is implicit (no RegInstr writes them) — Phase 4 handles
+/// that distinction by also treating `v < local_count` as defined-before-loop.
+pub const NEVER_DEFINED: u32 = std.math.maxInt(u32);
+
+pub const LoopInfo = struct {
+    /// branch_targets[pc] = true iff some control-flow op (BR, BR_IF,
+    /// BR_IF_NOT, BR_TABLE, BLOCK_END) targets this PC. Drives JIT
+    /// cache eviction and the known_consts wipe.
+    branch_targets: []bool = &.{},
+
+    /// loop_headers[pc] = true iff `pc` is the target of a backward
+    /// branch (i.e. a loop entry).
+    loop_headers: []bool = &.{},
+
+    /// loop_end[header_pc] = max source PC of any back-edge into
+    /// header_pc. Defines the inclusive range `[header_pc, loop_end]`
+    /// that the loop body covers. 0 for non-headers.
+    loop_end: []u32 = &.{},
+
+    /// vreg_first_def[v] = PC of the first RegInstr that writes vreg v,
+    /// or `NEVER_DEFINED` if v is never assigned by any instruction in
+    /// this function body. "Write" here is dataflow-correct: stores
+    /// (0x36..0x3E), conditional branches (BR_IF / BR_IF_NOT) and
+    /// RETURN treat rd as a SOURCE, not a destination, and are ignored.
+    vreg_first_def: []u32 = &.{},
+
+    /// vreg_last_use[v] = PC of the last RegInstr that reads vreg v
+    /// (rs1, rs2_field, or rd-as-source for stores / conditional
+    /// branches / RETURN). 0 if v is never read.
+    /// Conservative: opcodes that don't actually consume rs1/rs2 (BR,
+    /// CONST32, CONST64, BLOCK_END, NOP, DELETED) are excluded;
+    /// everything else treats both rs1 and rs2_field as a read. The
+    /// over-approximation extends last_use later than necessary, which
+    /// only shrinks the coalescing window in Phase 5 — safe by design.
+    vreg_last_use: []u32 = &.{},
+
+    /// Number of vregs the liveness arrays cover. Equals reg_func.reg_count
+    /// at analyse() time. Used for bounds checks in callers.
+    vreg_count: u32 = 0,
+
+    /// Free all owned slices. Safe to call on a default-initialized
+    /// (empty) LoopInfo.
+    pub fn deinit(self: *LoopInfo, alloc: std.mem.Allocator) void {
+        if (self.branch_targets.len > 0) alloc.free(self.branch_targets);
+        if (self.loop_headers.len > 0) alloc.free(self.loop_headers);
+        if (self.loop_end.len > 0) alloc.free(self.loop_end);
+        if (self.vreg_first_def.len > 0) alloc.free(self.vreg_first_def);
+        if (self.vreg_last_use.len > 0) alloc.free(self.vreg_last_use);
+        self.* = .{};
+    }
+
+    /// Single forward sweep populating branch_targets / loop_headers /
+    /// loop_end and per-vreg first_def / last_use. Returns false on
+    /// allocation failure (caller treats the JIT compile as a bail).
+    pub fn analyse(
+        self: *LoopInfo,
+        alloc: std.mem.Allocator,
+        ir: []const RegInstr,
+        reg_count: u32,
+    ) bool {
+        const targets = alloc.alloc(bool, ir.len) catch return false;
+        @memset(targets, false);
+        const loop_headers = alloc.alloc(bool, ir.len) catch {
+            alloc.free(targets);
+            return false;
+        };
+        @memset(loop_headers, false);
+        const loop_end = alloc.alloc(u32, ir.len) catch {
+            alloc.free(loop_headers);
+            alloc.free(targets);
+            return false;
+        };
+        @memset(loop_end, 0);
+
+        const first_def = alloc.alloc(u32, reg_count) catch {
+            alloc.free(loop_end);
+            alloc.free(loop_headers);
+            alloc.free(targets);
+            return false;
+        };
+        @memset(first_def, NEVER_DEFINED);
+        const last_use = alloc.alloc(u32, reg_count) catch {
+            alloc.free(first_def);
+            alloc.free(loop_end);
+            alloc.free(loop_headers);
+            alloc.free(targets);
+            return false;
+        };
+        @memset(last_use, 0);
+
+        var scan_pc: u32 = 0;
+        while (scan_pc < ir.len) {
+            const instr = ir[scan_pc];
+            const source_pc = scan_pc;
+            scan_pc += 1;
+
+            // --- Control-flow shape ---
+            switch (instr.op) {
+                regalloc.OP_BR => recordTarget(targets, loop_headers, loop_end, instr.operand, source_pc, ir.len),
+                regalloc.OP_BR_IF, regalloc.OP_BR_IF_NOT => recordTarget(
+                    targets,
+                    loop_headers,
+                    loop_end,
+                    instr.operand,
+                    source_pc,
+                    ir.len,
+                ),
+                regalloc.OP_BR_TABLE => {
+                    const count = instr.operand;
+                    var i: u32 = 0;
+                    while (i < count + 1 and scan_pc < ir.len) : (i += 1) {
+                        const entry = ir[scan_pc];
+                        scan_pc += 1;
+                        recordTarget(targets, loop_headers, loop_end, entry.operand, source_pc, ir.len);
+                        // BR_TABLE follow-up NOPs participate in liveness too:
+                        // their operand is a target PC, not a vreg, so we skip
+                        // their rs1/rs2_field/rd entries.
+                    }
+                },
+                regalloc.OP_BLOCK_END => {
+                    targets[scan_pc - 1] = true;
+                },
+                else => {},
+            }
+
+            // --- Liveness ---
+            //
+            // Update last_use BEFORE first_def so a `mov rd = rs1` that
+            // happens to have rd == rs1 (degenerate, but legal) records
+            // the read at PC and the write at PC. That's correct: the
+            // value is read AT this PC and written AT this PC.
+
+            if (opUsesRdAsSource(instr.op)) {
+                if (instr.rd < reg_count) last_use[instr.rd] = source_pc;
+            }
+            if (opUsesRs1AsSource(instr.op)) {
+                if (instr.rs1 < reg_count) last_use[instr.rs1] = source_pc;
+            }
+            if (opUsesRs2AsSource(instr.op)) {
+                const r2 = instr.rs2();
+                if (r2 < reg_count) last_use[r2] = source_pc;
+            }
+            // Multi-source ops (CALL / CALL_INDIRECT / RETURN_MULTI /
+            // memory.fill / memory.copy) read additional vregs that live
+            // in the operand field as a count + following NOP slots, or
+            // in special positions. Phase 1 conservatively treats them
+            // via the rs1/rs2 fields above (over-approximation only loses
+            // optimization in Phase 5; never hurts correctness).
+
+            if (opWritesRd(instr.op)) {
+                if (instr.rd < reg_count and first_def[instr.rd] == NEVER_DEFINED) {
+                    first_def[instr.rd] = source_pc;
+                }
+            }
+        }
+
+        self.* = .{
+            .branch_targets = targets,
+            .loop_headers = loop_headers,
+            .loop_end = loop_end,
+            .vreg_first_def = first_def,
+            .vreg_last_use = last_use,
+            .vreg_count = reg_count,
+        };
+        return true;
+    }
+};
+
+fn recordTarget(
+    targets: []bool,
+    loop_headers: []bool,
+    loop_end: []u32,
+    target_pc: u32,
+    source_pc: u32,
+    ir_len: usize,
+) void {
+    if (target_pc >= ir_len) return;
+    targets[target_pc] = true;
+    if (target_pc <= source_pc) {
+        loop_headers[target_pc] = true;
+        if (source_pc > loop_end[target_pc]) {
+            loop_end[target_pc] = source_pc;
+        }
+    }
+}
+
+/// True iff this opcode writes a fresh value into rd (the destination
+/// vreg). Stores treat rd as a value source; conditional branches
+/// (BR_IF / BR_IF_NOT / RETURN) treat rd as the read condition or
+/// returned value; control-flow ops without vregs return false.
+fn opWritesRd(op: u16) bool {
+    return switch (op) {
+        regalloc.OP_BR,
+        regalloc.OP_BR_IF,
+        regalloc.OP_BR_IF_NOT,
+        regalloc.OP_BR_TABLE,
+        regalloc.OP_BLOCK_END,
+        regalloc.OP_NOP,
+        regalloc.OP_DELETED,
+        regalloc.OP_RETURN,
+        regalloc.OP_RETURN_VOID,
+        regalloc.OP_RETURN_MULTI,
+        regalloc.OP_MEMORY_FILL,
+        regalloc.OP_MEMORY_COPY,
+        // Wasm stores: rd is the value source, rs1 is the address.
+        0x36, 0x37, 0x38, 0x39, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E => false,
+        else => true,
+    };
+}
+
+/// True iff this opcode treats rd as a SOURCE (read) rather than a
+/// destination. The set is symmetric with `!opWritesRd` for the cases
+/// where rd carries a vreg reference at all — control-flow ops with no
+/// vreg use return false in both predicates.
+fn opUsesRdAsSource(op: u16) bool {
+    return switch (op) {
+        regalloc.OP_BR_IF,
+        regalloc.OP_BR_IF_NOT,
+        regalloc.OP_RETURN,
+        regalloc.OP_RETURN_MULTI,
+        // Wasm stores: rd is the value source.
+        0x36, 0x37, 0x38, 0x39, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E => true,
+        // memory.fill / memory.copy use rs1/rs2/operand-NOP entries —
+        // see fuzz_gen / vm.zig for the layout. Phase 1 doesn't model
+        // their additional sources; Phase 5 will not coalesce around
+        // them anyway, so the omission is harmless.
+        else => false,
+    };
+}
+
+/// True iff this opcode reads rs1 as a vreg. Default-true for most ops
+/// (binary / unary / load / mov), false for control-flow / const /
+/// no-vreg ops where rs1 is unused (and defaults to 0, which would
+/// otherwise spuriously update last_use[0]).
+fn opUsesRs1AsSource(op: u16) bool {
+    return switch (op) {
+        regalloc.OP_BR,
+        regalloc.OP_BR_IF,
+        regalloc.OP_BR_IF_NOT,
+        regalloc.OP_BLOCK_END,
+        regalloc.OP_NOP,
+        regalloc.OP_DELETED,
+        regalloc.OP_CONST32,
+        regalloc.OP_CONST64,
+        regalloc.OP_RETURN,
+        regalloc.OP_RETURN_VOID,
+        => false,
+        else => true,
+    };
+}
+
+/// True iff this opcode reads rs2_field as a vreg. Conservative: any
+/// binop-shaped opcode might use it; ops that don't (unary, mov, load,
+/// stores) over-approximate harmlessly. The hard exclusions are ops
+/// where rs2_field is guaranteed unused so reading it would mark
+/// vreg 0 as last-used spuriously.
+fn opUsesRs2AsSource(op: u16) bool {
+    return switch (op) {
+        regalloc.OP_BR,
+        regalloc.OP_BR_IF,
+        regalloc.OP_BR_IF_NOT,
+        regalloc.OP_BLOCK_END,
+        regalloc.OP_NOP,
+        regalloc.OP_DELETED,
+        regalloc.OP_CONST32,
+        regalloc.OP_CONST64,
+        regalloc.OP_RETURN,
+        regalloc.OP_RETURN_VOID,
+        regalloc.OP_RETURN_MULTI,
+        regalloc.OP_MOV,
+        regalloc.OP_BR_TABLE,
+        => false,
+        else => true,
+    };
+}
+
+const testing = std.testing;
+
+test "LoopInfo: empty IR yields empty slices" {
+    var info: LoopInfo = .{};
+    defer info.deinit(testing.allocator);
+    try testing.expect(info.analyse(testing.allocator, &.{}, 0));
+    try testing.expectEqual(@as(usize, 0), info.branch_targets.len);
+    try testing.expectEqual(@as(usize, 0), info.loop_headers.len);
+    try testing.expectEqual(@as(usize, 0), info.loop_end.len);
+    try testing.expectEqual(@as(usize, 0), info.vreg_first_def.len);
+    try testing.expectEqual(@as(usize, 0), info.vreg_last_use.len);
+}
+
+test "LoopInfo: forward branch flagged, no loop header" {
+    const ir = [_]RegInstr{
+        .{ .op = regalloc.OP_NOP, .rd = 0, .rs1 = 0, .operand = 0 },
+        .{ .op = regalloc.OP_BR, .rd = 0, .rs1 = 0, .operand = 3 },
+        .{ .op = regalloc.OP_NOP, .rd = 0, .rs1 = 0, .operand = 0 },
+        .{ .op = regalloc.OP_NOP, .rd = 0, .rs1 = 0, .operand = 0 },
+    };
+    var info: LoopInfo = .{};
+    defer info.deinit(testing.allocator);
+    try testing.expect(info.analyse(testing.allocator, &ir, 0));
+    try testing.expect(info.branch_targets[3]);
+    try testing.expect(!info.loop_headers[3]);
+    try testing.expectEqual(@as(u32, 0), info.loop_end[3]);
+}
+
+test "LoopInfo: backward branch is a loop header with end_pc" {
+    const ir = [_]RegInstr{
+        .{ .op = regalloc.OP_NOP, .rd = 0, .rs1 = 0, .operand = 0 },
+        .{ .op = regalloc.OP_NOP, .rd = 0, .rs1 = 0, .operand = 0 },
+        .{ .op = regalloc.OP_BR_IF, .rd = 0, .rs1 = 0, .operand = 0 },
+    };
+    var info: LoopInfo = .{};
+    defer info.deinit(testing.allocator);
+    try testing.expect(info.analyse(testing.allocator, &ir, 0));
+    try testing.expect(info.branch_targets[0]);
+    try testing.expect(info.loop_headers[0]);
+    try testing.expectEqual(@as(u32, 2), info.loop_end[0]);
+}
+
+test "LoopInfo: nested back-edges keep max end_pc" {
+    const ir = [_]RegInstr{
+        .{ .op = regalloc.OP_NOP, .rd = 0, .rs1 = 0, .operand = 0 },
+        .{ .op = regalloc.OP_NOP, .rd = 0, .rs1 = 0, .operand = 0 },
+        .{ .op = regalloc.OP_BR, .rd = 0, .rs1 = 0, .operand = 0 },
+        .{ .op = regalloc.OP_NOP, .rd = 0, .rs1 = 0, .operand = 0 },
+        .{ .op = regalloc.OP_BR, .rd = 0, .rs1 = 0, .operand = 0 },
+    };
+    var info: LoopInfo = .{};
+    defer info.deinit(testing.allocator);
+    try testing.expect(info.analyse(testing.allocator, &ir, 0));
+    try testing.expectEqual(@as(u32, 4), info.loop_end[0]);
+}
+
+test "LoopInfo: BLOCK_END marks the END pc itself as a target" {
+    const ir = [_]RegInstr{
+        .{ .op = regalloc.OP_NOP, .rd = 0, .rs1 = 0, .operand = 0 },
+        .{ .op = regalloc.OP_BLOCK_END, .rd = 0, .rs1 = 0, .operand = 0 },
+    };
+    var info: LoopInfo = .{};
+    defer info.deinit(testing.allocator);
+    try testing.expect(info.analyse(testing.allocator, &ir, 0));
+    try testing.expect(info.branch_targets[1]);
+    try testing.expect(!info.loop_headers[1]);
+}
+
+test "LoopInfo: liveness — CONST32 writes rd, ADD reads rs1+rs2_field" {
+    // pc=0: const32 r2 = 42
+    // pc=1: const32 r3 = 5
+    // pc=2: i32.add r4 = r2 + r3
+    // pc=3: return r4
+    const ir = [_]RegInstr{
+        .{ .op = regalloc.OP_CONST32, .rd = 2, .rs1 = 0, .operand = 42 },
+        .{ .op = regalloc.OP_CONST32, .rd = 3, .rs1 = 0, .operand = 5 },
+        .{ .op = 0x6A, .rd = 4, .rs1 = 2, .rs2_field = 3, .operand = 0 }, // i32.add
+        .{ .op = regalloc.OP_RETURN, .rd = 4, .rs1 = 0, .operand = 0 },
+    };
+    var info: LoopInfo = .{};
+    defer info.deinit(testing.allocator);
+    try testing.expect(info.analyse(testing.allocator, &ir, 5));
+    try testing.expectEqual(@as(u32, 5), info.vreg_count);
+    // r2 first defined at pc 0, last used at pc 2
+    try testing.expectEqual(@as(u32, 0), info.vreg_first_def[2]);
+    try testing.expectEqual(@as(u32, 2), info.vreg_last_use[2]);
+    // r3 first defined at pc 1, last used at pc 2
+    try testing.expectEqual(@as(u32, 1), info.vreg_first_def[3]);
+    try testing.expectEqual(@as(u32, 2), info.vreg_last_use[3]);
+    // r4 first defined at pc 2, last used at pc 3 (RETURN reads rd)
+    try testing.expectEqual(@as(u32, 2), info.vreg_first_def[4]);
+    try testing.expectEqual(@as(u32, 3), info.vreg_last_use[4]);
+    // r0 / r1 never touched
+    try testing.expectEqual(NEVER_DEFINED, info.vreg_first_def[0]);
+    try testing.expectEqual(@as(u32, 0), info.vreg_last_use[0]);
+    try testing.expectEqual(NEVER_DEFINED, info.vreg_first_def[1]);
+}
+
+test "LoopInfo: liveness — store reads rd, does not write rd" {
+    // pc=0: const32 r2 = 100   (address)
+    // pc=1: const32 r3 = 7     (value)
+    // pc=2: i32.store rd=r3, rs1=r2  (op 0x36)
+    const ir = [_]RegInstr{
+        .{ .op = regalloc.OP_CONST32, .rd = 2, .rs1 = 0, .operand = 100 },
+        .{ .op = regalloc.OP_CONST32, .rd = 3, .rs1 = 0, .operand = 7 },
+        .{ .op = 0x36, .rd = 3, .rs1 = 2, .operand = 0 }, // i32.store
+    };
+    var info: LoopInfo = .{};
+    defer info.deinit(testing.allocator);
+    try testing.expect(info.analyse(testing.allocator, &ir, 5));
+    // The store does not redefine r3; first_def stays at pc=1
+    try testing.expectEqual(@as(u32, 1), info.vreg_first_def[3]);
+    // Both r2 (address) and r3 (value) are read at pc=2
+    try testing.expectEqual(@as(u32, 2), info.vreg_last_use[2]);
+    try testing.expectEqual(@as(u32, 2), info.vreg_last_use[3]);
+}
+
+test "LoopInfo: liveness — BR_IF reads rd as condition" {
+    // pc=0: const32 r2 = 1   (condition)
+    // pc=1: br_if rd=r2 -> pc=3
+    // pc=2: nop
+    // pc=3: nop
+    const ir = [_]RegInstr{
+        .{ .op = regalloc.OP_CONST32, .rd = 2, .rs1 = 0, .operand = 1 },
+        .{ .op = regalloc.OP_BR_IF, .rd = 2, .rs1 = 0, .operand = 3 },
+        .{ .op = regalloc.OP_NOP, .rd = 0, .rs1 = 0, .operand = 0 },
+        .{ .op = regalloc.OP_NOP, .rd = 0, .rs1 = 0, .operand = 0 },
+    };
+    var info: LoopInfo = .{};
+    defer info.deinit(testing.allocator);
+    try testing.expect(info.analyse(testing.allocator, &ir, 4));
+    // r2 first defined at pc=0, BR_IF reads it (does NOT redefine) at pc=1
+    try testing.expectEqual(@as(u32, 0), info.vreg_first_def[2]);
+    try testing.expectEqual(@as(u32, 1), info.vreg_last_use[2]);
+}
+
+test "LoopInfo: liveness — MOV does not over-read rs2 default 0" {
+    // pc=0: const32 r5 = 99
+    // pc=1: mov r6 = r5  (rs1=5, rs2_field defaults to 0)
+    // We must NOT mark vreg 0 as last_used at pc=1 just because rs2
+    // defaulted to 0.
+    const ir = [_]RegInstr{
+        .{ .op = regalloc.OP_CONST32, .rd = 5, .rs1 = 0, .operand = 99 },
+        .{ .op = regalloc.OP_MOV, .rd = 6, .rs1 = 5, .rs2_field = 0, .operand = 0 },
+    };
+    var info: LoopInfo = .{};
+    defer info.deinit(testing.allocator);
+    try testing.expect(info.analyse(testing.allocator, &ir, 8));
+    try testing.expectEqual(@as(u32, 1), info.vreg_last_use[5]); // read at pc=1
+    try testing.expectEqual(@as(u32, 0), info.vreg_last_use[0]); // never read
+}
diff --git a/src/x86.zig b/src/x86.zig
index 34fe606cd..7ef84ae19 100644
--- a/src/x86.zig
+++ b/src/x86.zig
@@ -44,6 +44,8 @@ const JitCode = jit_mod.JitCode;
 const JitFn = jit_mod.JitFn;
 const vm_mod = @import("vm.zig");
 const platform = @import("platform.zig");
+const loop_info_mod = @import("loop_info.zig");
+const LoopInfo = loop_info_mod.LoopInfo;
 
 // ================================================================
 // x86_64 register definitions
@@ -1728,13 +1730,11 @@ pub const Compiler = struct {
     osr_target_pc: ?u32,
     /// Byte offset of the OSR prologue in the code buffer.
     osr_prologue_offset: u32,
-    /// IR slice and branch targets for peephole fusion (set during compile).
+    /// IR slice for peephole fusion (set during compile).
     ir_slice: []const RegInstr = &.{},
-    branch_targets_slice: []bool = &.{},
-    /// Loop header markers: true for PCs that are targets of backward branches.
-    loop_headers_slice: []bool = &.{},
-    /// For each loop header PC, the max back-edge source PC (defines loop body range).
-    loop_end_map: []u32 = &.{},
+    /// Branch / loop / liveness analysis owned by the compile, freed at end.
+    /// Populated by `LoopInfo.analyse` at the start of `compileMain`.
+    loop_info: LoopInfo = .{},
 
     const Patch = struct {
         rel32_offset: u32, // byte offset of the rel32 field in code
@@ -3845,13 +3845,11 @@ pub const Compiler = struct {
 
         self.pc_map.appendNTimes(self.alloc, 0, ir.len + 1) catch return null;
 
-        // Pre-scan: find branch targets for fusion safety
-        const branch_targets = self.scanBranchTargets(ir) orelse return null;
-        defer self.alloc.free(branch_targets);
-        defer self.alloc.free(self.loop_headers_slice);
-        defer self.alloc.free(self.loop_end_map);
+        // Pre-scan: branch targets, loop headers, loop body extents.
+        if (!self.loop_info.analyse(self.alloc, ir, self.reg_count)) return null;
+        defer self.loop_info.deinit(self.alloc);
+        const branch_targets = self.loop_info.branch_targets;
         self.ir_slice = ir;
-        self.branch_targets_slice = branch_targets;
 
         // Pre-scan: compute written_vregs for the ENTIRE function.
         // Must cover all instructions (not just those before each call site)
@@ -3879,7 +3877,7 @@ pub const Compiler = struct {
             // Evict SIMD XMM cache at branch targets (merge points)
             if (pc < branch_targets.len and branch_targets[pc]) {
                 self.scratch_vreg = null;
-                if (pc < self.loop_headers_slice.len and self.loop_headers_slice[pc]) {
+                if (pc < self.loop_info.loop_headers.len and self.loop_info.loop_headers[pc]) {
                     // Loop header: emit pre-loads BEFORE pc_map (first iteration only)
                     self.emitLoopPreHeader(ir, pc);
                 } else {
@@ -3984,7 +3982,7 @@ pub const Compiler = struct {
 
     /// Emit loop pre-header: load v128 input vregs into XMM regs before the loop header.
     fn emitLoopPreHeader(self: *Compiler, ir: []const RegInstr, header_pc: u32) void {
-        const end_pc = if (header_pc < self.loop_end_map.len) self.loop_end_map[header_pc] else return;
+        const end_pc = if (header_pc < self.loop_info.loop_end.len) self.loop_info.loop_end[header_pc] else return;
         if (end_pc == 0) return;
 
         var written = [_]bool{false} ** 128;
@@ -6009,7 +6007,7 @@ pub const Compiler = struct {
             regalloc_mod.OP_BR => {
                 const target = instr.operand;
                 const is_back_edge = target <= pc.* - 1;
-                if (is_back_edge and target < self.loop_headers_slice.len and self.loop_headers_slice[target]) {
+                if (is_back_edge and target < self.loop_info.loop_headers.len and self.loop_info.loop_headers[target]) {
                     // Back-edge to loop header: flush XMM (keep cache) for deopt safety
                     self.simdXregFlushAll();
                 } else if (is_back_edge) {
@@ -6474,73 +6472,10 @@ pub const Compiler = struct {
 
     // --- Peephole fusion: CMP+Jcc ---
 
-    fn scanBranchTargets(self: *Compiler, ir: []const RegInstr) ?[]bool {
-        const targets = self.alloc.alloc(bool, ir.len) catch return null;
-        @memset(targets, false);
-        const loop_headers = self.alloc.alloc(bool, ir.len) catch {
-            self.alloc.free(targets);
-            return null;
-        };
-        @memset(loop_headers, false);
-        const loop_end = self.alloc.alloc(u32, ir.len) catch {
-            self.alloc.free(loop_headers);
-            self.alloc.free(targets);
-            return null;
-        };
-        @memset(loop_end, 0);
-
-        var scan_pc: u32 = 0;
-        while (scan_pc < ir.len) {
-            const instr = ir[scan_pc];
-            const source_pc = scan_pc;
-            scan_pc += 1;
-            switch (instr.op) {
-                regalloc_mod.OP_BR => {
-                    if (instr.operand < ir.len) {
-                        targets[instr.operand] = true;
-                        if (instr.operand <= source_pc) {
-                            loop_headers[instr.operand] = true;
-                            if (source_pc > loop_end[instr.operand])
-                                loop_end[instr.operand] = source_pc;
-                        }
-                    }
-                },
-                regalloc_mod.OP_BR_IF, regalloc_mod.OP_BR_IF_NOT => {
-                    if (instr.operand < ir.len) {
-                        targets[instr.operand] = true;
-                        if (instr.operand <= source_pc) {
-                            loop_headers[instr.operand] = true;
-                            if (source_pc > loop_end[instr.operand])
-                                loop_end[instr.operand] = source_pc;
-                        }
-                    }
-                },
-                regalloc_mod.OP_BR_TABLE => {
-                    const count = instr.operand;
-                    var i: u32 = 0;
-                    while (i < count + 1 and scan_pc < ir.len) : (i += 1) {
-                        const entry = ir[scan_pc];
-                        scan_pc += 1;
-                        if (entry.operand < ir.len) {
-                            targets[entry.operand] = true;
-                            if (entry.operand <= source_pc) {
-                                loop_headers[entry.operand] = true;
-                                if (source_pc > loop_end[entry.operand])
-                                    loop_end[entry.operand] = source_pc;
-                            }
-                        }
-                    }
-                },
-                regalloc_mod.OP_BLOCK_END => {
-                    targets[scan_pc - 1] = true;
-                },
-                else => {},
-            }
-        }
-        self.loop_headers_slice = loop_headers;
-        self.loop_end_map = loop_end;
-        return targets;
-    }
+    // scanBranchTargets has moved to src/loop_info.zig — see jit.zig comment
+    // for the rationale. The body of the analysis is shared with the ARM64
+    // backend so future enrichments (liveness, invariant-const classification)
+    // do not need to be duplicated.
 
     /// Try to fuse a CMP result with a following BR_IF/BR_IF_NOT.
     /// Returns true if fused, false if not fuseable. Returns null on OOM.
@@ -6549,7 +6484,7 @@ pub const Compiler = struct {
         const next = self.ir_slice[pc.*];
         if (next.op != regalloc_mod.OP_BR_IF and next.op != regalloc_mod.OP_BR_IF_NOT) return false;
         if (next.rd != rd) return false;
-        if (pc.* < self.branch_targets_slice.len and self.branch_targets_slice[pc.*]) return false;
+        if (pc.* < self.loop_info.branch_targets.len and self.loop_info.branch_targets[pc.*]) return false;
 
         // Fuse: emit Jcc instead of SETCC + MOVZX + store + load + TEST + Jcc
         const actual_cc = if (next.op == regalloc_mod.OP_BR_IF) cc else cc.invert();