Add AMD ROCm/HIP Windows llama.cpp engine (auto-download on Strix Halo)#85

Open
rjckkkkk wants to merge 1 commit into develop from feat/amd-windows-hip-llamacpp-engine


rjckkkkk commented Jun 9, 2026

Collaborator

Problem

On the AMD Strix Halo Windows box (Ryzen AI Max+ 395 / Radeon 8060S, RDNA3.5, no NVIDIA), AIMA could not auto-download a working llama.cpp:

  • llamacpp-universal's windows/amd64 source is the CUDA build → runs CPU-only on AMD.
  • llamacpp-vulkan / llamacpp-rocm-rdna3 are AMD engines but linux-only (no windows native source).

So out-of-the-box auto-download fetched the wrong build, and the handoff build's serve.bat told users to manually download a ROCm/HIP llama.cpp and set AIMA_ENGINE_DIR.

Fix

Add native engine asset llamacpp-hip-windows: gpu_arch: RDNA3.5, platforms: [windows/amd64], source = official llama.cpp release win-hip-radeon-x64.zip (self-contained: bundles ggml-hip.dll + rocblas, no separate ROCm install), version: b9330.

The native engine resolver already skips engines whose source has no binary for the host platform (resolver.go: "Skip native-only incompatibility"). So on RDNA3.5 + Windows this exact-arch, windows-supported asset is selected over the linux-only vulkan asset and the NVIDIA-only universal CUDA source. Knowledge-only — no Go changes (INV-1).
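
The selection described above can be sketched roughly as follows. This is a minimal illustration, not the code in resolver.go: the `EngineAsset` type, the `pickEngine` function, and the catalog values for the three existing engines (including the `RDNA3` arch label and the `linux/amd64` platform strings) are assumptions made for the example.

```go
package main

import "fmt"

// EngineAsset is a hypothetical, simplified view of one catalog entry.
type EngineAsset struct {
	Name      string
	GPUArch   string   // "" means the asset is not tied to one GPU architecture
	Platforms []string // platforms that have a native binary source
}

// hasPlatform reports whether the asset ships a native binary for platform.
func hasPlatform(a EngineAsset, platform string) bool {
	for _, p := range a.Platforms {
		if p == platform {
			return true
		}
	}
	return false
}

// pickEngine skips assets with no native source for the host platform, then
// prefers an exact gpu_arch match over an architecture-agnostic asset.
func pickEngine(assets []EngineAsset, platform, gpuArch string) (EngineAsset, error) {
	var fallback *EngineAsset
	for i, a := range assets {
		if !hasPlatform(a, platform) {
			continue // no binary for this host, so not a candidate
		}
		if gpuArch != "" && a.GPUArch == gpuArch {
			return a, nil // exact-arch, platform-supported asset wins
		}
		if a.GPUArch == "" && fallback == nil {
			fallback = &assets[i]
		}
	}
	if fallback != nil {
		return *fallback, nil
	}
	return EngineAsset{}, fmt.Errorf("no native engine for %s (gpu_arch %q)", platform, gpuArch)
}

func main() {
	// Illustrative catalog; only llamacpp-hip-windows mirrors this PR's asset.
	catalog := []EngineAsset{
		{Name: "llamacpp-universal", Platforms: []string{"linux/amd64", "windows/amd64"}},
		{Name: "llamacpp-vulkan", Platforms: []string{"linux/amd64"}},
		{Name: "llamacpp-rocm-rdna3", GPUArch: "RDNA3", Platforms: []string{"linux/amd64"}},
		{Name: "llamacpp-hip-windows", GPUArch: "RDNA3.5", Platforms: []string{"windows/amd64"}},
	}

	a, _ := pickEngine(catalog, "windows/amd64", "RDNA3.5")
	fmt.Println(a.Name) // llamacpp-hip-windows

	// An empty gpu_arch (no Windows-AMD HAL detection) falls back to the universal source.
	b, _ := pickEngine(catalog, "windows/amd64", "")
	fmt.Println(b.Name) // llamacpp-universal
}
```

The second call shows why the asset only takes effect once the host reports a gpu_arch: with an empty value there is nothing to match against, and the sketch falls back to the universal source.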

Verified end-to-end on the 395 (Radeon 8060S, gfx1151)

With dist empty and no AIMA_ENGINE_DIR:

aima deploy Qwen3.5-9B-Q4_K_M --engine llamacpp

→ auto-downloaded llama-b9330-bin-win-hip-radeon-x64.zip (1.2 GB, via mirror) → extracted to ~/.aima/dist/ → llama-server started on the iGPU, ROCm0 detected 110 GB VRAM, all layers offloaded, 33.8 tok/s decode (warm). Same speed as the team's hand-installed b9180 build: the official HIP release works on gfx1151 with no env override or custom build.

Depends on

The gpu_arch=RDNA3.5 match requires the Windows-AMD HAL detection (#78). On a plain develop binary (no #78) Windows reports an empty gpu_arch and this asset won't match; it lights up once the AMD-Windows HAL series (#78-#84) is in.

🤖 Generated with Claude Code

…Halo

On a no-NVIDIA AMD Windows box (Ryzen AI Max+ 395 / Radeon 8060S, RDNA3.5),
the only catalog llama.cpp source for windows/amd64 was the CUDA build, which
runs CPU-only there; the AMD engines (llamacpp-vulkan / llamacpp-rocm-rdna3)
are linux-only. So out-of-the-box auto-download fetched the wrong (CUDA) build
and users had to manually install a ROCm/HIP llama.cpp and point AIMA_ENGINE_DIR
at it.

Add a native engine asset `llamacpp-hip-windows` (gpu_arch RDNA3.5, windows/amd64,
source = official llama.cpp `win-hip-radeon-x64.zip`, version b9330). The native
engine resolver already skips engines whose source has no binary for the host
platform, so on RDNA3.5+Windows this exact-arch, windows-supported asset is
selected over the linux-only vulkan asset and the NVIDIA-only universal CUDA
source — making `aima deploy <model>` auto-download the right ROCm/HIP build
with no AIMA_ENGINE_DIR. Knowledge-only, no Go changes (INV-1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rjckkkkk added a commit that referenced this pull request Jun 10, 2026
#86)

Refresh dist/aima-windows-amd64.exe (v0.5-dev-amd-strix-halo, commit fc3ef41)
so the bundled binary now carries the out-of-box AMD-HIP llama.cpp work:

- #85 llamacpp-hip-windows engine asset (go:embed'd into the exe): a no-NVIDIA
  Strix Halo box auto-downloads the official ROCm/HIP llama.cpp (b9330,
  win-hip-radeon-x64) instead of the CPU-only CUDA universal source.
- #86 native runtime resolves the engine binary against AIMA_ENGINE_DIR
  (dist -> AIMA_ENGINE_DIR -> auto-download -> PATH), so a pre-installed
  llama.cpp of ANY version is launchable -- the partner's llama.cpp version
  is supported whether or not it matches the bundled b9330.

The catalog YAML ships compiled into the binary, verified on the 395 rig:
`aima engine info llamacpp-hip-windows` returns the b9330 asset; `aima version`
-> v0.5-dev-amd-strix-halo / fc3ef41; `aima hal detect` -> RDNA3.5 gfx1151,
~110 GB VRAM. README + serve.bat updated: AIMA_ENGINE_DIR is now optional.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rjckkkkk added a commit that referenced this pull request Jun 10, 2026
#86)

Refresh dist/aima-windows-amd64.exe so the bundled binary carries the out-of-box
AMD-HIP llama.cpp work. Version string is date-stamped to be distinguishable from
the prior handoff build:

  aima version -> v0.5-dev-amd-strix-halo-20260610  (commit fc3ef41)
  (vs the earlier v0.5-dev-amd-strix-halo, which lacked #85/#86)

- #85 llamacpp-hip-windows engine asset (go:embed'd into the exe): a no-NVIDIA
  Strix Halo box auto-downloads the official ROCm/HIP llama.cpp (b9330,
  win-hip-radeon-x64) instead of the CPU-only CUDA universal source.
- #86 native runtime resolves the engine binary against AIMA_ENGINE_DIR
  (dist -> AIMA_ENGINE_DIR -> auto-download -> PATH), so a pre-installed
  llama.cpp of ANY version is launchable -- the partner's llama.cpp version
  is supported whether or not it matches the bundled b9330.

The catalog YAML ships compiled into the binary AND in source
(catalog/engines/llamacpp-hip-windows.yaml). Verified on the 395 rig:
`aima version` -> v0.5-dev-amd-strix-halo-20260610 / fc3ef41;
`aima engine info llamacpp-hip-windows` -> b9330 asset;
`aima hal detect` -> RDNA3.5 gfx1151, ~110 GB VRAM.
README + serve.bat updated: AIMA_ENGINE_DIR is now optional.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rjckkkkk added a commit that referenced this pull request Jun 10, 2026
The dist exe filename now carries its version string so a new build does NOT
overwrite the previous one in place. Two builds coexist:

  aima-windows-amd64-v0.5-dev-amd-strix-halo-20260610.exe  -> v0.5-dev-amd-strix-halo-20260610
      latest; adds the out-of-box AMD-HIP llama.cpp engine (#85, #86). serve.bat uses this.
  aima-windows-amd64-v0.5-dev-amd-strix-halo.exe           -> v0.5-dev-amd-strix-halo
      restored 2026-06-09 build (#78-#83 only, no HIP auto-download). Kept for rollback.

Both from source commit fc3ef41; filename == the exe's own `aima version` string.
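
As an illustration of the naming rule above, the mapping between a build's version string and its dist filename might be expressed like this. Both helpers are hypothetical and exist only for this sketch; the actual build scripts are not shown in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// distExeName builds the dist filename from the target platform and the
// build's `aima version` string (hypothetical helper).
func distExeName(goos, goarch, version string) string {
	return fmt.Sprintf("aima-%s-%s-%s.exe", goos, goarch, version)
}

// versionFromExeName recovers the version string embedded in a dist filename.
// It returns false when the name does not follow the convention.
func versionFromExeName(name, goos, goarch string) (string, bool) {
	prefix := fmt.Sprintf("aima-%s-%s-", goos, goarch)
	if !strings.HasPrefix(name, prefix) || !strings.HasSuffix(name, ".exe") {
		return "", false
	}
	return strings.TrimSuffix(strings.TrimPrefix(name, prefix), ".exe"), true
}

func main() {
	// Two versions yield two distinct files, so neither overwrites the other.
	fmt.Println(distExeName("windows", "amd64", "v0.5-dev-amd-strix-halo-20260610"))
	fmt.Println(distExeName("windows", "amd64", "v0.5-dev-amd-strix-halo"))
}
```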

- #85 llamacpp-hip-windows engine asset (go:embed'd into the exe + in source at
  catalog/engines/llamacpp-hip-windows.yaml, pins official b9330 win-hip-radeon-x64):
  a no-NVIDIA Strix Halo box auto-downloads the right ROCm/HIP llama.cpp instead of
  the CPU-only CUDA universal source.
- #86 native runtime resolves the engine binary against AIMA_ENGINE_DIR
  (dist -> AIMA_ENGINE_DIR -> auto-download -> PATH), so a pre-installed llama.cpp
  of ANY version is launchable -- the partner's llama.cpp version is supported
  whether or not it matches the bundled b9330.

Verified the latest build on the 395 rig: `aima version` ->
v0.5-dev-amd-strix-halo-20260610 / fc3ef41; `aima engine info llamacpp-hip-windows`
-> b9330 asset; `aima hal detect` -> RDNA3.5 gfx1151, ~110 GB VRAM.
README + serve.bat updated (build table; AIMA_ENGINE_DIR now optional).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>