Add AMD ROCm/HIP Windows llama.cpp engine (auto-download on Strix Halo) by rjckkkkk · Pull Request #85 · Approaching-AI/AIMA

rjckkkkk · 2026-06-09T15:13:42Z

Problem

On the AMD Strix Halo Windows box (Ryzen AI Max+ 395 / Radeon 8060S, RDNA3.5, no NVIDIA), AIMA could not auto-download a working llama.cpp:

- llamacpp-universal's windows/amd64 source is the CUDA build → runs CPU-only on AMD.
- llamacpp-vulkan / llamacpp-rocm-rdna3 are AMD engines but linux-only (no native Windows source).

So out-of-the-box auto-download fetched the wrong build, and the handoff build's serve.bat told users to manually download a ROCm/HIP llama.cpp and set AIMA_ENGINE_DIR.

Fix

Add native engine asset llamacpp-hip-windows: gpu_arch: RDNA3.5, platforms: [windows/amd64], source = official llama.cpp release win-hip-radeon-x64.zip (self-contained: bundles ggml-hip.dll + rocblas, no separate ROCm install), version: b9330.

The native engine resolver already skips engines whose source has no binary for the host platform (resolver.go: "Skip native-only incompatibility"). So on RDNA3.5 + Windows this exact-arch, windows-supported asset is selected over the linux-only vulkan asset and the NVIDIA-only universal CUDA source. Knowledge-only — no Go changes (INV-1).
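The selection rule described above can be illustrated with a minimal Go sketch. It is not the code in resolver.go: the `Engine` type, the `pick` function, the `demoCatalog` entries, and the `gpu_arch` values given to the existing engines are all assumptions made for illustration. It only shows the two ideas from the description: engines without a native binary for the host platform are skipped, and an exact `gpu_arch` match wins over an architecture-agnostic asset.

```go
package main

import "fmt"

// Engine is a hypothetical, simplified view of a catalog engine asset.
type Engine struct {
	Name      string
	GPUArch   string   // "" means the asset is not tied to one GPU architecture (assumption)
	Platforms []string // platforms that have a native binary, e.g. "windows/amd64"
}

// supports reports whether the engine ships a native binary for platform.
func (e Engine) supports(platform string) bool {
	for _, p := range e.Platforms {
		if p == platform {
			return true
		}
	}
	return false
}

// pick skips engines with no binary for the host platform, returns the first
// exact gpu_arch match, and otherwise falls back to the first
// architecture-agnostic engine that supports the platform.
func pick(engines []Engine, platform, gpuArch string) (Engine, bool) {
	var fallback Engine
	haveFallback := false
	for _, e := range engines {
		if !e.supports(platform) {
			continue // no native binary for this host
		}
		if gpuArch != "" && e.GPUArch == gpuArch {
			return e, true
		}
		if e.GPUArch == "" && !haveFallback {
			fallback, haveFallback = e, true
		}
	}
	return fallback, haveFallback
}

// demoCatalog mirrors the engines named in this PR; the field values other
// than those of llamacpp-hip-windows are illustrative guesses.
func demoCatalog() []Engine {
	return []Engine{
		{Name: "llamacpp-universal", Platforms: []string{"linux/amd64", "windows/amd64"}},
		{Name: "llamacpp-vulkan", Platforms: []string{"linux/amd64"}},
		{Name: "llamacpp-rocm-rdna3", GPUArch: "RDNA3", Platforms: []string{"linux/amd64"}},
		{Name: "llamacpp-hip-windows", GPUArch: "RDNA3.5", Platforms: []string{"windows/amd64"}},
	}
}

func main() {
	e, _ := pick(demoCatalog(), "windows/amd64", "RDNA3.5")
	fmt.Println("with HAL detection:", e.Name)

	// Without the Windows-AMD HAL detection, gpu_arch is empty and the
	// exact-arch asset cannot match.
	e, _ = pick(demoCatalog(), "windows/amd64", "")
	fmt.Println("empty gpu_arch:", e.Name)
}
```

Under these assumptions, the second call shows why the asset depends on HAL detection: with an empty `gpu_arch` the sketch falls back to the universal source.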

Verified end-to-end on the 395 (Radeon 8060S, gfx1151)

With dist empty and no AIMA_ENGINE_DIR:

aima deploy Qwen3.5-9B-Q4_K_M --engine llamacpp

→ auto-downloaded llama-b9330-bin-win-hip-radeon-x64.zip (1.2 GB, via mirror) → extracted to ~/.aima/dist/ → llama-server started on the iGPU, ROCm0 detected 110 GB VRAM, all layers offloaded, 33.8 tok/s decode (warm). Same speed as the team's hand-installed b9180 build — the official HIP release works on gfx1151 with no env override or custom build.

Depends on

The gpu_arch=RDNA3.5 match requires the Windows-AMD HAL detection (#78). On a plain develop binary without #78, Windows reports an empty gpu_arch, so this asset will not match. It takes effect once the AMD-Windows HAL series (#78–#84) is merged.

🤖 Generated with Claude Code

…Halo

On a no-NVIDIA AMD Windows box (Ryzen AI Max+ 395 / Radeon 8060S, RDNA3.5), the only catalog llama.cpp source for windows/amd64 was the CUDA build, which runs CPU-only there; the AMD engines (llamacpp-vulkan / llamacpp-rocm-rdna3) are linux-only. So out-of-the-box auto-download fetched the wrong (CUDA) build and users had to manually install a ROCm/HIP llama.cpp and point AIMA_ENGINE_DIR at it.

Add a native engine asset `llamacpp-hip-windows` (gpu_arch RDNA3.5, windows/amd64, source = official llama.cpp `win-hip-radeon-x64.zip`, version b9330). The native engine resolver already skips engines whose source has no binary for the host platform, so on RDNA3.5+Windows this exact-arch, windows-supported asset is selected over the linux-only vulkan asset and the NVIDIA-only universal CUDA source, making `aima deploy <model>` auto-download the right ROCm/HIP build with no AIMA_ENGINE_DIR. Knowledge-only, no Go changes (INV-1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

#86)

Refresh dist/aima-windows-amd64.exe (v0.5-dev-amd-strix-halo, commit fc3ef41) so the bundled binary now carries the out-of-box AMD-HIP llama.cpp work:

- #85 llamacpp-hip-windows engine asset (go:embed'd into the exe): a no-NVIDIA Strix Halo box auto-downloads the official ROCm/HIP llama.cpp (b9330, win-hip-radeon-x64) instead of the CPU-only CUDA universal source.
- #86 native runtime resolves the engine binary against AIMA_ENGINE_DIR (dist -> AIMA_ENGINE_DIR -> auto-download -> PATH), so a pre-installed llama.cpp of ANY version is launchable -- the partner's llama.cpp version is supported whether or not it matches the bundled b9330.

The catalog YAML ships compiled into the binary, verified on the 395 rig: `aima engine info llamacpp-hip-windows` returns the b9330 asset; `aima version` -> v0.5-dev-amd-strix-halo / fc3ef41; `aima hal detect` -> RDNA3.5 gfx1151, ~110 GB VRAM.

README + serve.bat updated: AIMA_ENGINE_DIR is now optional.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

#86)

Refresh dist/aima-windows-amd64.exe so the bundled binary carries the out-of-box AMD-HIP llama.cpp work. Version string is date-stamped to be distinguishable from the prior handoff build:

aima version -> v0.5-dev-amd-strix-halo-20260610 (commit fc3ef41)
(vs the earlier v0.5-dev-amd-strix-halo, which lacked #85/#86)

- #85 llamacpp-hip-windows engine asset (go:embed'd into the exe): a no-NVIDIA Strix Halo box auto-downloads the official ROCm/HIP llama.cpp (b9330, win-hip-radeon-x64) instead of the CPU-only CUDA universal source.
- #86 native runtime resolves the engine binary against AIMA_ENGINE_DIR (dist -> AIMA_ENGINE_DIR -> auto-download -> PATH), so a pre-installed llama.cpp of ANY version is launchable -- the partner's llama.cpp version is supported whether or not it matches the bundled b9330.

The catalog YAML ships compiled into the binary AND in source (catalog/engines/llamacpp-hip-windows.yaml).

Verified on the 395 rig: `aima version` -> v0.5-dev-amd-strix-halo-20260610 / fc3ef41; `aima engine info llamacpp-hip-windows` -> b9330 asset; `aima hal detect` -> RDNA3.5 gfx1151, ~110 GB VRAM.

README + serve.bat updated: AIMA_ENGINE_DIR is now optional.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The dist exe filename now carries its version string so a new build does NOT overwrite the previous one in place. Two builds coexist:

- aima-windows-amd64-v0.5-dev-amd-strix-halo-20260610.exe -> v0.5-dev-amd-strix-halo-20260610: latest; adds the out-of-box AMD-HIP llama.cpp engine (#85, #86). serve.bat uses this.
- aima-windows-amd64-v0.5-dev-amd-strix-halo.exe -> v0.5-dev-amd-strix-halo: restored 2026-06-09 build (#78-#83 only, no HIP auto-download). Kept for rollback.

Both from source commit fc3ef41; filename == the exe's own `aima version` string.

- #85 llamacpp-hip-windows engine asset (go:embed'd into the exe + in source at catalog/engines/llamacpp-hip-windows.yaml, pins official b9330 win-hip-radeon-x64): a no-NVIDIA Strix Halo box auto-downloads the right ROCm/HIP llama.cpp instead of the CPU-only CUDA universal source.
- #86 native runtime resolves the engine binary against AIMA_ENGINE_DIR (dist -> AIMA_ENGINE_DIR -> auto-download -> PATH), so a pre-installed llama.cpp of ANY version is launchable -- the partner's llama.cpp version is supported whether or not it matches the bundled b9330.

Verified the latest build on the 395 rig: `aima version` -> v0.5-dev-amd-strix-halo-20260610 / fc3ef41; `aima engine info llamacpp-hip-windows` -> b9330 asset; `aima hal detect` -> RDNA3.5 gfx1151, ~110 GB VRAM.

README + serve.bat updated (build table; AIMA_ENGINE_DIR now optional).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
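The naming rule in the commit message above (the dist filename embeds the exe's own version string) can be expressed as a small Go sketch. The helper names `distExeName` and `versionFromExeName` are hypothetical and are not part of the AIMA build scripts; only the filename pattern is taken from the message.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	exePrefix = "aima-windows-amd64-"
	exeSuffix = ".exe"
)

// distExeName builds the dist filename for a given `aima version` string,
// so builds with different versions never overwrite each other.
func distExeName(version string) string {
	return exePrefix + version + exeSuffix
}

// versionFromExeName recovers the version string from a versioned dist
// filename. It returns false for names that do not follow the pattern,
// such as the older unversioned aima-windows-amd64.exe.
func versionFromExeName(name string) (string, bool) {
	if !strings.HasPrefix(name, exePrefix) || !strings.HasSuffix(name, exeSuffix) {
		return "", false
	}
	v := strings.TrimSuffix(strings.TrimPrefix(name, exePrefix), exeSuffix)
	return v, v != ""
}

func main() {
	for _, v := range []string{
		"v0.5-dev-amd-strix-halo-20260610", // latest build
		"v0.5-dev-amd-strix-halo",          // rollback build
	} {
		fmt.Println(distExeName(v))
	}
}
```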

rjckkkkk wants to merge 1 commit into develop from feat/amd-windows-hip-llamacpp-engine
