Add AMD ROCm/HIP Windows llama.cpp engine (auto-download on Strix Halo) #85
Open
rjckkkkk wants to merge 1 commit into
…Halo

On a no-NVIDIA AMD Windows box (Ryzen AI Max+ 395 / Radeon 8060S, RDNA3.5), the only catalog llama.cpp source for windows/amd64 was the CUDA build, which runs CPU-only there; the AMD engines (llamacpp-vulkan / llamacpp-rocm-rdna3) are linux-only. So out-of-the-box auto-download fetched the wrong (CUDA) build, and users had to manually install a ROCm/HIP llama.cpp and point AIMA_ENGINE_DIR at it.

Add a native engine asset `llamacpp-hip-windows` (gpu_arch RDNA3.5, windows/amd64, source = official llama.cpp `win-hip-radeon-x64.zip`, version b9330). The native engine resolver already skips engines whose source has no binary for the host platform, so on RDNA3.5 + Windows this exact-arch, windows-supported asset is selected over the linux-only vulkan asset and the NVIDIA-only universal CUDA source. As a result, `aima deploy <model>` auto-downloads the right ROCm/HIP build without AIMA_ENGINE_DIR.

Knowledge-only, no Go changes (INV-1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
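The selection rule this commit relies on can be sketched as below. This is a minimal illustration, not AIMA's resolver.go: the `EngineAsset` type, the `selectEngine` function, treating an empty `GPUArch` as the universal wildcard, and the arch value given to the vulkan asset are all hypothetical. It shows the two steps described above (drop assets with no binary for the host platform, then prefer an exact `gpu_arch` match over the universal source) and also why an empty detected `gpu_arch` falls back to the universal CUDA asset.

```go
package main

import "fmt"

// EngineAsset is a hypothetical, simplified view of a catalog engine entry.
// Field names are illustrative, not AIMA's actual Go types.
type EngineAsset struct {
	Name      string
	GPUArch   string          // "" is assumed to mean a universal asset (any arch)
	Platforms map[string]bool // platforms the source has a native binary for
}

// selectEngine skips assets with no binary for the host platform, then
// prefers an exact gpu_arch match over a universal (arch-less) asset.
func selectEngine(assets []EngineAsset, hostPlatform, hostArch string) (string, bool) {
	universal := ""
	for _, a := range assets {
		if !a.Platforms[hostPlatform] {
			continue // native-only incompatibility: no binary for this host
		}
		if hostArch != "" && a.GPUArch == hostArch {
			return a.Name, true // exact-arch match wins immediately
		}
		if a.GPUArch == "" && universal == "" {
			universal = a.Name // remember the first universal fallback
		}
	}
	return universal, universal != ""
}

func main() {
	catalog := []EngineAsset{
		{Name: "llamacpp-universal", Platforms: map[string]bool{"linux/amd64": true, "windows/amd64": true}},
		// Hypothetical arch value; the point is that it has no windows binary.
		{Name: "llamacpp-vulkan", GPUArch: "RDNA3.5", Platforms: map[string]bool{"linux/amd64": true}},
		{Name: "llamacpp-hip-windows", GPUArch: "RDNA3.5", Platforms: map[string]bool{"windows/amd64": true}},
	}

	// With HAL detection reporting RDNA3.5 on Windows.
	name, _ := selectEngine(catalog, "windows/amd64", "RDNA3.5")
	fmt.Println(name)

	// Without Windows-AMD HAL detection, gpu_arch is empty.
	name, _ = selectEngine(catalog, "windows/amd64", "")
	fmt.Println(name)
}
```

Run as-is, this prints `llamacpp-hip-windows` and then `llamacpp-universal`, mirroring the before/after behavior the commit describes.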
rjckkkkk added a commit that referenced this pull request Jun 10, 2026
#86) Refresh dist/aima-windows-amd64.exe (v0.5-dev-amd-strix-halo, commit fc3ef41) so the bundled binary now carries the out-of-box AMD-HIP llama.cpp work:

- #85 llamacpp-hip-windows engine asset (go:embed'd into the exe): a no-NVIDIA Strix Halo box auto-downloads the official ROCm/HIP llama.cpp (b9330, win-hip-radeon-x64) instead of the CPU-only CUDA universal source.
- #86 native runtime resolves the engine binary against AIMA_ENGINE_DIR (dist -> AIMA_ENGINE_DIR -> auto-download -> PATH), so a pre-installed llama.cpp of ANY version is launchable -- the partner's llama.cpp version is supported whether or not it matches the bundled b9330.

The catalog YAML ships compiled into the binary. Verified on the 395 rig: `aima engine info llamacpp-hip-windows` returns the b9330 asset; `aima version` -> v0.5-dev-amd-strix-halo / fc3ef41; `aima hal detect` -> RDNA3.5 gfx1151, ~110 GB VRAM.

README + serve.bat updated: AIMA_ENGINE_DIR is now optional.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rjckkkkk added a commit that referenced this pull request Jun 10, 2026
#86) Refresh dist/aima-windows-amd64.exe so the bundled binary carries the out-of-box AMD-HIP llama.cpp work. The version string is date-stamped to be distinguishable from the prior handoff build:

aima version -> v0.5-dev-amd-strix-halo-20260610 (commit fc3ef41)
(vs the earlier v0.5-dev-amd-strix-halo, which lacked #85/#86)

- #85 llamacpp-hip-windows engine asset (go:embed'd into the exe): a no-NVIDIA Strix Halo box auto-downloads the official ROCm/HIP llama.cpp (b9330, win-hip-radeon-x64) instead of the CPU-only CUDA universal source.
- #86 native runtime resolves the engine binary against AIMA_ENGINE_DIR (dist -> AIMA_ENGINE_DIR -> auto-download -> PATH), so a pre-installed llama.cpp of ANY version is launchable -- the partner's llama.cpp version is supported whether or not it matches the bundled b9330.

The catalog YAML ships compiled into the binary AND in source (catalog/engines/llamacpp-hip-windows.yaml). Verified on the 395 rig: `aima version` -> v0.5-dev-amd-strix-halo-20260610 / fc3ef41; `aima engine info llamacpp-hip-windows` -> b9330 asset; `aima hal detect` -> RDNA3.5 gfx1151, ~110 GB VRAM.

README + serve.bat updated: AIMA_ENGINE_DIR is now optional.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rjckkkkk added a commit that referenced this pull request Jun 10, 2026
The dist exe filename now carries its version string so a new build does NOT
overwrite the previous one in place. Two builds coexist:
aima-windows-amd64-v0.5-dev-amd-strix-halo-20260610.exe -> v0.5-dev-amd-strix-halo-20260610
latest; adds the out-of-box AMD-HIP llama.cpp engine (#85, #86). serve.bat uses this.
aima-windows-amd64-v0.5-dev-amd-strix-halo.exe -> v0.5-dev-amd-strix-halo
restored 2026-06-09 build (#78-#83 only, no HIP auto-download). Kept for rollback.
Both from source commit fc3ef41; filename == the exe's own `aima version` string.
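The naming rule in the table above (filename == the exe's own `aima version` string) is small enough to state as code. The helpers `distExeName` and `versionFromExeName` are hypothetical; the commit does not say how the build produces the name, only that the two must match, which is what keeps a new build from overwriting the previous one.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	exePrefix = "aima-windows-amd64-"
	exeSuffix = ".exe"
)

// distExeName builds a dist filename that embeds the build's own version
// string, so each build lands in its own file.
func distExeName(version string) string {
	return exePrefix + version + exeSuffix
}

// versionFromExeName recovers the version string from such a filename;
// ok is false if the name does not follow the pattern.
func versionFromExeName(name string) (version string, ok bool) {
	if !strings.HasPrefix(name, exePrefix) || !strings.HasSuffix(name, exeSuffix) {
		return "", false
	}
	v := strings.TrimSuffix(strings.TrimPrefix(name, exePrefix), exeSuffix)
	return v, v != ""
}

func main() {
	for _, v := range []string{
		"v0.5-dev-amd-strix-halo-20260610", // latest; used by serve.bat
		"v0.5-dev-amd-strix-halo",          // 2026-06-09 build kept for rollback
	} {
		name := distExeName(v)
		back, _ := versionFromExeName(name)
		fmt.Println(name, back == v)
	}
}
```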
- #85 llamacpp-hip-windows engine asset (go:embed'd into the exe + in source at
catalog/engines/llamacpp-hip-windows.yaml, pins official b9330 win-hip-radeon-x64):
a no-NVIDIA Strix Halo box auto-downloads the right ROCm/HIP llama.cpp instead of
the CPU-only CUDA universal source.
- #86 native runtime resolves the engine binary against AIMA_ENGINE_DIR
(dist -> AIMA_ENGINE_DIR -> auto-download -> PATH), so a pre-installed llama.cpp
of ANY version is launchable -- the partner's llama.cpp version is supported
whether or not it matches the bundled b9330.
Verified the latest build on the 395 rig: `aima version` ->
v0.5-dev-amd-strix-halo-20260610 / fc3ef41; `aima engine info llamacpp-hip-windows`
-> b9330 asset; `aima hal detect` -> RDNA3.5 gfx1151, ~110 GB VRAM.
README + serve.bat updated (build table; AIMA_ENGINE_DIR now optional).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
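The `b9330` / `win-hip-radeon-x64` pin resolves to a concrete archive; the verified run reports downloading `llama-b9330-bin-win-hip-radeon-x64.zip`. The sketch below shows how a build tag and variant combine into that name. It assumes the `llama-<build>-bin-<variant>.zip` pattern generalizes beyond this one file and that the upstream location is the standard GitHub release download path for ggml-org/llama.cpp; the verified run actually went through a mirror. Both function names are hypothetical.

```go
package main

import "fmt"

// releaseAssetName follows the pattern of the archive reported in this PR:
// llama-<build>-bin-<variant>.zip.
func releaseAssetName(build, variant string) string {
	return fmt.Sprintf("llama-%s-bin-%s.zip", build, variant)
}

// releaseAssetURL assumes GitHub's release download layout
// (/<owner>/<repo>/releases/download/<tag>/<asset>); a mirror may differ.
func releaseAssetURL(build, variant string) string {
	return fmt.Sprintf("https://github.com/ggml-org/llama.cpp/releases/download/%s/%s",
		build, releaseAssetName(build, variant))
}

func main() {
	fmt.Println(releaseAssetName("b9330", "win-hip-radeon-x64"))
	fmt.Println(releaseAssetURL("b9330", "win-hip-radeon-x64"))
}
```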
Problem
On the AMD Strix Halo Windows box (Ryzen AI Max+ 395 / Radeon 8060S, RDNA3.5, no NVIDIA), AIMA could not auto-download a working llama.cpp:
- `llamacpp-universal`'s `windows/amd64` source is the CUDA build → runs CPU-only on AMD.
- `llamacpp-vulkan` / `llamacpp-rocm-rdna3` are AMD engines but linux-only (no windows native source).

So out-of-the-box auto-download fetched the wrong build, and the handoff build's `serve.bat` told users to manually download a ROCm/HIP llama.cpp and set `AIMA_ENGINE_DIR`.

Fix
Add native engine asset `llamacpp-hip-windows`: `gpu_arch: RDNA3.5`, `platforms: [windows/amd64]`, source = official llama.cpp release `win-hip-radeon-x64.zip` (self-contained: bundles `ggml-hip.dll` + rocblas, no separate ROCm install), `version: b9330`.

The native engine resolver already skips engines whose `source` has no binary for the host platform (resolver.go: "Skip native-only incompatibility"). So on RDNA3.5 + Windows this exact-arch, windows-supported asset is selected over the linux-only vulkan asset and the NVIDIA-only universal CUDA source. Knowledge-only: no Go changes (INV-1).

Verified end-to-end on the 395 (Radeon 8060S, gfx1151)
With dist empty and no `AIMA_ENGINE_DIR`:
→ auto-downloaded `llama-b9330-bin-win-hip-radeon-x64.zip` (1.2 GB, via mirror) → extracted to `~/.aima/dist/` → `llama-server` started on the iGPU, ROCm0 detected 110 GB VRAM, all layers offloaded, 33.8 tok/s decode (warm). Same speed as the team's hand-installed b9180 build: the official HIP release works on gfx1151 with no env override or custom build.

Depends on
The `gpu_arch=RDNA3.5` match requires the Windows-AMD HAL detection (#78). On a plain `develop` binary (no #78), Windows reports an empty `gpu_arch` and this asset won't match; it lights up once the AMD-Windows HAL series (#78–#84) is in.

🤖 Generated with Claude Code