Launch native engines from AIMA_ENGINE_DIR (scanned ⇒ launchable)#86

Open
rjckkkkk wants to merge 1 commit into develop from feat/native-engine-dir-resolution


@rjckkkkk

Collaborator

Problem

The engine scanner discovers native engine binaries in AIMA_ENGINE_DIR (PR #80), but native deploy resolved the binary only via dist → BinarySource probe/download → PATH. The probe path is supposed to carry the scanned binary's absolute path into the deploy. On AMD-Windows, however, the overlay injects it into the hardware-preferred engine asset (llamacpp-vulkan, linux-only) while native deploy resolves to a different asset, so the probe path never reaches the launch command.

Result: a pre-installed llama.cpp registered via AIMA_ENGINE_DIR scans fine but won't deploy — the launch command falls back to the bare name llama-server, and Windows errors:

'llama-server' is not recognized as an internal or external command…

Fix

Resolve the native binary against AIMA_ENGINE_DIR as well — the same dirs the engine scanner reads — so scanned ⇒ launchable holds regardless of catalog/overlay engine selection.

  • internal/runtime/native.go: WithEngineDirs option + findInEngineDirs; resolution order is now dist → AIMA_ENGINE_DIR → auto-download → PATH.
  • cmd/aima/infra.go: populate engine dirs from AIMA_ENGINE_DIR (mirrors how the scanner reads it).
  • No-op when AIMA_ENGINE_DIR is unset → other devices/runtimes unaffected. Unit test TestFindInEngineDirsResolvesScannedBinary.

Verified on the 395 (Radeon 8060S, RDNA3.5, Windows)

Reproducing the partner's exact setup — AIMA_ENGINE_DIR pointing at a pre-installed llama.cpp, empty dist:

aima deploy Qwen3.5-9B-Q4_K_M --engine llamacpp

→ running llama-server path = D:\tools\llama-b9180-win-hip-radeon-x64\llama-server.exe (the absolute AIMA_ENGINE_DIR binary, not a bare name / not a download), health 200, fingerprint b9180, GPU offloaded. Before the fix this failed with the "not recognized" error above.
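
The resolution order from the Fix section (dist → AIMA_ENGINE_DIR → auto-download → PATH) can be pictured as an ordered fallback chain. The sketch below is illustrative only: the `resolver` type, `resolveBinary`, and the stub lookups are hypothetical names, and the path returned by the AIMA_ENGINE_DIR stub is a placeholder rather than the rig's actual path.

```go
package main

import (
	"errors"
	"fmt"
)

// resolver is a hypothetical name for one step in the lookup chain.
type resolver struct {
	name string
	find func(binary string) (string, error)
}

var errNotFound = errors.New("not found")

// resolveBinary tries each step in order and returns the first hit together
// with the name of the step that produced it. If every step misses, it falls
// back to the bare binary name, which is the pre-fix failure mode on Windows.
func resolveBinary(binary string, chain []resolver) (path, source string) {
	for _, r := range chain {
		if p, err := r.find(binary); err == nil && p != "" {
			return p, r.name
		}
	}
	return binary, "bare-name"
}

func main() {
	miss := func(string) (string, error) { return "", errNotFound }
	chain := []resolver{
		{"dist", miss}, // empty dist, as in the reproduction
		{"AIMA_ENGINE_DIR", func(b string) (string, error) {
			return "/opt/engines/" + b, nil // placeholder path
		}},
		{"auto-download", miss},
		{"PATH", miss},
	}
	path, source := resolveBinary("llama-server", chain)
	fmt.Println(source, path) // prints "AIMA_ENGINE_DIR /opt/engines/llama-server"
}
```

Because the AIMA_ENGINE_DIR step sits ahead of auto-download, a pre-installed engine wins over a download in this sketch, which matches the "not a bare name / not a download" observation above.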

🤖 Generated with Claude Code

The engine scanner discovers native binaries in AIMA_ENGINE_DIR, but native
deploy resolved the binary only via dist → BinarySource probe/download → PATH.
When a pre-installed engine was registered via AIMA_ENGINE_DIR but not in dist
or PATH, and the catalog probe-path injection didn't reach the resolved engine
asset (e.g. AMD-Windows, where gpu_arch matching picks a linux-only asset), the
launch command fell back to the bare name ("llama-server") and Windows reported
"'llama-server' is not recognized" — so a model that scanned fine could not be
deployed.

Resolve the native binary against AIMA_ENGINE_DIR too — the SAME dirs the engine
scanner reads — so "scanned ⇒ launchable" holds regardless of catalog/overlay
engine selection. Order: dist → AIMA_ENGINE_DIR → auto-download → PATH. No-op when
AIMA_ENGINE_DIR is unset, so other devices/runtimes are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rjckkkkk added a commit that referenced this pull request Jun 10, 2026
#86)

Refresh dist/aima-windows-amd64.exe (v0.5-dev-amd-strix-halo, commit fc3ef41)
so the bundled binary now carries the out-of-box AMD-HIP llama.cpp work:

- #85 llamacpp-hip-windows engine asset (go:embed'd into the exe): a no-NVIDIA
  Strix Halo box auto-downloads the official ROCm/HIP llama.cpp (b9330,
  win-hip-radeon-x64) instead of the CPU-only CUDA universal source.
- #86 native runtime resolves the engine binary against AIMA_ENGINE_DIR
  (dist -> AIMA_ENGINE_DIR -> auto-download -> PATH), so a pre-installed
  llama.cpp of ANY version is launchable -- the partner's llama.cpp version
  is supported whether or not it matches the bundled b9330.

The catalog YAML ships compiled into the binary. Verified on the 395 rig:
`aima engine info llamacpp-hip-windows` returns the b9330 asset; `aima version`
-> v0.5-dev-amd-strix-halo / fc3ef41; `aima hal detect` -> RDNA3.5 gfx1151,
~110 GB VRAM. README + serve.bat updated: AIMA_ENGINE_DIR is now optional.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rjckkkkk added a commit that referenced this pull request Jun 10, 2026
#86)

Refresh dist/aima-windows-amd64.exe so the bundled binary carries the out-of-box
AMD-HIP llama.cpp work. Version string is date-stamped to be distinguishable from
the prior handoff build:

  aima version -> v0.5-dev-amd-strix-halo-20260610  (commit fc3ef41)
  (vs the earlier v0.5-dev-amd-strix-halo, which lacked #85/#86)

- #85 llamacpp-hip-windows engine asset (go:embed'd into the exe): a no-NVIDIA
  Strix Halo box auto-downloads the official ROCm/HIP llama.cpp (b9330,
  win-hip-radeon-x64) instead of the CPU-only CUDA universal source.
- #86 native runtime resolves the engine binary against AIMA_ENGINE_DIR
  (dist -> AIMA_ENGINE_DIR -> auto-download -> PATH), so a pre-installed
  llama.cpp of ANY version is launchable -- the partner's llama.cpp version
  is supported whether or not it matches the bundled b9330.

The catalog YAML ships compiled into the binary AND in source
(catalog/engines/llamacpp-hip-windows.yaml). Verified on the 395 rig:
`aima version` -> v0.5-dev-amd-strix-halo-20260610 / fc3ef41;
`aima engine info llamacpp-hip-windows` -> b9330 asset;
`aima hal detect` -> RDNA3.5 gfx1151, ~110 GB VRAM.
README + serve.bat updated: AIMA_ENGINE_DIR is now optional.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rjckkkkk added a commit that referenced this pull request Jun 10, 2026
The dist exe filename now carries its version string so a new build does NOT
overwrite the previous one in place. Two builds coexist:

  aima-windows-amd64-v0.5-dev-amd-strix-halo-20260610.exe  -> v0.5-dev-amd-strix-halo-20260610
      latest; adds the out-of-box AMD-HIP llama.cpp engine (#85, #86). serve.bat uses this.
  aima-windows-amd64-v0.5-dev-amd-strix-halo.exe           -> v0.5-dev-amd-strix-halo
      restored 2026-06-09 build (#78-#83 only, no HIP auto-download). Kept for rollback.

Both from source commit fc3ef41; filename == the exe's own `aima version` string.

- #85 llamacpp-hip-windows engine asset (go:embed'd into the exe + in source at
  catalog/engines/llamacpp-hip-windows.yaml, pins official b9330 win-hip-radeon-x64):
  a no-NVIDIA Strix Halo box auto-downloads the right ROCm/HIP llama.cpp instead of
  the CPU-only CUDA universal source.
- #86 native runtime resolves the engine binary against AIMA_ENGINE_DIR
  (dist -> AIMA_ENGINE_DIR -> auto-download -> PATH), so a pre-installed llama.cpp
  of ANY version is launchable -- the partner's llama.cpp version is supported
  whether or not it matches the bundled b9330.

Verified the latest build on the 395 rig: `aima version` ->
v0.5-dev-amd-strix-halo-20260610 / fc3ef41; `aima engine info llamacpp-hip-windows`
-> b9330 asset; `aima hal detect` -> RDNA3.5 gfx1151, ~110 GB VRAM.
README + serve.bat updated (build table; AIMA_ENGINE_DIR now optional).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>