```
$ docker model reinstall-runner --backend vllm --gpu cuda
latest-vllm-cuda: Pulling from docker/model-runner
Digest: sha256:ca46bfa7f73c121e12e2ecbc81cbf37b1a4e4d68927f9878e2b414671d504f3f
Status: Image is up to date for docker/model-runner:latest-vllm-cuda
Successfully pulled docker/model-runner:latest-vllm-cuda
Starting model runner container docker-model-runner...
standalone model runner took too long to initialize
```
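Despite the timeout message, the next command shows the container is actually running, so its startup logs can still be inspected (a generic Docker diagnostic, using the container name printed above):

```
docker logs docker-model-runner
```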
```
$ docker model install-runner --backend llama.cpp --gpu cuda
Model Runner container docker-model-runner (3a2af1900ac1) is already running

$ docker model status
Docker Model Runner is running
Status:
vllm: running vllm version: 0.12.0
llama.cpp: running llama.cpp version: unknown
mlx: not installed
```
```
$ docker model run huggingface.co/bartowski/allura-forge_llama-3.3-8b-instruct-gguf:q6_k_l "Introduce yourself"
Failed to generate a response: error response: status=500 body=unable to load runner: error waiting for runner to be ready: llama.cpp terminated unexpectedly: llama.cpp exit status: exit status 127
with output: /app/bin/com.docker.llama-server: error while loading shared libraries: libmtmd.so.0: cannot open shared object file: No such file or directory
```
The same error occurs if I reinstall the llama.cpp backend first and then install the vLLM backend.
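The exit status 127 together with the libmtmd.so.0 message points to a missing shared library inside the runner container. Assuming the container image includes a shell and ldd, the binary's library resolution can be checked like this (a diagnostic sketch, not an official command from the docs):

```
docker exec docker-model-runner ldd /app/bin/com.docker.llama-server
```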
I would expect to be able to install both backends side by side, so that GGUF models are served by llama.cpp and safetensors models by vLLM, as the documentation claims:

> Docker Model Runner intelligently routes your request: if you pull a GGUF model, it utilizes llama.cpp; if you pull a safetensors model, it leverages the power of vLLM. With Docker Model Runner, both can be pushed and pulled as OCI images to any OCI registry.
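In other words, this is the workflow I expected to work (a sketch of the intended routing; the safetensors model reference is a placeholder, not a model I have tested):

```
# Install both backends side by side
docker model install-runner --backend llama.cpp --gpu cuda
docker model install-runner --backend vllm --gpu cuda

# GGUF model -> expected to be routed to llama.cpp
docker model run huggingface.co/bartowski/allura-forge_llama-3.3-8b-instruct-gguf:q6_k_l "Introduce yourself"

# safetensors model -> expected to be routed to vLLM (placeholder model reference)
docker model run <some-safetensors-model> "Introduce yourself"
```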
Environment:
- OS: Windows 11
- Docker Desktop: Version 4.55.0 (213807)