Draft model on a different GPU: [GPU] clEnqueueNDRangeKernel, error code: -52 CL_INVALID_KERNEL_ARGS #70

@savvadesogle

Hello,

If I load the model and the draft model onto the same GPU (for example, GPU.0), the problem does not occur. If I load the model onto GPU.0 and the draft model onto GPU.1, loading fails with the error below.

Linux xpu 6.19.3-061903-generic #202602191659 SMP PREEMPT_DYNAMIC Sat Feb 21 08:17:10 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux

Config

    "Qwen3-14B-int4-ov-spec": {
      "model_name": "Qwen3-14B-int4-ov-spec",
      "model_path": "/mnt/data2/models/OpenVINO/Qwen3-14B-int4-ov",
      "device": "GPU.1",
      "model_type": "llm",
      "engine": "ovgenai",
      "draft_model_path": "/mnt/data2/models/OpenVINO/Qwen3-0.6B-int4-ov",
      "draft_device": "GPU.2",
      "num_assistant_tokens": 7,
      "runtime_config": {
        "PERFORMANCE_HINT": "LATENCY"
      }
    },
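
For anyone trying to reproduce this outside OpenArc, here is a minimal sketch. It assumes the OpenVINO GenAI speculative decoding API (`openvino_genai.draft_model` and `LLMPipeline(..., draft_model=...)`) as used in the OpenVINO GenAI samples. The helper names `uses_split_devices` and `try_load` are hypothetical, and how OpenArc builds the pipeline internally is an assumption beyond what the traceback shows.

```python
from typing import Any, Dict

# The config entry from this report, written as a Python dict.
ENTRY: Dict[str, Any] = {
    "model_name": "Qwen3-14B-int4-ov-spec",
    "model_path": "/mnt/data2/models/OpenVINO/Qwen3-14B-int4-ov",
    "device": "GPU.1",
    "model_type": "llm",
    "engine": "ovgenai",
    "draft_model_path": "/mnt/data2/models/OpenVINO/Qwen3-0.6B-int4-ov",
    "draft_device": "GPU.2",
    "num_assistant_tokens": 7,
    "runtime_config": {"PERFORMANCE_HINT": "LATENCY"},
}


def uses_split_devices(entry: Dict[str, Any]) -> bool:
    """Return True when a draft model is configured on a different device
    than the main model (the condition that fails in this report)."""
    draft_device = entry.get("draft_device")
    if not entry.get("draft_model_path") or draft_device is None:
        return False
    return draft_device != entry["device"]


def try_load(entry: Dict[str, Any]):
    """Attempt the same load without OpenArc (hypothetical helper).

    Needs openvino-genai installed and the model directories present.
    """
    # Imported lazily so the device check above works without OpenVINO.
    import openvino_genai as ov_genai

    draft = ov_genai.draft_model(entry["draft_model_path"], entry["draft_device"])
    return ov_genai.LLMPipeline(
        entry["model_path"],
        entry["device"],
        draft_model=draft,
        **entry.get("runtime_config", {}),
    )
```

Calling `try_load(ENTRY)` on the affected machine, and then again with `"draft_device"` set to the same value as `"device"`, should show whether the failure is independent of OpenArc.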

OpenArc server log

2026-02-22 12:41:58,202 - ERROR - [DEBUG] draft_model_loaded: True
2026-02-22 12:41:58,203 - ERROR - [DEBUG] self.model_num_assistant_tokens: 3
2026-02-22 12:41:58,203 - ERROR - [DEBUG] generation_kwargs.num_assistant_tokens: 3
2026-02-22 12:41:58,203 - ERROR - [DEBUG] generation_kwargs.assistant_confidence_threshold: 0.0
2026-02-22 12:42:17,029 - INFO - [LLM Worker: Qwen3-14B-int4-ov-spec] Metrics: {'load_time (s)': 28.29, 'ttft (s)': 0.37, 'tpot (ms)': 54.28816, 'prefill_throughput (tokens/s)': 2000.81, 'decode_throughput (tokens/s)': 18.42022, 'decode_duration (s)': 18.82504, 'input_token': 731, 'new_token': 341, 'total_token': 1072, 'stream': True, 'stream_chunk_tokens': 1}
2026-02-22 12:42:17,758 - INFO - Request received: POST /v1/chat/completions from 127.0.0.1
2026-02-22 12:42:17,765 - INFO - "Qwen3-8B-int4-ov" request received
2026-02-22 12:42:17,766 - INFO - Request completed: POST /v1/chat/completions status=400 duration=0.007s
2026-02-22 12:42:33,721 - INFO - Request received: POST /openarc/unload from 127.0.0.1
2026-02-22 12:42:34,434 - INFO - [Qwen3-14B-int4-ov-spec] unloaded successfully
2026-02-22 12:42:34,435 - INFO - Request completed: POST /openarc/unload status=200 duration=0.714s
2026-02-22 12:42:41,835 - INFO - Request received: POST /openarc/load from 127.0.0.1
2026-02-22 12:42:41,837 - INFO - Qwen3-14B-int4-ov-spec loading...
2026-02-22 12:42:41,837 - INFO - ModelType.LLM on GPU.1 with {}
2026-02-22 12:42:42,245 - INFO - Loaded draft model from /mnt/data2/models/OpenVINO/Qwen3-0.6B-int4-ov on GPU.2
2026-02-22 12:43:09,562 - ERROR - Model loading failed for Qwen3-14B-int4-ov-spec
Traceback (most recent call last):
  File "/home/arc/OpenArc/src/server/model_registry.py", line 145, in _load_task
    model_instance = await create_model_instance(load_config)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/arc/OpenArc/src/server/model_registry.py", line 254, in create_model_instance
    await asyncio.to_thread(model_instance.load_model, load_config)
  File "/usr/local/lib/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/arc/OpenArc/src/engine/ov_genai/llm.py", line 306, in load_model
    self.model = LLMPipeline(
                 ^^^^^^^^^^^^
RuntimeError: Exception from src/inference/src/cpp/core.cpp:110:
Exception from src/inference/src/dev/plugin.cpp:54:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:163:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_common.hpp:40:
[GPU] clEnqueueNDRangeKernel, error code: -52 CL_INVALID_KERNEL_ARGS

2026-02-22 12:43:09,669 - INFO - Request completed: POST /openarc/load status=500 duration=27.834s
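
When comparing runs across device combinations, it can help to pull the failing OpenCL call and error code out of the exception text automatically. The sketch below is a hypothetical helper (`find_ocl_error` is not part of OpenArc or OpenVINO) and assumes the message keeps the `[GPU] <call>, error code: <n> <NAME>` shape seen in the log above.

```python
import re
from typing import NamedTuple, Optional


class OclError(NamedTuple):
    call: str  # OpenCL API call that failed
    code: int  # numeric OpenCL status code
    name: str  # symbolic name printed by the GPU plugin


# Matches lines such as:
# [GPU] clEnqueueNDRangeKernel, error code: -52 CL_INVALID_KERNEL_ARGS
_OCL_RE = re.compile(r"\[GPU\]\s+(\w+), error code: (-?\d+)\s+(CL_\w+)")


def find_ocl_error(text: str) -> Optional[OclError]:
    """Return the first OpenCL error found in an exception message, or None."""
    match = _OCL_RE.search(text)
    if match is None:
        return None
    return OclError(match.group(1), int(match.group(2)), match.group(3))


# Tail of the exception message from this report.
SAMPLE = """[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_common.hpp:40:
[GPU] clEnqueueNDRangeKernel, error code: -52 CL_INVALID_KERNEL_ARGS
"""

error = find_ocl_error(SAMPLE)
```

In a load wrapper, `find_ocl_error(str(exc))` on the caught `RuntimeError` would give a compact value to log next to the device pair being tested.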

UV PIP LIST

(openarc) (openarc) arc@xpu:~/OpenArc$ uv pip list
Package                    Version                Editable project location
-------------------------- ---------------------- -------------------------
about-time                 4.2.1
addict                     2.4.0
aiohappyeyeballs           2.6.1
aiohttp                    3.12.14
aiosignal                  1.4.0
alive-progress             3.2.0
annotated-types            0.7.0
anyio                      4.9.0
asttokens                  3.0.0
attrs                      25.3.0
audioread                  3.0.1
autograd                   1.8.0
babel                      2.17.0
blis                       1.3.0
brotli                     1.1.0
catalogue                  2.0.10
certifi                    2025.7.14
cffi                       2.0.0
charset-normalizer         3.4.2
click                      8.2.1
cloudpathlib               0.22.0
cma                        4.2.0
colorama                   0.4.6
comm                       0.2.3
confection                 0.1.5
contourpy                  1.3.2
cryptography               46.0.3
csvw                       3.6.0
curated-tokenizers         0.0.9
curated-transformers       0.1.1
cycler                     0.12.1
cymem                      2.0.11
datasets                   4.0.0
ddgs                       9.6.1
debugpy                    1.8.17
decorator                  5.2.1
deprecated                 1.2.18
dill                       0.3.8
distro                     1.9.0
dlinfo                     2.0.0
docopt                     0.6.2
espeakng-loader            0.2.4
evdev                      1.9.2
executing                  2.2.1
fastapi                    0.116.1
filelock                   3.18.0
fonttools                  4.58.5
frozenlist                 1.7.0
fsspec                     2025.3.0
grapheme                   0.6.0
griffe                     1.14.0
h11                        0.16.0
h2                         4.3.0
hf-xet                     1.1.5
hpack                      4.1.0
httpcore                   1.0.9
httpx                      0.28.1
httpx-sse                  0.4.3
huggingface-hub            0.33.4
hyperframe                 6.1.0
idna                       3.10
iniconfig                  2.3.0
inquirerpy                 0.3.4
ipykernel                  7.0.1
ipython                    9.6.0
ipython-pygments-lexers    1.1.1
ipywidgets                 8.1.7
isodate                    0.7.2
jedi                       0.19.2
jinja2                     3.1.6
jiter                      0.11.0
joblib                     1.5.1
jsonschema                 4.24.0
jsonschema-specifications  2025.4.1
jupyter-client             8.6.3
jupyter-core               5.9.1
jupyterlab-widgets         3.0.15
kiwisolver                 1.4.8
kokoro                     0.9.4
langcodes                  3.5.0
language-data              1.3.0
language-tags              1.2.0
lazy-loader                0.4
librosa                    0.11.0
llvmlite                   0.45.0
loguru                     0.7.3
lxml                       6.0.2
marisa-trie                1.3.1
markdown-it-py             3.0.0
markupsafe                 3.0.2
matplotlib                 3.10.3
matplotlib-inline          0.1.7
mcp                        1.20.0
mdurl                      0.1.2
misaki                     0.9.4
mpmath                     1.3.0
msgpack                    1.1.1
multidict                  6.6.3
multiprocess               0.70.16
murmurhash                 1.0.13
natsort                    8.4.0
nest-asyncio               1.6.0
networkx                   3.4.2
ninja                      1.11.1.4
nncf                       2.17.0
num2words                  0.5.14
numba                      0.62.0
numpy                      2.2.6
onnx                       1.18.0
openai                     2.2.0
openai-agents              0.4.2
openarc                    2.0                    /home/arc/OpenArc
openvino                   2026.1.0.dev20260221
openvino-genai             2026.1.0.0.dev20260221
openvino-telemetry         2025.2.0
openvino-tokenizers        2026.1.0.0.dev20260221
optimum                    1.27.0
optimum-intel              1.25.2
packaging                  25.0
pandas                     2.2.3
parso                      0.8.5
pexpect                    4.9.0
pfzy                       0.3.4
phonemizer-fork            3.3.2
pillow                     11.3.0
pip                        25.2
platformdirs               4.4.0
pluggy                     1.6.0
pooch                      1.8.2
preshed                    3.0.10
primp                      0.15.0
prompt-toolkit             3.0.52
propcache                  0.3.2
protobuf                   6.31.1
psutil                     7.0.0
ptyprocess                 0.7.0
pure-eval                  0.2.3
pyarrow                    20.0.0
pycparser                  2.23
pydantic                   2.11.7
pydantic-core              2.33.2
pydantic-settings          2.11.0
pydot                      3.0.4
pygments                   2.19.2
pyjwt                      2.10.1
pymoo                      0.6.1.5
pynput                     1.8.1
pyparsing                  3.2.3
pytest                     8.4.2
python-dateutil            2.9.0.post0
python-dotenv              1.2.1
python-multipart           0.0.20
python-xlib                0.33
pytz                       2025.2
pyyaml                     6.0.2
pyzmq                      27.1.0
rdflib                     7.2.1
referencing                0.36.2
regex                      2024.11.6
requests                   2.32.4
rfc3986                    1.5.0
rich                       14.0.0
rich-click                 1.8.9
rpds-py                    0.26.0
safetensors                0.5.3
scikit-learn               1.7.0
scipy                      1.16.0
segments                   2.3.0
setuptools                 80.9.0
shellingham                1.5.4
six                        1.17.0
smart-open                 7.3.1
smolagents                 1.22.0
sniffio                    1.3.1
socksio                    1.0.0
sounddevice                0.5.2
soundfile                  0.13.1
soxr                       1.0.0
spacy                      3.8.7
spacy-curated-transformers 0.3.1
spacy-legacy               3.0.12
spacy-loggers              1.0.5
srsly                      2.5.1
sse-starlette              3.0.3
stack-data                 0.6.3
starlette                  0.47.1
sympy                      1.14.0
tabulate                   0.9.0
termcolor                  3.1.0
thinc                      8.3.6
threadpoolctl              3.6.0
tokenizers                 0.21.2
torch                      2.8.0+cpu
torchvision                0.23.0+cpu
tornado                    6.5.2
tqdm                       4.67.1
traitlets                  5.14.3
transformers               4.52.4
typer                      0.19.2
types-requests             2.32.4.20250913
typing-extensions          4.14.1
typing-inspection          0.4.1
tzdata                     2025.2
uritemplate                4.2.0
urllib3                    2.5.0
uvicorn                    0.35.0
wasabi                     1.1.3
wcwidth                    0.2.14
weasel                     0.4.1
widgetsnbextension         4.0.14
wrapt                      1.17.2
xxhash                     3.5.0
yarl                       1.20.1
