create_ray_wrapped_inference_engines drops the engine-core child's root cause on init failure #1673

@jamesbraza

Description

When SkyRL's create_ray_wrapped_inference_engines builds an AsyncVLLMInferenceEngine Ray actor whose vLLM v1 engine-core child process dies during init, the driver-side ActorDiedError that surfaces from ray.get(sleep_refs) bottoms out at vLLM's wait_for_engine_startup with:

File "vllm/v1/engine/utils.py", line 1178, in wait_for_engine_startup
    raise RuntimeError(
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

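This RuntimeError is hard to debug: "See root cause above" points at nothing in the driver-side traceback, and the reported set of failed core procs is empty. I eventually learned that the underlying engine-core stderr/traceback is written only to /tmp/ray/session_*/logs/worker-*.err.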

Reproduction

Run the reproducer below with Python 3.12, skyrl==0.2.0, ray==2.51.1, and vllm==0.20.2 on a machine with at least one GPU. The failure trigger is gpu_memory_utilization=0.999, which forces vLLM's engine-core child to raise ValueError: Free memory on device cuda:0 ... is less than desired GPU memory utilization inside its request_memory() call. Passing --no-bug switches to gpu_memory_utilization=0.5 as a control run that should initialize successfully.

import argparse
import glob
import os
import sys
import time
import traceback
from pathlib import Path

import ray
from ray.exceptions import ActorDiedError

from skyrl.backends.skyrl_train.inference_engines.ray_wrapped_inference_engine import (
    create_ray_wrapped_inference_engines,
)


def find_recent_actor_err_logs(window_s: float = 180.0) -> list[Path]:
    cutoff = time.time() - window_s
    paths = [
        Path(p)
        for p in glob.glob("/tmp/ray/session_latest/logs/worker-*.err")
        if os.path.getmtime(p) >= cutoff and os.path.getsize(p) > 0
    ]
    return sorted(paths, key=os.path.getmtime, reverse=True)


def tail(path: Path, n: int = 80) -> str:
    return "\n".join(path.read_text(errors="replace").splitlines()[-n:])


def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--no-bug", action="store_true")
    args = parser.parse_args()

    gpu_memory_utilization = 0.5 if args.no_bug else 0.999
    ray.init(num_cpus=4)

    kwargs = dict(
        num_inference_engines=1,
        tensor_parallel_size=1,
        pipeline_parallel_size=1,
        data_parallel_size=1,
        model_dtype="bfloat16",
        pretrain="Qwen/Qwen3-0.6B",
        seed=42,
        vllm_v1_disable_multiproc=False,
        enable_prefix_caching=True,
        enforce_eager=True,
        gpu_memory_utilization=gpu_memory_utilization,
        inference_engine_enable_sleep=True,
        async_engine=True,
        backend="vllm",
        engine_init_kwargs={"max_model_len": 2048},
    )

    try:
        create_ray_wrapped_inference_engines(**kwargs)
    except ActorDiedError as e:
        if args.no_bug:
            print("UNEXPECTED: control path raised ActorDiedError")
            traceback.print_exc()
            return 3
        print("=" * 72)
        print("DRIVER-SIDE TRACEBACK (what SkyRL surfaces to the user):")
        print("=" * 72)
        traceback.print_exception(type(e), e, e.__traceback__)
        time.sleep(2)
        print("\n" + "=" * 72)
        print("ACTOR STDERR LOGS (where the real cause actually lives):")
        print("Glob: /tmp/ray/session_latest/logs/worker-*.err")
        print("=" * 72)
        err_logs = find_recent_actor_err_logs()
        if not err_logs:
            print("(no actor stderr logs matched in window)")
            return 3
        for p in err_logs:
            print(f"\n--- {p} ---")
            print(tail(p))
        return 2

    if args.no_bug:
        print("control path: engine init succeeded (as expected)")
        return 0
    print("UNEXPECTED: bug-trigger path completed without failure")
    return 3


if __name__ == "__main__":
    sys.exit(main())

This will output:

========================================================================
DRIVER-SIDE TRACEBACK (what SkyRL surfaces to the user):
========================================================================
Traceback (most recent call last):
  File ".../repro.py", line 60, in main
    create_ray_wrapped_inference_engines(**kwargs)
  File ".../skyrl/backends/skyrl_train/inference_engines/ray_wrapped_inference_engine.py", line 323, in create_ray_wrapped_inference_engines
    ray.get(sleep_refs)
  ...
ray.exceptions.ActorDiedError: The actor died because of an error raised in its creation task, ray::AsyncVLLMInferenceEngine.__init__() ...
  File ".../skyrl/backends/skyrl_train/inference_engines/vllm/vllm_engine.py", line 370, in _create_engine
    engine = vllm.AsyncLLMEngine.from_engine_args(engine_args, stat_loggers=stat_loggers)
  ...
  File ".../vllm/v1/engine/utils.py", line 1178, in wait_for_engine_startup
    raise RuntimeError(
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
========================================================================
ACTOR STDERR LOGS (where the real cause actually lives):
Glob: /tmp/ray/session_latest/logs/worker-*.err
========================================================================
--- /tmp/ray/session_latest/logs/worker-<hash>-ffffffff-<pid>.err ---
(EngineCore pid=<child>)   File ".../vllm/v1/worker/gpu_worker.py", line 283, in init_device
(EngineCore pid=<child>)     self.requested_memory = request_memory(init_snapshot, self.cache_config)
(EngineCore pid=<child>)   File ".../vllm/v1/worker/utils.py", line 413, in request_memory
(EngineCore pid=<child>)     raise ValueError(
(EngineCore pid=<child>) ValueError: Free memory on device cuda:0 (78.67/79.18 GiB) on startup is less than desired GPU memory utilization (0.999, 79.1 GiB). ...

The driver-side traceback contains only "Failed core proc(s): {}". The ValueError raised by the engine-core child appears only in the per-actor .err file under /tmp/ray/session_latest/logs/.
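As a rough illustration of pulling the root cause out of such a log, the sketch below keeps only lines carrying the (EngineCore pid=...) prefix seen in the log output above and returns the last one that looks like an exception message. The function name extract_engine_core_error and both regexes are assumptions made for this example; they are not part of SkyRL, Ray, or vLLM, and the prefix format may differ across vLLM versions.

```python
import re
from typing import Optional

# Assumed prefix format, taken from the log excerpt in this issue:
# "(EngineCore pid=<digits>) <original stderr line>".
ENGINE_CORE_PREFIX = re.compile(r"^\(EngineCore\S* pid=\d+\) ?(.*)$")
# A final traceback line such as "ValueError: ..." starts at column 0
# with a name ending in Error or Exception.
EXCEPTION_LINE = re.compile(r"^[A-Za-z_][\w.]*(?:Error|Exception)\b")


def extract_engine_core_error(log_text: str) -> Optional[str]:
    """Return the last exception-looking line emitted by an engine-core child."""
    last_error = None
    for line in log_text.splitlines():
        match = ENGINE_CORE_PREFIX.match(line)
        if match and EXCEPTION_LINE.match(match.group(1)):
            last_error = match.group(1)
    return last_error
```

Called on the tail of a worker-*.err file, this would return the ValueError line rather than the whole traceback, which could serve as a one-line summary in a re-raised exception message.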

Suggested fix

At ray_wrapped_inference_engine.py#L323, ray.get(sleep_refs) blocks on engine init and is where ActorDiedError first reaches the driver.

The requested fix is to attach the failed actor's stderr to the exception that reaches the driver. Wrap the ray.get(sleep_refs) call in a try/except that, on ActorDiedError, reads the actor's per-process log files from Ray's session directory and re-raises with the tail of the engine-core child's stderr:

import contextlib
import pathlib

from ray.exceptions import ActorDiedError

try:
    ray.get(sleep_refs)
except ActorDiedError as e:
    diagnostics = []
    for engine_actor in inference_engine_actors:
        with contextlib.suppress(Exception):  # Don't block original exception
            # _resolve_actor_stderr_log_path is a hypothetical helper, still to be written.
            # Ray exposes the actor's log paths via the runtime context;
            # or read from RAY_TMPDIR/session_latest/logs/worker-<id>.err
            log_path = _resolve_actor_stderr_log_path(engine_actor)
            # errors="replace" keeps undecodable bytes from hiding the log.
            tail_lines = pathlib.Path(log_path).read_text(errors="replace").splitlines()[-200:]
            diagnostics.append("\n".join(tail_lines))
    # Fall back to a placeholder so the message is never left dangling.
    details = "\n\n--- next actor ---\n\n".join(diagnostics) or "(no actor stderr log could be read)"
    raise RuntimeError(
        "vLLM engine actor died during init. Tail of the actor stderr log(s):\n\n" + details
    ) from e
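The snippet above leaves _resolve_actor_stderr_log_path undefined. One possible building block is sketched below, under two assumptions: the actor's worker PID is already known (how to obtain it for a dead actor is not covered here), and the log file name ends in -<pid>.err, as in the worker-<hash>-ffffffff-<pid>.err path shown in the output earlier. The names find_worker_err_log and tail_text are hypothetical and use only the standard library.

```python
import glob
import os
from pathlib import Path
from typing import Optional


def find_worker_err_log(
    pid: int, log_dir: str = "/tmp/ray/session_latest/logs"
) -> Optional[Path]:
    """Return the newest non-empty worker-*-<pid>.err file in log_dir, or None."""
    # glob.escape guards against special characters in the directory path.
    pattern = os.path.join(glob.escape(log_dir), f"worker-*-{pid}.err")
    candidates = [Path(p) for p in glob.glob(pattern) if os.path.getsize(p) > 0]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def tail_text(path: Path, n: int = 200) -> str:
    """Return the last n lines of a log file, tolerating undecodable bytes."""
    return "\n".join(path.read_text(errors="replace").splitlines()[-n:])
```

If no PID is available, falling back to the reproducer's approach of scanning recently modified worker-*.err files is a cruder alternative, at the cost of possibly including logs from unrelated actors.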
