When SkyRL's create_ray_wrapped_inference_engines builds an AsyncVLLMInferenceEngine Ray actor whose vLLM v1 engine-core child process dies during init, the driver-side ActorDiedError that surfaces from ray.get(sleep_refs) bottoms out at vLLM's wait_for_engine_startup with:
File "vllm/v1/engine/utils.py", line 1178, in wait_for_engine_startup
raise RuntimeError(
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
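This RuntimeError is hard to debug: it points to a "root cause above" and reports an empty set of failed processes, yet the driver-side trace contains no root cause. I eventually found that the underlying engine-core stderr/traceback is written only to /tmp/ray/session_*/logs/worker-*.err.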
Reproduction
Run the reproducer below with Python 3.12, skyrl==0.2.0, ray==2.51.1, and vllm==0.20.2 on a machine with at least one GPU. The failure trigger is gpu_memory_utilization=0.999, which forces vLLM's engine-core child to raise ValueError: Free memory on device cuda:0 ... is less than desired GPU memory utilization inside its request_memory() call.
import argparse
import glob
import os
import sys
import time
import traceback
from pathlib import Path

import ray
from ray.exceptions import ActorDiedError
from skyrl.backends.skyrl_train.inference_engines.ray_wrapped_inference_engine import (
    create_ray_wrapped_inference_engines,
)


def find_recent_actor_err_logs(window_s: float = 180.0) -> list[Path]:
    cutoff = time.time() - window_s
    paths = [
        Path(p)
        for p in glob.glob("/tmp/ray/session_latest/logs/worker-*.err")
        if os.path.getmtime(p) >= cutoff and os.path.getsize(p) > 0
    ]
    return sorted(paths, key=os.path.getmtime, reverse=True)


def tail(path: Path, n: int = 80) -> str:
    return "\n".join(path.read_text(errors="replace").splitlines()[-n:])


def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--no-bug", action="store_true")
    args = parser.parse_args()
    gpu_memory_utilization = 0.5 if args.no_bug else 0.999
    ray.init(num_cpus=4)
    kwargs = dict(
        num_inference_engines=1,
        tensor_parallel_size=1,
        pipeline_parallel_size=1,
        data_parallel_size=1,
        model_dtype="bfloat16",
        pretrain="Qwen/Qwen3-0.6B",
        seed=42,
        vllm_v1_disable_multiproc=False,
        enable_prefix_caching=True,
        enforce_eager=True,
        gpu_memory_utilization=gpu_memory_utilization,
        inference_engine_enable_sleep=True,
        async_engine=True,
        backend="vllm",
        engine_init_kwargs={"max_model_len": 2048},
    )
    try:
        create_ray_wrapped_inference_engines(**kwargs)
    except ActorDiedError as e:
        if args.no_bug:
            print("UNEXPECTED: control path raised ActorDiedError")
            traceback.print_exc()
            return 3
        print("=" * 72)
        print("DRIVER-SIDE TRACEBACK (what SkyRL surfaces to the user):")
        print("=" * 72)
        traceback.print_exception(type(e), e, e.__traceback__)
        time.sleep(2)
        print("\n" + "=" * 72)
        print("ACTOR STDERR LOGS (where the real cause actually lives):")
        print("Glob: /tmp/ray/session_latest/logs/worker-*.err")
        print("=" * 72)
        err_logs = find_recent_actor_err_logs()
        if not err_logs:
            print("(no actor stderr logs matched in window)")
            return 3
        for p in err_logs:
            print(f"\n--- {p} ---")
            print(tail(p))
        return 2
    if args.no_bug:
        print("control path: engine init succeeded (as expected)")
        return 0
    print("UNEXPECTED: bug-trigger path completed without failure")
    return 3


if __name__ == "__main__":
    sys.exit(main())
This will output:
========================================================================
DRIVER-SIDE TRACEBACK (what SkyRL surfaces to the user):
========================================================================
Traceback (most recent call last):
File ".../repro.py", line 60, in main
create_ray_wrapped_inference_engines(**kwargs)
File ".../skyrl/backends/skyrl_train/inference_engines/ray_wrapped_inference_engine.py", line 323, in create_ray_wrapped_inference_engines
ray.get(sleep_refs)
...
ray.exceptions.ActorDiedError: The actor died because of an error raised in its creation task, ray::AsyncVLLMInferenceEngine.__init__() ...
File ".../skyrl/backends/skyrl_train/inference_engines/vllm/vllm_engine.py", line 370, in _create_engine
engine = vllm.AsyncLLMEngine.from_engine_args(engine_args, stat_loggers=stat_loggers)
...
File ".../vllm/v1/engine/utils.py", line 1178, in wait_for_engine_startup
raise RuntimeError(
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
========================================================================
ACTOR STDERR LOGS (where the real cause actually lives):
Glob: /tmp/ray/session_latest/logs/worker-*.err
========================================================================
--- /tmp/ray/session_latest/logs/worker-<hash>-ffffffff-<pid>.err ---
(EngineCore pid=<child>) File ".../vllm/v1/worker/gpu_worker.py", line 283, in init_device
(EngineCore pid=<child>) self.requested_memory = request_memory(init_snapshot, self.cache_config)
(EngineCore pid=<child>) File ".../vllm/v1/worker/utils.py", line 413, in request_memory
(EngineCore pid=<child>) raise ValueError(
(EngineCore pid=<child>) ValueError: Free memory on device cuda:0 (78.67/79.18 GiB) on startup is less than desired GPU memory utilization (0.999, 79.1 GiB). ...
The driver trace only has "Failed core proc(s): {}". The ValueError from the engine-core child is only in the per-actor .err file under /tmp/ray/session_latest/logs/.
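Since the useful lines in the .err file all carry the "(EngineCore pid=...)" prefix seen in the output above, a small filter can pull them out of a noisy log tail. This is only a sketch: the function name engine_core_lines is hypothetical, and the prefix is assumed from this one run's output rather than from any documented vLLM log format.

```python
def engine_core_lines(log_text: str, prefix: str = "(EngineCore") -> list[str]:
    """Keep only the lines that appear to come from the engine-core child.

    The default prefix is an assumption taken from the log excerpt in this
    report; adjust it if your vLLM version tags child output differently.
    """
    return [
        line
        for line in log_text.splitlines()
        if line.lstrip().startswith(prefix)
    ]
```

Applied to the tail of a worker-*.err file, this would leave just the child's traceback and the final ValueError line.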
Suggested fix
At ray_wrapped_inference_engine.py#L323, ray.get(sleep_refs) blocks on engine init and is where ActorDiedError first reaches the driver.
The requested fix is to attach the failed actor's stderr to the exception that is re-raised on ActorDiedError. Wrap the ray.get(sleep_refs) line in a try/except that, on ActorDiedError, reads the actor's per-process log files from Ray's session directory and re-raises with the engine-core child's stderr:
import contextlib
import pathlib

from ray.exceptions import ActorDiedError

try:
    ray.get(sleep_refs)
except ActorDiedError as e:
    diagnostics = []
    for engine_actor in inference_engine_actors:
        with contextlib.suppress(Exception):  # Don't block original exception
            # Ray exposes the actor's log paths via the runtime context;
            # or read from RAY_TMPDIR/session_latest/logs/worker-<id>.err
            # (_resolve_actor_stderr_log_path is a placeholder helper, to be implemented)
            log_path = _resolve_actor_stderr_log_path(engine_actor)
            tail = pathlib.Path(log_path).read_text(errors="replace").splitlines()[-200:]
            diagnostics.append("\n".join(tail))
    raise RuntimeError(
        "vLLM engine actor died during init. Tail of the actor stderr log(s):\n\n"
        + "\n\n--- next actor ---\n\n".join(diagnostics)
    ) from e
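One possible stand-in for the _resolve_actor_stderr_log_path placeholder above, sketched with only the standard library. Rather than resolving one path per actor, it reuses the reproducer's heuristic of taking recently modified, non-empty worker-*.err files. The name recent_worker_err_tails, its parameters, and the default 180-second window are assumptions for illustration, not SkyRL or Ray APIs.

```python
import time
from pathlib import Path


def recent_worker_err_tails(
    log_dir: str = "/tmp/ray/session_latest/logs",
    window_s: float = 180.0,
    max_lines: int = 200,
) -> list[str]:
    """Return tails of non-empty worker-*.err files modified in the last window_s seconds."""
    cutoff = time.time() - window_s
    candidates = []
    for path in Path(log_dir).glob("worker-*.err"):
        try:
            stat = path.stat()
        except OSError:
            continue  # file vanished or is unreadable; skip it
        if stat.st_size > 0 and stat.st_mtime >= cutoff:
            candidates.append((stat.st_mtime, path))

    tails = []
    # Newest first, so the most likely culprit leads the message.
    for _, path in sorted(candidates, key=lambda c: c[0], reverse=True):
        try:
            lines = path.read_text(errors="replace").splitlines()[-max_lines:]
        except OSError:
            continue
        tails.append(f"--- {path} ---\n" + "\n".join(lines))
    return tails
```

Because it matches on modification time rather than actor identity, this can pick up logs from unrelated workers on a busy node, and it only sees files on the driver's own machine. A per-actor lookup would be more precise if Ray's actor metadata is available at that point.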