`WorkerWrap.load_weights` silently corrupts unquantized MoE rollouts after the first weight sync since `vllm>=0.20.0` #1680

[`WorkerWrap.load_weights`](https://github.com/NovaSky-AI/SkyRL/blob/skyrl-v0.2.0/skyrl/backends/skyrl_train/inference_servers/vllm_worker.py#L74-L96) calls vLLM's raw `model.load_weights(...)` directly. Per the upstream issue https://github.com/vllm-project/vllm/issues/42821, that entrypoint is broken for unquantized MoE models on FlashInfer backends (e.g. [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B)). SkyRL has been affected since `vllm==0.20.0` was pulled in via https://github.com/NovaSky-AI/SkyRL/pull/1628.

SkyRL should move to [`self.model_runner.reload_weights(weights_iterator=...)`](https://github.com/vllm-project/vllm/blob/v0.20.2/vllm/v1/worker/gpu_model_runner.py#L4972-L4986), which is idempotent across repeated weight syncs.

```diff
+ from vllm.config import set_current_vllm_config

...

 def load_weights(self, request: bytes) -> None:
     ...
     weight_list = []
     for name, tensor in self._weight_receiver.receive_weights(request):
         weight_list.append((name, tensor))

-    self.model_runner.model.load_weights(weights=weight_list)
+    with set_current_vllm_config(self.vllm_config):
+        self.model_runner.reload_weights(weights_iterator=iter(weight_list))

     for weight in weight_list:
         del weight
```

This will also match vLLM's own [`reload_weights` RPC](https://github.com/vllm-project/vllm/blob/v0.20.2/vllm/v1/worker/gpu_worker.py#L328-L329).
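
To make the idempotency point concrete, here is a toy sketch in plain Python. It is not vLLM or SkyRL code: `ToyExpertLayer`, `raw_load`, `process_after_loading`, and `full_reload` are hypothetical names, and the layout change is modeled as a simple list reversal. It assumes the corruption comes from a post-load processing step that a bare load skips; the upstream issue is the authority on the actual cause.

```python
# Toy model of the failure mode. None of these names are vLLM or SkyRL APIs.


class ToyExpertLayer:
    """Holds one weight that the "kernel" expects in a post-processed layout."""

    def __init__(self):
        self.weight = []

    def raw_load(self, checkpoint_weight):
        # Stand-in for a bare load: copy checkpoint-layout values, nothing else.
        self.weight = list(checkpoint_weight)

    def process_after_loading(self):
        # Stand-in for a backend-specific layout change (assumed; here a reversal).
        self.weight = self.weight[::-1]

    def forward(self):
        # The "kernel" always undoes the processed layout before use.
        return self.weight[::-1]


def full_reload(layer, checkpoint_weight):
    # Stand-in for a reload path that loads and then re-runs post-processing.
    layer.raw_load(checkpoint_weight)
    layer.process_after_loading()


checkpoint = [1.0, 2.0, 3.0]

layer = ToyExpertLayer()
full_reload(layer, checkpoint)   # engine start-up: load + post-process
first_output = layer.forward()

layer.raw_load(checkpoint)       # weight sync through the raw entrypoint
raw_sync_output = layer.forward()

full_reload(layer, checkpoint)   # weight sync through the full reload path
reload_sync_output = layer.forward()
```

In this toy, syncing the same weights through `raw_load` changes the output even though no error is raised, while `full_reload` can be repeated any number of times and yields the same result.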
