
WorkerWrap.load_weights silently corrupts unquantized MoE rollouts after the first weight sync since vllm>=0.20.0 #1680

@jamesbraza

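WorkerWrap.load_weights calls vLLM's raw model.load_weights(...) directly. Per upstream vllm-project/vllm#42821, that entrypoint is broken for unquantized MoE models on FlashInfer backends (e.g. Qwen/Qwen3.6-35B-A3B): the initial load is fine, but every subsequent weight sync silently corrupts the weights used for rollouts. SkyRL has been exposed to this since #1628 pulled in vllm==0.20.0.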

SkyRL should move to self.model_runner.reload_weights(weights_iterator=...) (link), which is idempotent across repeated weight syncs.

```diff
+from vllm.config import set_current_vllm_config

...

 def load_weights(self, request: bytes) -> None:
     ...
     weight_list = []
     for name, tensor in self._weight_receiver.receive_weights(request):
         weight_list.append((name, tensor))

-    self.model_runner.model.load_weights(weights=weight_list)
+    with set_current_vllm_config(self.vllm_config):
+        self.model_runner.reload_weights(weights_iterator=iter(weight_list))

     for weight in weight_list:
         del weight
```

This also aligns SkyRL's weight-sync path with vLLM's own reload_weights RPC.
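The failure mode described above can be sketched with a toy model. This is not vLLM code, and the mechanism is an assumption for illustration rather than a confirmed detail of vllm#42821: it assumes the backend repacks expert weights into a kernel-specific layout once after the initial load, and that a raw `load_weights` call copies checkpoint-layout tensors without redoing that repack. `ToyMoELayer`, `repack_for_kernel`, and `reload_weights` below are hypothetical names, and the transpose stands in for whatever layout transform a real kernel uses.

```python
# Toy illustration only: this is NOT vLLM code. It assumes a backend that
# repacks weights into a kernel-specific layout once, after the initial load.


def transpose(m):
    return [list(row) for row in zip(*m)]


def matvec(x, w):
    # Computes x @ w for a vector x of length n and an n x k matrix w.
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


class ToyMoELayer:
    """Hypothetical layer whose kernel expects a transposed weight layout."""

    def __init__(self):
        self.param = None

    def load_weights(self, w):
        # Raw path: copy checkpoint-layout weights straight into the parameter.
        self.param = [list(row) for row in w]

    def repack_for_kernel(self):
        # Hypothetical stand-in for a backend-specific post-load layout transform.
        self.param = transpose(self.param)

    def forward(self, x):
        # The toy kernel assumes the repacked layout and undoes it before use.
        return matvec(x, transpose(self.param))


def reload_weights(layer, w):
    # Idempotent sync (sketch): load raw weights, then redo the post-load repack.
    layer.load_weights(w)
    layer.repack_for_kernel()


X = [1.0, 2.0]
W0 = [[1.0, 2.0], [3.0, 4.0]]  # weights at engine startup
W1 = [[5.0, 6.0], [7.0, 8.0]]  # new policy weights from the trainer

layer = ToyMoELayer()
layer.load_weights(W0)
layer.repack_for_kernel()  # engine startup: load once, then repack
assert layer.forward(X) == matvec(X, W0)

layer.load_weights(W1)  # raw sync: repack skipped, shapes still match, no error
print(layer.forward(X), matvec(X, W1))  # [17.0, 23.0] [19.0, 22.0]

reload_weights(layer, W1)  # idempotent sync
assert layer.forward(X) == matvec(X, W1)
```

The specific transform is arbitrary. The point of the sketch is that any post-load step the raw entrypoint does not re-apply leaves weights that are well-shaped but wrong after the first sync, so nothing raises and only rollout quality degrades. Whether vLLM's reload_weights achieves idempotency exactly this way internally is an assumption here.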
