[bug] Switch to `reload_weights` API for loading weights in legacy inference codepath by SumanthRH · Pull Request #1685 · NovaSky-AI/SkyRL

SumanthRH · 2026-05-18T05:32:56Z

What does this PR do?

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

gemini-code-assist

Code Review

This pull request updates the weight loading logic in vllm_worker.py by introducing the set_current_vllm_config context manager and switching to model_runner.reload_weights. Feedback suggests optimizing memory efficiency by passing the weight generator directly to the loading function instead of accumulating tensors in an intermediate list.

gemini-code-assist · 2026-05-18T05:35:27Z

        for name, tensor in self._weight_receiver.receive_weights(request):
            weight_list.append((name, tensor))

-        self.model_runner.model.load_weights(weights=weight_list)
+        with torch.device(self.device), set_current_vllm_config(self.vllm_config):
+            self.model_runner.reload_weights(weights_iterator=iter(weight_list))


Instead of collecting all weights into an intermediate list, you can pass the generator from receive_weights directly to reload_weights. This reduces memory overhead by avoiding storing all tensors in a list simultaneously, which is particularly important for large models. It also allows vLLM to pipeline the weight loading process as tensors are received.

Note that this change makes the subsequent weight_list cleanup loop (lines 98-99) redundant as the list will remain empty, but since those lines are context, they can be left as-is or removed in a separate cleanup.

Suggested change

-        for name, tensor in self._weight_receiver.receive_weights(request):
-            weight_list.append((name, tensor))
-
-        self.model_runner.model.load_weights(weights=weight_list)
-        with torch.device(self.device), set_current_vllm_config(self.vllm_config):
-            self.model_runner.reload_weights(weights_iterator=iter(weight_list))
+        with torch.device(self.device), set_current_vllm_config(self.vllm_config):
+            self.model_runner.reload_weights(
+                weights_iterator=self._weight_receiver.receive_weights(request))
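
The memory argument in this comment can be illustrated with a small, self-contained sketch. It is not SkyRL or vLLM code: `receive_weights`, `consume`, and the sizes are hypothetical stand-ins, plain `bytearray` buffers play the role of tensors, and the sketch assumes the consumer reads its iterator lazily (whether `reload_weights` does so is not confirmed by this thread). It uses Python's `tracemalloc` to compare peak allocation when buffers are first accumulated in a list against passing the generator straight through.

```python
import tracemalloc
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical sizes chosen only to make the difference visible.
NUM_TENSORS = 16
TENSOR_BYTES = 1 << 20  # 1 MiB per stand-in "tensor"


def receive_weights(num_tensors: int, size: int) -> Iterator[Tuple[str, bytearray]]:
    """Hypothetical receiver: yields (name, buffer) pairs one at a time."""
    for i in range(num_tensors):
        yield f"layer.{i}.weight", bytearray(size)


def consume(weights_iterator: Iterable[Tuple[str, bytearray]]) -> int:
    """Toy loader: walks the iterator lazily and returns how many items it saw."""
    count = 0
    for _name, _buf in weights_iterator:
        count += 1
    return count


def load_via_list() -> int:
    # Mirrors the original pattern: collect everything, then hand over iter(list).
    weight_list = []
    for name, buf in receive_weights(NUM_TENSORS, TENSOR_BYTES):
        weight_list.append((name, buf))
    return consume(iter(weight_list))


def load_streaming() -> int:
    # Mirrors the suggested pattern: pass the generator directly.
    return consume(receive_weights(NUM_TENSORS, TENSOR_BYTES))


def peak_bytes(fn: Callable[[], int]) -> int:
    """Return the peak traced allocation (in bytes) while fn runs."""
    tracemalloc.start()
    try:
        fn()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    print("peak with intermediate list:", peak_bytes(load_via_list))
    print("peak with streamed generator:", peak_bytes(load_streaming))
```

In the list variant every buffer stays alive until loading finishes, so the peak grows with the number of tensors. In the streamed variant only the buffer being consumed and the one being produced are alive at once. With real GPU tensors and a real receiver the numbers will differ, so treat this only as an illustration of the pattern.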

switch to reload api

6108925

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

SumanthRH marked this pull request as ready for review May 18, 2026 05:33

SumanthRH added the run_train_old_inference_gpu_ci label May 18, 2026

gemini-code-assist Bot reviewed May 18, 2026


SumanthRH wants to merge 1 commit into main from fix-load-weights-moe
