
Fix gather_objects when Megatron Core is unavailable #15839

Open
fallintoplace wants to merge 1 commit into NVIDIA-NeMo:main from fallintoplace:fix/gather-objects-no-megatron

Conversation

@fallintoplace


Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do?

Fix gather_objects() so it does not crash when Megatron Core is unavailable and still gathers results in plain torch.distributed setups.

Collection: Common

Changelog

  • avoid touching parallel_state when Megatron Core could not be imported
  • preserve the existing Megatron data-parallel path when Megatron Core is available
  • fall back to the default torch.distributed group when Megatron Core is unavailable
  • add regression coverage for the no-Megatron/no-DDP path and the no-Megatron/plain-DDP gather path
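
The changelog above can be read as the following control flow. This is a minimal sketch, not the code in this PR: the function name gather_objects_sketch, the HAVE_MEGATRON_CORE flag, and the choice to return None on non-main ranks are assumptions made for illustration. It relies only on torch.distributed calls and megatron.core.parallel_state helpers that exist upstream, and it degrades to returning its input when neither library is usable.

```python
from typing import Any, List, Optional

try:
    import torch.distributed as dist
except ImportError:  # lets the sketch run where PyTorch is not installed
    dist = None

try:
    from megatron.core import parallel_state
    HAVE_MEGATRON_CORE = True  # hypothetical flag name, for illustration only
except ImportError:
    parallel_state = None
    HAVE_MEGATRON_CORE = False


def gather_objects_sketch(partial_results: List[Any], main_rank: Optional[int] = None):
    """Illustrative stand-in for gather_objects(); not the NeMo implementation."""
    # No-Megatron/no-DDP path: nothing to gather, return the input unchanged.
    if dist is None or not dist.is_available() or not dist.is_initialized():
        return partial_results

    # Touch parallel_state only if Megatron Core imported and is initialized.
    group = None  # None selects the default torch.distributed group
    if HAVE_MEGATRON_CORE and parallel_state.model_parallel_is_initialized():
        group = parallel_state.get_data_parallel_group()

    world_size = dist.get_world_size(group=group)
    if world_size == 1:
        return partial_results

    # Each rank contributes its list; every rank receives all of them.
    gathered = [None] * world_size
    dist.all_gather_object(gathered, partial_results, group=group)

    # Assumed behavior: ranks other than main_rank receive None.
    if main_rank is not None and dist.get_rank(group=group) != main_rank:
        return None

    # Flatten the per-rank lists into a single list.
    return [item for chunk in gathered for item in chunk]
```

In a single process with no process group initialized, the sketch takes the first branch and hands back its argument, which is the no-Megatron/no-DDP case the regression coverage targets.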

Usage

predictions = gather_objects(predictions, main_rank=0)

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to #
  • I couldn't run pytest in this checkout because the active interpreter is missing pytest, so I verified the change with python3 -m compileall, git diff --check, and a focused isolated smoke test that exercises the no-Megatron branches.

Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 27, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
