Skip to content

Diffusion ci tests#1495

Merged
CharlelieLrt merged 17 commits intoNVIDIA:mainfrom
CharlelieLrt:diffusion-ci-tests
Apr 8, 2026
Merged

Diffusion ci tests#1495
CharlelieLrt merged 17 commits intoNVIDIA:mainfrom
CharlelieLrt:diffusion-ci-tests

Conversation

@CharlelieLrt
Copy link
Copy Markdown
Collaborator

PhysicsNeMo Pull Request

Description

Checklist

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

@CharlelieLrt CharlelieLrt self-assigned this Mar 13, 2026
@CharlelieLrt
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

@coreyjadams
Copy link
Copy Markdown
Collaborator

Hey @greptile this pr introduces nearly 400 binary files for numerical checkpoints and consistency, which might complicate git cloning. Got any suggestions for handling that?

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 13, 2026

Here are a few approaches to handle those binary checkpoint files:

1. Git LFS (most common)

  • Move binary files to Git LFS - keeps them out of clone history
  • Add to .gitattributes: *.npy filter=lfs diff=lfs merge=lfs -text (or whatever format you're using)
  • Clones stay fast, files downloaded only when needed
  • CI can use GIT_LFS_SKIP_SMUDGE=1 to skip download, then selectively fetch

2. External storage with download script

  • Store checkpoints in S3/GCS/Azure blob storage
  • Add a scripts/download_test_fixtures.py that fetches them before tests
  • CI downloads on demand, developers only download if running those specific tests
  • Keep a manifest file (JSON/YAML) tracking what's available

3. Hybrid: essential vs full test suites

  • Keep a minimal set (~10-20 files) for basic CI checks in repo
  • Store the full 400-file suite externally for comprehensive testing
  • Run full suite only on release branches or manually triggered workflows

4. Programmatic generation

  • If checkpoints are deterministic, generate them in CI from smaller seed files
  • Only store the configuration/seeds (much smaller)
  • Trade storage for compute time

For PhysicsNeMo specifically, I'd suggest Git LFS + lazy loading in CI (option 1) since it:

  • Maintains reproducibility (files versioned with code)
  • Keeps clones fast
  • Integrates cleanly with existing workflows

What kind of files are these - numpy arrays, model weights, HDF5?

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
@CharlelieLrt
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

… for compilation tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
@CharlelieLrt
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

Copy link
Copy Markdown
Collaborator

@laserkelvin laserkelvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to my comment, the other question is whether you think we need coverage for the score_predictor path as well

Comment thread test/diffusion/helpers.py
@CharlelieLrt
Copy link
Copy Markdown
Collaborator Author

In addition to my comment, the other question is whether you think we need coverage for the score_predictor path as well

Right, I added tests for the score_predictor path as well

@CharlelieLrt
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
@CharlelieLrt CharlelieLrt requested a review from ktangsali as a code owner April 7, 2026 23:31
@CharlelieLrt CharlelieLrt enabled auto-merge April 7, 2026 23:32
@CharlelieLrt
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

Copy link
Copy Markdown
Collaborator

@laserkelvin laserkelvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
@CharlelieLrt
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

@CharlelieLrt
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
@CharlelieLrt
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

Copy link
Copy Markdown
Collaborator

@ktangsali ktangsali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to Makefile look good. I am concerned about the 506 files though. Will this affect the clone speeds ? I guess this is inevitable in some sense

@CharlelieLrt CharlelieLrt added this pull request to the merge queue Apr 8, 2026
Merged via the queue into NVIDIA:main with commit 06ec2fc Apr 8, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants