Diffusion ci tests by CharlelieLrt · Pull Request #1495 · NVIDIA/physicsnemo

CharlelieLrt · 2026-03-12T23:58:59Z

PhysicsNeMo Pull Request

Description

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
The CHANGELOG.md is up to date with these changes.
An issue is linked to this pull request.
If I am implementing a new model or modifying any existing model, I have followed the Models Implementation Coding Standards.

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

CharlelieLrt · 2026-03-13T00:01:34Z

/blossom-ci

coreyjadams · 2026-03-13T00:40:17Z

Hey @greptile, this PR introduces nearly 400 binary files for numerical checkpoints and consistency, which might complicate git cloning. Got any suggestions for handling that?

greptile-apps · 2026-03-13T00:40:56Z

Here are a few approaches to handle those binary checkpoint files:

1. Git LFS (most common)

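Move the binary files to Git LFS, so the repository history stores only small pointer files
Add to .gitattributes: *.npy filter=lfs diff=lfs merge=lfs -text (or whatever format you're using)
Clones stay fast because the large objects live outside the Git history; by default, only the objects for the checked-out commit are downloaded
CI can set GIT_LFS_SKIP_SMUDGE=1 to skip the download at clone time, then fetch selectively (e.g., git lfs pull --include=<path>)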

2. External storage with download script

Store checkpoints in S3/GCS/Azure blob storage
Add a scripts/download_test_fixtures.py that fetches them before tests
CI downloads on demand, developers only download if running those specific tests
Keep a manifest file (JSON/YAML) tracking what's available

3. Hybrid: essential vs full test suites

Keep a minimal set (~10-20 files) for basic CI checks in repo
Store the full 400-file suite externally for comprehensive testing
Run full suite only on release branches or manually triggered workflows

4. Programmatic generation

If checkpoints are deterministic, generate them in CI from smaller seed files
Only store the configuration/seeds (much smaller)
Trade storage for compute time
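
As a rough illustration of option 4, the sketch below regenerates fixture data from stored seeds instead of committing the binary files. It assumes the data can be reproduced deterministically, which the thread does not confirm for the checkpoints in this PR. The names `FIXTURE_SPECS`, `generate_fixture`, and `build_all` are hypothetical, and the standard-library generator stands in for whatever framework RNG the real tests use.

```python
import random
import struct

# Hypothetical fixture specs: only seeds and sizes would be stored in the repo.
FIXTURE_SPECS = {
    "noise_small": {"seed": 0, "size": 16},
    "noise_large": {"seed": 1, "size": 1024},
}


def generate_fixture(seed: int, size: int) -> bytes:
    """Regenerate a fixture as packed little-endian float64 values from a seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    values = [rng.random() for _ in range(size)]
    return struct.pack(f"<{size}d", *values)


def build_all(specs: dict) -> dict:
    """Build every fixture described in the spec table."""
    return {name: generate_fixture(**spec) for name, spec in specs.items()}


if __name__ == "__main__":
    fixtures = build_all(FIXTURE_SPECS)
    for name, blob in fixtures.items():
        print(name, len(blob), "bytes")
```

This approach suits seeded inputs. Reference outputs produced by a model may not be bit-reproducible across hardware, in which case stored files with comparison tolerances may still be needed.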

For PhysicsNeMo specifically, I'd suggest Git LFS + lazy loading in CI (option 1) since it:

Maintains reproducibility (files versioned with code)
Keeps clones fast
Integrates cleanly with existing workflows

What kind of files are these - numpy arrays, model weights, HDF5?
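
The manifest idea from option 2 could look roughly like the sketch below: a JSON file maps each fixture path to its SHA-256 digest, and a check reports which files are missing or out of date before tests run. The function names and manifest layout are assumptions made for illustration. The download step itself is left out because the thread does not specify a storage backend.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large fixtures are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: Path, manifest_path: Path) -> dict:
    """Record the digest of every file under data_dir.

    The manifest should live outside data_dir so it does not list itself.
    """
    entries = {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }
    manifest_path.write_text(json.dumps(entries, indent=2, sort_keys=True))
    return entries


def find_stale(data_dir: Path, manifest_path: Path) -> list[str]:
    """Return fixture names that are missing locally or whose hash differs."""
    manifest = json.loads(manifest_path.read_text())
    stale = []
    for name, digest in manifest.items():
        local = data_dir / name
        if not local.is_file() or sha256_of(local) != digest:
            stale.append(name)
    return stale
```

A hypothetical download script would call `find_stale` first and fetch only the listed files, so developers who already have current fixtures download nothing.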

CharlelieLrt · 2026-03-13T02:54:02Z

/blossom-ci

CharlelieLrt · 2026-03-13T17:11:37Z

/blossom-ci

laserkelvin

In addition to my comment, the other question is whether you think we need coverage for the score_predictor path as well

CharlelieLrt · 2026-04-07T23:13:18Z

> In addition to my comment, the other question is whether you think we need coverage for the score_predictor path as well

Right, I added tests for the score_predictor path as well.

CharlelieLrt · 2026-04-07T23:13:26Z

/blossom-ci

CharlelieLrt · 2026-04-07T23:32:22Z

/blossom-ci

laserkelvin

LGTM

CharlelieLrt · 2026-04-08T00:45:28Z

/blossom-ci

CharlelieLrt · 2026-04-08T02:36:52Z

/blossom-ci

CharlelieLrt · 2026-04-08T04:35:20Z

/blossom-ci

ktangsali

Changes to the Makefile look good. I am concerned about the 506 files, though. Will this affect clone speeds? I guess this is inevitable in some sense.

CharlelieLrt added 7 commits March 10, 2026 14:44

New test files for diffusion

853e85e

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

Merge remote-tracking branch 'upstream/main' into diffusion-ci-tests

e6a5d1c

Merge remote-tracking branch 'upstream/main' into diffusion-ci-tests

acb5560

Merge remote-tracking branch 'upstream/main' into diffusion-ci-tests

1e6491f

Merge remote-tracking branch 'upstream/main' into diffusion-ci-tests

14bd020

Added missing CI tests for physicsnemo-diffusion

b957b14

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

Merge remote-tracking branch 'upstream/main' into diffusion-ci-tests

4e452b4

CharlelieLrt requested a review from coreyjadams March 12, 2026 23:59

CharlelieLrt self-assigned this Mar 13, 2026

CharlelieLrt added 2 commits March 12, 2026 17:41

Increased CPU tolerance for cross-arch compatibility

fa48148

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

Disabled some non-regression tests in test_samplers.py

2db1e3f

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

Disable some non-regression tests on GPU and increase some tolerances…

07c9c2e

… for compilation tests Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

laserkelvin reviewed Apr 3, 2026

Comment thread test/diffusion/helpers.py

CharlelieLrt added 2 commits April 7, 2026 13:47

Merge branch 'main' into diffusion-ci-tests

42fee9f

Added some compile tests, kernel size 3, tests for score predictor

5df4350

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

Re-enable CI coverage for diffusion

364e742

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

CharlelieLrt requested a review from ktangsali as a code owner April 7, 2026 23:31

CharlelieLrt enabled auto-merge April 7, 2026 23:32

laserkelvin approved these changes Apr 7, 2026

Increase tolerances for sampler tests

1c9df70

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

Merge branch 'main' into diffusion-ci-tests

212a2cf

Merge branch 'diffusion-ci-tests' of https://github.com/CharlelieLrt/…

6cc0afd

…modulus into diffusion-ci-tests

Increase tolerances for sampler tests

c2eb6ee

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

ktangsali approved these changes Apr 8, 2026

CharlelieLrt added this pull request to the merge queue Apr 8, 2026

Merged via the queue into NVIDIA:main with commit 06ec2fc Apr 8, 2026
4 checks passed

4 participants