Concrete Dropout for GeoTransolver model UQ #1548

Merged
mnabian merged 14 commits into NVIDIA:main from mnabian:geotransolver-uq
Apr 14, 2026

Conversation

@mnabian
Collaborator

@mnabian mnabian commented Apr 3, 2026

PhysicsNeMo Pull Request

Description

Adds concrete dropout-based uncertainty quantification (UQ) to GeoTransolver. Concrete dropout (Gal, Hron & Kendall, 2017) makes the dropout probability a learnable parameter per layer using the concrete (Gumbel-softmax) relaxation, enabling calibrated MC-Dropout inference without manual tuning of dropout rates.

All changes are backward compatible — concrete dropout is disabled by default and existing behavior is unchanged.

Changes

Model (physicsnemo/experimental/models/geotransolver/)

New file: concrete_dropout.py

  • ConcreteDropout module: wraps a layer with a learnable dropout probability using the concrete relaxation. During training, applies soft binary masks via z = sigmoid((log(u) - log(1-u) + log(p) - log(1-p)) / temperature) with temperature=0.1. During eval, passes input unchanged.
  • collect_concrete_dropout_losses(model): gathers entropy regularization losses from all ConcreteDropout modules. The loss p*log(p) + (1-p)*log(1-p) prevents dropout rates from collapsing to 0 or 1.
  • get_concrete_dropout_rates(model): extracts learned per-layer dropout probabilities for monitoring.
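The mechanics above can be sketched as follows. This is a hedged, illustrative reimplementation under the names the PR describes (ConcreteDropout, collect_concrete_dropout_losses); the merged code lives in physicsnemo and may differ in details such as initialization and epsilon handling:

```python
import torch
import torch.nn as nn


class ConcreteDropout(nn.Module):
    """Dropout with a learnable rate via the binary-concrete relaxation."""

    def __init__(self, init_p: float = 0.1, temperature: float = 0.1):
        super().__init__()
        # Keep p in logit space so the optimizer can move it freely in (0, 1).
        self.p_logit = nn.Parameter(torch.logit(torch.tensor(init_p)))
        self.temperature = temperature

    @property
    def p(self) -> torch.Tensor:
        return torch.sigmoid(self.p_logit)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training:
            return x  # eval: identity, as the PR describes
        p, eps = self.p, 1e-7
        u = torch.rand_like(x)
        # Soft binary mask from the concrete relaxation (z near 1 means "drop").
        z = torch.sigmoid(
            (torch.log(u + eps) - torch.log(1.0 - u + eps)
             + torch.log(p + eps) - torch.log(1.0 - p + eps))
            / self.temperature
        )
        # Rescale kept activations, as in standard inverted dropout.
        return x * (1.0 - z) / (1.0 - p + eps)


def collect_concrete_dropout_losses(model: nn.Module) -> torch.Tensor:
    """Sum p*log(p) + (1-p)*log(1-p) over all ConcreteDropout modules."""
    losses = [
        m.p * torch.log(m.p) + (1.0 - m.p) * torch.log(1.0 - m.p)
        for m in model.modules()
        if isinstance(m, ConcreteDropout)
    ]
    return torch.stack(losses).sum() if losses else torch.tensor(0.0)
```

Note that the entropy term is minimized at p = 0.5, so on its own it pulls rates toward 0.5 rather than letting them collapse to the endpoints.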

Modified: gale.py

  • GALE: replaces inherited out_dropout with ConcreteDropout when enabled (attention output projection).
  • GALE_block: adds ConcreteDropout after the attention residual connection (attn_dropout) and after the FFN residual connection (ffn_dropout).

Modified: context_projector.py

  • ContextProjector: adds ConcreteDropout on output slice tokens (output_dropout).
  • Parameters propagated through MultiScaleFeatureExtractor and GlobalContextBuilder.

Modified: geotransolver.py

  • Constructor accepts concrete_dropout: bool = False, dropout_reg: float = 1e-3, weight_reg: float = 1e-6.
  • New methods: concrete_dropout_reg_loss(), concrete_dropout_rates(), enable_mc_dropout().
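The enable_mc_dropout() behavior can be sketched generically: put the whole model in eval mode, then flip only the concrete-dropout layers back to train mode so their stochastic masks stay active. The helper below is an assumption-labeled stand-in, parameterized over the dropout class so it runs with plain nn.Dropout here:

```python
import torch.nn as nn


def enable_mc_dropout(model: nn.Module, dropout_cls: type = nn.Dropout) -> nn.Module:
    """Put the model in eval mode but keep dropout layers stochastic."""
    model.eval()  # freezes norm statistics and disables every dropout layer...
    for m in model.modules():
        if isinstance(m, dropout_cls):
            m.train()  # ...then re-enable just the dropout masks
    return model
```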

Modified: __init__.py

  • Exports ConcreteDropout, collect_concrete_dropout_losses, get_concrete_dropout_rates.

Training recipe (examples/.../transformer_models/src/)

Modified: train.py

  • Adds concrete dropout regularization loss to the training loop, gated by lambda_reg > 0 (default 0, no-op).
  • Logs learned per-layer dropout rates to TensorBoard at each epoch end.
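The lambda_reg gating can be illustrated with a tiny helper; the names here are assumptions for illustration, not the recipe's actual code:

```python
import torch


def total_loss(task_loss: torch.Tensor, reg_loss: torch.Tensor,
               lambda_reg: float) -> torch.Tensor:
    """Add the concrete-dropout regularizer only when lambda_reg > 0."""
    if lambda_reg > 0:
        return task_loss + lambda_reg * reg_loss
    return task_loss  # the default lambda_reg=0.0 leaves training untouched
```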

Modified: conf/model/geotransolver.yaml

  • concrete_dropout: false (default off)
  • dropout_reg: 1.0e-3 (entropy regularization coefficient, ~1/(2N) for N=400 samples)
  • weight_reg: 1.0e-6 (weight regularization coefficient, unused in loss since AdamW handles L2)

Modified: conf/training/base.yaml

  • lambda_reg: 0.0 (multiplier for regularization loss, default disabled)

Inference recipe (examples/.../transformer_models/src/)

Modified: inference_on_zarr.py

  • Adds mc_dropout_inference_loop(): runs N stochastic forward passes, returns mean predictions, std predictions, all per-sample predictions, averaged loss/metrics, and targets.
  • MC-Dropout model setup: model.eval() with ConcreteDropout layers kept in train mode. Gated by +mc_dropout_samples=N (default 0, disabled).
  • When enabled, logs per-field uncertainty stats and computes force coefficient uncertainty (Cd/Cl std across MC samples).
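A minimal sketch of the MC-Dropout statistics described above: N stochastic forward passes, then a per-point mean and standard deviation. The real mc_dropout_inference_loop also aggregates losses and metrics; this illustrative function covers only the prediction statistics:

```python
import torch


@torch.no_grad()
def mc_dropout_predict(model, x: torch.Tensor, n_samples: int):
    """Run n_samples stochastic forward passes; return mean, std, all samples."""
    preds = torch.stack([model(x) for _ in range(n_samples)])  # shape (N, ...)
    return preds.mean(dim=0), preds.std(dim=0), preds
```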

Modified: inference_on_vtk.py

  • Runs a deterministic forward pass (ConcreteDropout disabled) followed by an MC-Dropout pass (if enabled).
  • Writes per-point fields to VTP/VTU output files:
    • PredictedPressure, PredictedWallShearStress (deterministic, always written)
    • MCMeanPressure, MCMeanWallShearStress (MC mean, only when MC enabled)
    • MCStdPressure, MCStdWallShearStress (MC std, only when MC enabled)
    • Equivalent fields for volume mode (Velocity, Pressure, Nut)

Usage

Training with concrete dropout:

python src/train.py --config-name geotransolver_surface \
    model.concrete_dropout=true \
    training.lambda_reg=0.1

Inference with MC-Dropout UQ (zarr dataset):

python src/inference_on_zarr.py --config-name geotransolver_surface \
    +mc_dropout_samples=20

Inference with MC-Dropout UQ (raw VTK files):

python src/inference_on_vtk.py --config-name geotransolver_surface \
    +vtk_inference.input_dir=/path/to/runs \
    +vtk_inference.output_dir=/path/to/output \
    +mc_dropout_samples=20

Test

This has been tested on the DrivAerML dataset. Two interesting observations:

  1. The accuracy of a GeoTransolver model trained with concrete dropout is on par with that of a baseline model trained without dropout.
     (screenshot)
  2. There is a high correlation between the magnitude of the MC-dropout standard deviation and the magnitude of the prediction error, suggesting the uncertainty bounds are meaningful.
     (screenshots)
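The second observation can be quantified as a Pearson correlation between the per-point MC std and the absolute prediction error; a sketch of such a check (not part of the PR) is:

```python
import torch


def uq_error_correlation(mc_std: torch.Tensor, pred_mean: torch.Tensor,
                         target: torch.Tensor) -> float:
    """Pearson correlation between per-point MC std and absolute error."""
    err = (pred_mean - target).abs().flatten()
    s = mc_std.flatten()
    ec, sc = err - err.mean(), s - s.mean()
    return ((ec * sc).sum() / (ec.norm() * sc.norm())).item()
```

A value near 1 indicates the uncertainty estimate tracks the actual error magnitude well.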

Checklist

Dependencies

None

@greptile-apps
Contributor

greptile-apps bot commented Apr 3, 2026

Greptile Summary

This PR adds Concrete Dropout-based uncertainty quantification to GeoTransolver by introducing a learnable per-layer dropout probability (via the binary-concrete relaxation) that enables calibrated MC-Dropout inference without manual hyperparameter tuning. All changes are backward-compatible — the feature is opt-in via concrete_dropout: false default. The core model integration (gale.py, context_projector.py, geotransolver.py) is clean and well-structured, and the training recipe changes are minimal and correct.

There are a few issues worth addressing before merge:

  • weight_reg is a silent no-op (concrete_dropout.py lines 160–183): the parameter is accepted, stored, and described in the regularization_loss() docstring as an active weight-penalty term, but is never used in the loss computation. This creates a misleading interface for users who tune this value expecting it to affect training.
  • Return type annotation mismatch (inference_on_zarr.py line 348): mc_dropout_inference_loop is annotated as returning a 4-tuple but actually returns 6 values, breaking static type checking.
  • Train/eval mode toggled per-run inside a compiled model (inference_on_vtk.py lines 699–750): when cfg.compile=True and MC-Dropout is active, each run in the loop flips ConcreteDropout submodules between eval and train. torch.compile treats the training flag as a compile-time constant, so this forces a full graph retrace on every run. inference_on_zarr.py handles this correctly (mode set once before the loop); inference_on_vtk.py should follow the same pattern.
  • Unused import (concrete_dropout.py line 34): import torch.nn.functional as F is never referenced.
  • print instead of logger (inference_on_zarr.py line 396): one progress message inside mc_dropout_inference_loop uses a bare print() rather than the logger, bypassing rank-filtering and log-level control.

Important Files Changed

Filename Overview
physicsnemo/experimental/models/geotransolver/concrete_dropout.py New file implementing ConcreteDropout; the weight_reg parameter and in_features are accepted, stored, and documented as active components of the regularization loss but are never used in regularization_loss(), creating a misleading API. Unused torch.nn.functional as F import should be removed.
physicsnemo/experimental/models/geotransolver/gale.py ConcreteDropout is cleanly integrated into GALE and GALE_block; out_dropout override and attn/ffn dropout additions are backward-compatible and logically correct.
physicsnemo/experimental/models/geotransolver/context_projector.py concrete_dropout parameters are propagated correctly through MultiScaleFeatureExtractor and GlobalContextBuilder; ContextProjector output_dropout integration is clean.
physicsnemo/experimental/models/geotransolver/geotransolver.py New constructor params and convenience methods (concrete_dropout_reg_loss, concrete_dropout_rates, enable_mc_dropout) are well-implemented and backward compatible.
examples/cfd/external_aerodynamics/transformer_models/src/inference_on_zarr.py mc_dropout_inference_loop return type annotation claims 4 elements but returns 6; bare print() bypasses logging infrastructure. MC-dropout mode setup order (compile then set mode) is correct here.
examples/cfd/external_aerodynamics/transformer_models/src/inference_on_vtk.py Per-run toggling of ConcreteDropout train/eval state inside the compiled model's inference loop will force torch.compile to retrace the graph on every run, causing severe performance regression when both compile and mc_dropout_samples are enabled.
examples/cfd/external_aerodynamics/transformer_models/src/train.py Regularization loss integration and TensorBoard logging of dropout rates are clean and correctly gated by lambda_reg and concrete_dropout flags.
physicsnemo/experimental/models/geotransolver/__init__.py Correctly exports the three new public symbols from concrete_dropout.py.

Comments Outside Diff (2)

  1. examples/cfd/external_aerodynamics/transformer_models/src/inference_on_zarr.py, line 348-349 (link)

    P1 Return type annotation does not match actual return value

    The function signature declares a 4-element return tuple:

    ) -> tuple[torch.Tensor, torch.Tensor, float, dict]:

    but the function actually returns 6 values:

    return mean_predictions, std_predictions, stacked, mean_loss, mean_metrics, targets

    This mismatch will confuse static analysis tools and anyone writing callers against this type hint. The annotation should be updated to match the six returned values.

  2. examples/cfd/external_aerodynamics/transformer_models/src/inference_on_zarr.py, line 396 (link)

    P2 print used instead of logger

    The rest of both inference_on_zarr.py and inference_on_vtk.py consistently use logger.info(...). This bare print() call will bypass any logging configuration (e.g., distributed rank filtering, log-level control).

Reviews (1): Last reviewed commit: "concrete dropout for geotransolver" | Re-trigger Greptile

@mnabian mnabian self-assigned this Apr 3, 2026
Collaborator

@coreyjadams coreyjadams left a comment


Hi @mnabian - I think ConcreteDropout is a great idea to add to physicsnemo in general, actually! We already have GumbelSoftmax which, numerically, is closely related: https://github.com/NVIDIA/physicsnemo/blob/main/physicsnemo/nn/module/gumbel_softmax.py

If I understand it correctly, ConcreteDropout is a two-category optimization of that same mathematical concept. Could we add this implementation in a similar way, in physicsnemo.nn, and add some tests? I think it is just fine to add it to GeoTransolver like this.

I don't think we need to merge the implementations, btw: the sigmoid version here is computationally more efficient than a softmax over two categories.

@mnabian mnabian requested a review from loliverhennigh as a code owner April 9, 2026 23:02
@mnabian
Collaborator Author

mnabian commented Apr 10, 2026

/blossom-ci

@mnabian
Collaborator Author

mnabian commented Apr 10, 2026

> Hi @mnabian - I think ConcreteDropout is a great idea to add to physicsnemo in general, actually! We already have GumbelSoftmax which, numerically, is closely related: https://github.com/NVIDIA/physicsnemo/blob/main/physicsnemo/nn/module/gumbel_softmax.py
>
> If I understand it correctly, ConcreteDropout is a two-category optimization of that same mathematical concept. Could we add this implementation in a similar way, in physicsnemo.nn, and add some tests? I think it is just fine to add it to GeoTransolver like this.
>
> I don't think we need to merge the implementations, btw: the sigmoid version here is computationally more efficient than a softmax over two categories.

Thanks for reviewing the PR. I have moved the concrete_dropout to nn/module, and added tests.

@mnabian mnabian requested a review from coreyjadams April 10, 2026 00:53
@mnabian
Collaborator Author

mnabian commented Apr 10, 2026

/blossom-ci

Collaborator

@loliverhennigh loliverhennigh left a comment


LGTM

Collaborator

@coreyjadams coreyjadams left a comment


Thanks @mnabian, looks good!

@mnabian
Collaborator Author

mnabian commented Apr 13, 2026

/blossom-ci

@mnabian mnabian enabled auto-merge April 13, 2026 22:27
@mnabian
Collaborator Author

mnabian commented Apr 13, 2026

/blossom-ci

@mnabian mnabian added this pull request to the merge queue Apr 14, 2026
Merged via the queue into NVIDIA:main with commit 3bba0e1 Apr 14, 2026
4 checks passed

4 participants