Skip to content

Low-Mach number fluid scheme (variable-density, ideal-gas EOS) [WIP]#2571

Draft
adigh23 wants to merge 15 commits into
ExtremeFLOW:developfrom
adigh23:feature/lowmach
Draft

Low-Mach number fluid scheme (variable-density, ideal-gas EOS) [WIP]#2571
adigh23 wants to merge 15 commits into
ExtremeFLOW:developfrom
adigh23:feature/lowmach

Conversation

@adigh23

@adigh23 adigh23 commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Summary

Draft / work-in-progress. Adds a low-Mach-number formulation of the Pn/Pn
fluid scheme for variable-density flows at negligible acoustic effects, coupling
the density to temperature through an ideal-gas equation of state.

Opening early to share the approach and gather feedback. The branch is based on
an older develop (merge base 2026-04-02, ~28 commits behind) and needs a
rebase, the remaining accelerator backends, and quantitative validation before
it is merge-ready — hence draft.

What it adds

A new fluid scheme lowmach (fluid_lowmach_pnpn), registered through the fluid
factory, extending the incompressible Pn/Pn solver with:

  • Ideal-gas EOSrho = P0 / (R_specific * T); the density field is updated
    each step from the scalar temperature (P0, R_specific, T_eos_min/max,
    temperature_field config keys).
  • Thermal divergence Q_T in the pressure-Poisson residual — the velocity is
    no longer solenoidal; div(u) = Q_T is computed from the temperature field and
    fed into the pressure solve.
  • lowmach_residual module (+ CPU backend lowmach_res_cpu, factory) that
    assembles the momentum and pressure residuals with field-valued rho and
    mu
    (spatially varying properties) instead of scalar constants.

Files (relative to the merge base)

  • src/fluid/fluid_lowmach_pnpn.f90 — new scheme (EOS plumbing, Q_T from T,
    field-valued rho in the momentum RHS)
  • src/fluid/lowmach_residual.f90, src/fluid/lowmach_res_fctry.f90,
    src/fluid/bcknd/cpu/lowmach_res_cpu.f90 — residual module + CPU backend
  • src/fluid/fluid_base_fctry.f90, src/Makefile.am, src/.depends — wiring
  • examples/rayleigh_benard/rayleigh_lowmach.{case,f90},
    examples/tgv/tgv_lowmach_scalar.case — validation/example cases

TODO before merge-ready

  • Rebase onto current develop
  • GPU (CUDA/HIP/OpenCL) and SX backends for the low-Mach residual
  • Quantitative validation (e.g. Rayleigh–Bénard Nu–Ra scaling) + mass/energy conservation diagnostics
  • Unit/regression tests

🤖 Generated with Claude Code

adigh23 and others added 11 commits April 19, 2026 20:20
Registers a new "lowmach" fluid scheme via the factory. The type
extends fluid_pnpn_t with no overrides, so the scheme is selectable
from the .case file but behaves identically to the standard Pn-Pn
solver. Physics (variable density, Q_T source, full-stress viscous)
will be added incrementally by overriding init/step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The new step() body is a verbatim copy of fluid_pnpn_step. Output is
bit-identical to the standard Pn-Pn solver, confirming the override
dispatches correctly. This is the empty canvas onto which low-Mach
physics will be layered in subsequent commits.
Override init() to read case.fluid.low_mach.{enabled,P0,R_specific,
temperature_field} from the case file, with sensible defaults.

At the top of step(), when low-Mach mode is enabled, resolve the
temperature field from the registry lazily and update the density field
pointwise via rho = P0 / (R_specific * T). The existing Pn-Pn residuals
still read rho as a point value, so this spatial update does not yet
affect the solution — it is infrastructure for the subsequent residual
replacement.

Output is bit-identical to the standard Pn-Pn solver when low-Mach is
disabled, and produces identical residuals when enabled with no scalar
present (lazy T lookup safely falls through).
Adds an abstract lowmach_residual module with pressure and velocity
residual types, a CPU backend (lowmach_res_cpu), and a factory
submodule. The CPU implementation accepts a Q_T (thermal divergence)
field argument but currently ignores it; the body reproduces pnpn_res_cpu
exactly.

fluid_lowmach_pnpn_t now owns lm_prs_res, lm_vel_res allocatable members
plus a zero-initialised Q_T field. When low_mach_enabled is true, step()
routes pressure and velocity residuals through these new operators
instead of the inherited pnpn ones. When false, the inherited path is
used verbatim.

Verification: tgv example produces bit-identical output with both
low_mach disabled (old pnpn path) and enabled (new lowmach_residual
path). The scaffolding is now in place for the Q_T source contribution
and variable-property stress formulation in subsequent commits.

Emits a neko_error at init time if low_mach is requested with
NEKO_BCKND_DEVICE=1, since the device backend is not yet implemented.
Injects the thermal divergence source into the low-Mach pressure
residual:
  p_res += (bd/dt) * Q_T * B

Matches Nek5000 plan4.f convention (admcol3 with dtbd and bm1).

Q_T is still held at zero by the scheme, so numerics are unchanged.
Output remains bit-identical to pnpn. The signed contribution is now
active in the residual and ready to be exercised when Q_T is computed
from the energy equation in a subsequent commit.
Adds lowmach_update_Q_T which populates the thermal-divergence field as
  Q_T = div(k grad T) / (rho * cp * T)

The formula uses the energy equation without heat release
(rho cp D_t T = div(k grad T)) so the advective term cancels and Q_T
reduces to a pure spatial expression. The weak-form gradient, strong
conversion via gather-scatter + B^-1, multiplication by k, weak
divergence via cdtp, reassembly, and pointwise division by (rho cp T)
mirror the sequence in the earlier lowmach-attempt-1 branch (which had
the algebra correct).

Reads k_conductivity and cp from case.fluid.low_mach.{k_conductivity,cp}
with defaults of 1.0 each.

Called from step() immediately after the density update. When the
temperature field is not present in the registry (current tgv example),
the routine returns early and Q_T stays at zero, so output remains
bit-identical to pnpn. A test case with a scalar will be needed to
exercise the computed Q_T path.
Replaces the point-value reads (rho%x(1,1,1,1), mu%x(1,1,1,1)) with
pointwise field reads in both the pressure and velocity residuals. The
Helmholtz coefficients c_Xh%h1 and c_Xh%h2 now vary spatially, which is
the pre-requisite for variable-density / variable-viscosity low-Mach
physics.

When rho and mu are spatially uniform — as in the tgv example and
whenever low-Mach is disabled or the temperature field is unavailable —
the field and point-value paths give identical results, so output
remains bit-identical to pnpn.

This is the last pure constant-viscosity refinement. The next step is to
switch to the full-stress viscous formulation so the ∇·(2μS − (2/3)μ(∇·u)I)
tensor is evaluated correctly when μ varies across the domain.
Configures the TGV example with a scalar initialised to a spatially
uniform temperature T=1 and enables the low-Mach path with
P0=R_specific=k=cp=1. This exercises the full low-Mach pipeline: lazy
T_ptr resolution from the registry, density update via EOS, Q_T
computation, and residuals that consume rho and mu as spatial fields.

With T uniform, Q_T vanishes and rho stays at 1, so fluid residuals are
bit-identical to the standard pnpn run of the same configuration — the
strongest check that the infrastructure is wired correctly.

A follow-up case with a spatially varying T is the natural next step to
exercise the non-trivial Q_T path.
Adaptation of the existing Boussinesq Rayleigh-Benard example to drive
fluid_lowmach_pnpn instead of fluid_pnpn. Two adjustments:

  * Scalar temperature is shifted to the range [1, 2] (cold top at 1,
    hot bottom at 2, DeltaT = 1 preserved) so that the ideal-gas EOS
    rho = P0 / (R * T) used by the low-Mach scheme stays finite.
  * The Boussinesq buoyancy source in the user file is evaluated as
    rhs_w = Ra * Pr * (T - T_ref) with T_ref = 1, so the zero-buoyancy
    state at the cold wall is preserved.

low_mach is enabled with P0 = R_specific = k_conductivity = cp = 1
and the temperature_field pointing at the scalar "temperature". rho
now varies across the domain (roughly 1 at the cold wall to 0.5 at the
hot wall) and Q_T = div(k grad T) / (rho cp T) is non-zero wherever
there is a thermal gradient. This is the first configuration that
exercises the full low-Mach pipeline with a meaningful spatial
variation in temperature.

Short 50-step run converges cleanly: stable CFL ~ 0.07, pressure
residuals in the 1e-4 to 1e-6 range, velocity and scalar residuals
small. Full 100-time-unit run left to the user.
Previously the BDF/EXT/OIFS rhs_maker calls passed rho%x(1,1,1,1)
(a single point value) while the velocity Helmholtz LHS used the full
rho field. This made the momentum equation internally inconsistent
whenever rho varied spatially.

Introduces three local subroutines that mirror rhs_maker_ext_cpu,
rhs_maker_bdf_cpu and rhs_maker_oifs_cpu but take rho as a field and
weight each node by rho(x). step() now branches to these when
low_mach_enabled is true; the inherited pnpn rhs_makers are still used
otherwise.

Verification:
  * Disabled path: bit-identical to pnpn on the TGV smoke test.
  * Enabled path with uniform rho (T=1 uniform): bit-identical to pnpn,
    confirming the field-rho variant reduces to the scalar-rho one when
    rho is uniform.
  * Rayleigh-Benard case with variable rho: runs 50 steps to Normal end
    with residuals in the expected range. Numerics differ slightly from
    the previous run because the momentum equation is now self-consistent.
Developer guide and implementation notes for the low-Mach (variable-density,
ideal-gas EOS) fluid scheme on this branch.
@adigh23 adigh23 force-pushed the feature/lowmach branch from 3d49e0c to 017b543 Compare June 2, 2026 00:12
Comment thread LOWMACH_DEV_GUIDE.md
Comment on lines +43 to +45
**Energy** — solved by the Neko **scalar** named `temperature`:

ρ cp (∂T/∂t + u·∇T) = ∇·(λ∇T)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is a design decision you want to think about. Whether the fluid type itself should handle the scalar solve or whether the user should setup the scalar for it. The latter is probably easier programming-wise1

Comment thread LOWMACH_DEV_GUIDE.md

| member | meaning |
|---|---|
| `low_mach_enabled` | master switch; if false, `step` runs vanilla pnpn |

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not needed outside the intial debugging phase.

Comment thread LOWMACH_DEV_GUIDE.md
Comment on lines +137 to +143
The decisive detail: `λ` and `cp` are the scalar's **spatially varying fields**,
resolved lazily from the registry as `"<temperature>_lambda_tot"` (fallback
`"_lambda"`) and `"_cp"` — i.e. Nek5000's `vdiff`/`vtrans` of the temperature
field. They are filled by the user `material_properties` hook (e.g. Sutherland
`λ(T)`). Using a **constant** `k`/`cp` instead (the original code) makes `Q_T`
inconsistent with the energy balance and was the cause of the heated-channel
blow-up. If the scalar fields are absent, it falls back to `k_cond`/`cp`.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I fully understand this, I guess we can discuss!

Comment thread LOWMACH_DEV_GUIDE.md
Comment on lines +217 to +219
`lm_rhs_ext_field_rho`, `lm_rhs_bdf_field_rho`, `lm_rhs_oifs_field_rho` are
variable-ρ counterparts of `rhs_maker_*_cpu`: they assemble the AB-extrapolated
forcing history and the BDF lagged-velocity term weighting **per node by ρ(x)**

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need new types here. The rhs makers we have should just treat variable density properly. Density is always an array in the fluid anyway,

Comment thread LOWMACH_DEV_GUIDE.md
```jsonc
"fluid": {
"scheme": "lowmach",
"full_stress_formulation": true, // REQUIRED (coupled stress operator)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should just force this in the code, since it is required, there is no reason to let the user configure this.

Comment thread LOWMACH_DEV_GUIDE.md
Comment on lines +244 to +246
## 10. Material properties / coupling (user side)

Variable `μ(T)`, `λ(T)`, `cp` are supplied through the Neko `material_properties`

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would maybe be good to provide some good default options, i.e. Sutherland's, so that one does not always need to code this in the user file. @tuananhdao Does the compressible solver use these kind of relations? I am thinking that maybe some code can be reused across the two schemes here.

Comment thread LOWMACH_DEV_GUIDE.md
Comment on lines +183 to +185
### The fix — defect correction
Solve the true variable-ρ system `A_var p = b` by an outer iteration whose inner
solve uses a **constant reference operator** `A_ref` (hsmg's home turf):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this stuff something you invented, or does this follow the Nek5000 formulation? My thinking is that since we are basically porting the Nek5000 scheme here, we could/should, in the first instance, just match exactly what Nek5000 does and achieve the same solution in both codes for some validation flow problem. That would be a good milestone!

Comment on lines +106 to +107
procedure, pass(this) :: init => fluid_lowmach_pnpn_init
procedure, pass(this) :: step => fluid_lowmach_pnpn_step

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are clearly missing a free routine because you have allocatables and pointers inside the type.

Comment on lines +241 to +244
do concurrent (i = 1:n)
this%Q_T_field%x(i,1,1,1) = this%Q_T_field%x(i,1,1,1) &
+ w1%x(i,1,1,1)
end do

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer calling field_math.f90 / math.f90 routines where possible, unless you are fusing a big kernel into one loop. This will make GPU porting easier later.

@njansson njansson Jun 3, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is the CPU path, I would rather say keep it as is. And in the near future replace do concurrent with do + directives + omp do (see #2335)

adigh23 added 4 commits June 26, 2026 18:30
…ting

Two structural bugs in the variable-density (low-Mach) path, found and verified
while building an MMS validation against Nek5000 short_tests/lowMach_test:

* Q_T thermal divergence (fluid_lowmach_pnpn.f90): the conduction term
  div(lambda grad T) was built with cdtp (the weak/transpose operator D^T),
  which does NOT compose with Binv into a strong divergence -- it produced
  O(1e2) oscillatory values even for an exactly-resolved T=2+x^2 (Laplacian=2).
  Replaced with the dudxyz collocation-derivative chain used by Neko's div().
  Q_T_field is now registered ('Q_T') so user hooks / field writers can read it,
  and an OPTIONAL volumetric heat source <temperature>_qdot is added to Q_T
  (matching Nek5000 userqtl_scig); off by default, so existing cases are
  bit-for-bit unchanged.

* energy equation (scalar_pnpn.f90): in the low_mach rho*cp path the user
  volumetric source q was bundled into the RHS with the advection and then the
  whole RHS multiplied by rho*cp(x)=1/T, so the solver enforced ...+rho*cp*q
  instead of +q, leaving a spurious (rho*cp-1)*q steady residual everywhere
  T!=1. The source (and weak-BC flux) is now snapshotted raw BEFORE the rho*cp
  advection weighting and re-added un-weighted, mirroring Nek5000's split of
  makeuq (raw source) from convab (vtrans-weighted advection). NB this affects
  every low-Mach case with a scalar source (heat release, species reactions).

Also carries the variable-density momentum/pressure residual operating on
spatial rho/mu fields (lowmach_res_cpu) and the field-valued rho RHS makers
(rhs_maker*, rhs_maker_cpu/sx/device), plus the laminar mu_tot refresh
(fluid_scheme_incompressible) and minor fluid_pnpn/scalar_scheme hooks.

Result: Neko reproduces and holds the lowMach_test manufactured solution to
discretization accuracy -- max err u 2.0e-5, T 3.0e-7, QTL 7.2e-3, flat over
2000 steps (t=2, dt=1e-3), Normal end, 0 non-convergences.
Method-of-manufactured-solutions validation of the low-Mach solver against
Nek5000 short_tests/lowMach_test (Tomboulides et al., JCP 146, 1998):
u = T = 0.5(3+tanh(x/0.2)), rho = 1/T, with the manufactured heat source q so
the thermal divergence Q_T = QTL = 0.5/d (1-tanh^2) holds exactly.

  - lowMach_validation.case / .f90: low-Mach + temperature scalar, analytic
    IC/BC (tanh), heat source q (via the <temperature>_qdot registry field),
    and a compute hook reporting MAX errors of u, T, Q_T vs the closed forms.
  - lowMach_nek.nmsh: Nek5000's own mesh (lowMach_test.re2 -> rea2nbin).
  - render_validation.py + viz_profiles.png / viz_contour.png: visual overlay
    of every Neko GLL point on the analytic curve.
  - README.md: physics, how to run, and the three bugs this case fixed.

Holds the manufactured solution to discretization accuracy (u 2e-5, T 3e-7,
QTL 7e-3) over 2000 steps. Generated outputs (.fld, logs, test meshes) are
gitignored.
…wall, gravity) + HPC kit

Flow over a wall-mounted hump (gmsh hump.geo -> gmsh2nek -> hump.nmsh, 53k hexes,
z-periodic), low-Mach, with:
  - SUSTAINED turbulent inflow via recycling (global_interpolation, recycle plane
    upstream of the hump) seeded by a turbulent IC + a small near-inlet trip;
  - lower wall HEATED for x>11 (smoothstep ramp to T_hot=2), cold upstream;
  - gravity / buoyancy (reduced-gravity -y body force from the EOS density);
  - GJP stabilization. All knobs at the top of heated_hump.f90.

Validated locally on a coarse mesh (recycling + low-Mach + heating + gravity run
to t=15, bounded, with hump separation and a developing heated thermal layer).
The full hump.nmsh + order 7 is HPC-scale (CPU-only low-Mach). Ships with
SHIP_TO_HPC.md, build_neko_hpc.sh, and a SLURM job.slurm template.
…notes

hump_fine.geo: NxA40+NxB140+NxC170 streamwise, Ny48, Nz60, rGrow 1.08 -> ~1.0M hexes
(vs 53,760). The .nmsh is ~240MB (gitignored; generate via gmsh2nek/rea2nbin or scp).
SHIP_TO_HPC.md documents order/resource sizing (use order 5 not 7 for 1M elems) and
the Reynolds number (Re_in on hump height; 500 stable-coarse, ~5000-10000 for the fine run).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants