Skip to content

Mask SCRIP files to reduce memory footprint of ESMF_RegridWeightGen#955

Open
trhille wants to merge 8 commits intoMPAS-Dev:mainfrom
trhille:landice/mask_scrip_for_interp
Open

Mask SCRIP files to reduce memory footprint of ESMF_RegridWeightGen#955
trhille wants to merge 8 commits intoMPAS-Dev:mainfrom
trhille:landice/mask_scrip_for_interp

Conversation

@trhille
Copy link
Copy Markdown
Collaborator

@trhille trhille commented May 5, 2026

This merge creates a grid_imask field in the SCRIP files for source data sets (BedMachine and MEaSUREs) that masks out cells that do not overlap the target MALI mesh (plus a buffer with default 50km width). This reduces the active cells in BedMachine Greenland v6 by about 25%, from 1.8M to 1.4M, and makes it possible to run the default 1–10km greenland mesh_gen case on 8 Perlmutter CPU nodes in about 40 minutes, where it previously required 16 nodes.

Checklist

  • User's Guide has been updated
  • Developer's Guide has been updated
  • API documentation in the Developer's Guide (api.rst) has any new or modified class, method and/or functions listed
  • Documentation has been built locally and changes look as expected
  • Document (in a comment titled Testing in this PR) any testing that was used to verify the changes

trhille added 4 commits May 4, 2026 14:41
Mask scrip file to only use cells overlapping with MALI mesh
before ESMF_RegridWeightGen. This can lead to a considerable decrease
in computational cost when interpolating to large meshes.
Change default nProcs from 2048 to 1024 for 1-10km greenland mesh,
which is now possible thanks to masked scrip file that reduces
the memory footprint of ESMF_RegridWeightGen.
Add comment detailing potential optimization, in which the convex
hull of the MALI mesh is cached and reused by multiple source
datasets when calculating masks before ESMF_RegridWeightGen.
@trhille trhille marked this pull request as draft May 5, 2026 03:18
Reuse convex hull of MALI mesh to avoid redundant computation when
interpolating from multiple source data sets (e.g., BedMachine
and MEaSUREs).
@trhille
Copy link
Copy Markdown
Collaborator Author

trhille commented May 5, 2026

Testing

With these changes, we can create the 1–10km Greenland mesh in 37:21 and the 4–20km Antarctica mesh in 27:24 on 8 CPU nodes on Perlmutter (1024 tasks). The MALI mesh convex hull caching introduced in 9064a8c saves a few minutes: the 1–10km Greenland mesh took 40:09 when calculating the convex hull multiple times, but I only ran each case once. Antarctica can fit on 4 nodes (40:39 run time), but Greenland fails on 4 nodes because ESMF_RegridWeightGen runs out of memory. I haven't tested Greenland with more than 4 but less than 8 nodes.

For Greenland, these changes reduce the number of activate cells in the source SCRIP by about 25%: active source cells after masking: 140813110 / 187459428
For Antarctica, the savings is much larger, reducing the number of active cells in the source SCRIP by ~60%: active source cells after masking: 70964025 / 177768889

@trhille
Copy link
Copy Markdown
Collaborator Author

trhille commented May 5, 2026

Here's an example of the saving on active cells in the SCRIP file from an 8–30km Antarctica mesh (which took 22:17 on 4 Perlmutter nodes):
image

trhille added 2 commits May 5, 2026 10:10
Plot the convex hull fo the MALI mesh as well as the active cells
in the SCRIP file after masking and the source data set bounds.
Save plot to png.
Remove scatterplot of source grid cells, which is redundant with the
convex hull contour.
@trhille
Copy link
Copy Markdown
Collaborator Author

trhille commented May 5, 2026

For Greenland (3–30km), here's the convex hull approach (top) from 8c1dc31, versus the dilation approach (i.e., buffered boundary) approach (bottom) from 69d8b8e:
image
image

Use rasterize-dilate-contour approach rather than convex hull to mask
SCRIP files for source data sets. This results in a much tighter-fitting
mask around the MALI domain and masks out many more cells from the source
files, which will significantly further decrease the memory footprint
of ESMF_RegridWeightGen.
@trhille
Copy link
Copy Markdown
Collaborator Author

trhille commented May 5, 2026

With the changes in 69d8b8e I get active source cells after masking: 62723040 / 177768889 (compared to 70964025 / 177768889 when using the convex hull approach) for the 4–20km Antarctic mesh, which took 42:24 on 4 Perlmutter CPU nodes:
image

@trhille trhille marked this pull request as ready for review May 5, 2026 19:56
Copy link
Copy Markdown
Member

@matthewhoffman matthewhoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trhille , this looks like a really nice performance optimization. I read all the test results and skimmed all the changes. The general idea and the specific implementation make sense. I have requested some changes to avoid code duplication and simplify organization. I may be missing reasons why things are the way they are, though, so if these suggestions aren't practical, feel free to say so.

Comment thread compass/landice/mesh.py
Comment on lines +1372 to +1383
projections = {
'greenland': (
'+proj=stere +lat_ts=70.0 +lat_0=90 +lon_0=315.0 +k_0=1.0 '
'+x_0=0.0 +y_0=0.0 +ellps=WGS84'
),
'antarctica': (
'+proj=stere +lat_ts=-71.0 +lat_0=-90 +lon_0=0.0 +k_0=1.0 '
'+x_0=0.0 +y_0=0.0 +ellps=WGS84'
),
}
projections['gis-gimp'] = projections['greenland']
projections['ais-bedmap2'] = projections['antarctica']
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is repeated with function above. Let's have a single location for these in a common data structure. I could imagine them being used in other contexts. (I know we have these already defined in MPAS-Tools, but we don't need to worry about eliminating that redundancy.)

Comment thread compass/landice/mesh.py
Comment on lines +1395 to +1400
def _maybe_deg(lon, lat):
if (np.nanmax(np.abs(lon)) <= 2.0 * np.pi + 1.0e-6 and
np.nanmax(np.abs(lat)) <= 0.5 * np.pi + 1.0e-6):
lon = np.rad2deg(lon)
lat = np.rad2deg(lat)
return lon, lat
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also duplicate and could be moved to a common location.

Comment thread compass/landice/mesh.py
transformer : pyproj.Transformer
Transformer from lon/lat to planar coordinates.

_maybe_deg : callable
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why a function would be passed here. As per comment above, if _maybe_deg is moved to a single common function, it could just be called directly and there would be no need to pass it in here.

Comment thread compass/landice/mesh.py
Comment on lines +1523 to +1524
# active source cells — ice-sheet extent (tab:blue)
xs, ys = _subsample(active_xc, active_yc)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these lines are used - xs,ys get replaced on the next line.

Comment thread compass/landice/mesh.py
Comment on lines +1967 to +1978
projections = {
'greenland': (
'+proj=stere +lat_ts=70.0 +lat_0=90 +lon_0=315.0'
' +k_0=1.0 +x_0=0.0 +y_0=0.0 +ellps=WGS84'
),
'antarctica': (
'+proj=stere +lat_ts=-71.0 +lat_0=-90 +lon_0=0.0'
' +k_0=1.0 +x_0=0.0 +y_0=0.0 +ellps=WGS84'
),
}
projections['gis-gimp'] = projections['greenland']
projections['ais-bedmap2'] = projections['antarctica']
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another instance that can be eliminated

Comment thread compass/landice/mesh.py
transformer = Transformer.from_crs(
'EPSG:4326', mesh_crs, always_xy=True)

def _maybe_deg_plot(lon, lat):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this

Comment thread compass/landice/mesh.py
Comment on lines +1997 to +2004
# Try to find the masked SCRIP that was written to workdir
import re as _re
match = _re.search(r'(^.*[_-]v\d*[_-])+', bm_stem)
if match:
scrip_stem = bm_stem[:match.end() - 1]
else:
scrip_stem = bm_stem
masked_scrip = f'{scrip_stem}.scrip_masked.nc'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a cleaner way to do this?

Comment thread compass/landice/mesh.py
Comment on lines +1964 to +1965
# Diagnostic plot: show hull, MALI domain, and bounding boxes for
# all source datasets that were interpolated.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to move the rest of these changes into the plotting function? I had the sense run_optional_interpolation was meant to be a high level driver calling functions for chunks of work, but the remainder of the additions below this point are in the weeds of setting up the plot. And given the plotting function is only called in one place, conceptually it would all these details could be pushed into that function instead of living here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants