Mask SCRIP files to reduce memory footprint of ESMF_RegridWeightGen by trhille · Pull Request #955 · MPAS-Dev/compass

trhille · 2026-05-05T03:17:30Z

This merge creates a grid_imask field in the SCRIP files for source data sets (BedMachine and MEaSUREs) that masks out cells that do not overlap the target MALI mesh (plus a buffer, 50km wide by default). This reduces the number of active cells in BedMachine Greenland v6 by about 25%, from roughly 187M to 141M, and makes it possible to run the default 1–10km greenland mesh_gen case on 8 Perlmutter CPU nodes in about 40 minutes, where it previously required 16 nodes.

Checklist

User's Guide has been updated
Developer's Guide has been updated
API documentation in the Developer's Guide (api.rst) has any new or modified class, method and/or functions listed
Documentation has been built locally and changes look as expected
Document (in a comment titled Testing in this PR) any testing that was used to verify the changes

trhille · 2026-05-05T15:06:38Z

Testing

With these changes, we can create the 1–10km Greenland mesh in 37:21 and the 4–20km Antarctica mesh in 27:24 on 8 Perlmutter CPU nodes (1024 tasks). The MALI mesh convex hull caching introduced in 9064a8c appears to save a few minutes: the 1–10km Greenland mesh took 40:09 when the convex hull was calculated multiple times, although I only ran each case once. Antarctica fits on 4 nodes (40:39 run time), but Greenland fails on 4 nodes because ESMF_RegridWeightGen runs out of memory. I haven't tested Greenland on more than 4 but fewer than 8 nodes.

For Greenland, these changes reduce the number of active cells in the source SCRIP file by about 25%: active source cells after masking: 140813110 / 187459428
For Antarctica, the savings are much larger, reducing the number of active cells in the source SCRIP file by ~60%: active source cells after masking: 70964025 / 177768889
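
The percentages quoted here follow directly from the reported counts. A minimal sketch of that arithmetic (the helper name `masked_fraction` is made up for illustration and is not part of compass):

```python
def masked_fraction(active: int, total: int) -> float:
    """Return the fraction of source cells that the mask removes."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 1.0 - active / total


# Counts copied from the "active source cells after masking" log lines.
greenland = masked_fraction(140_813_110, 187_459_428)
antarctica = masked_fraction(70_964_025, 177_768_889)

print(f"Greenland: {greenland:.1%} of source cells masked out")
print(f"Antarctica: {antarctica:.1%} of source cells masked out")
```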

trhille · 2026-05-05T17:04:35Z

Here's an example of the saving on active cells in the SCRIP file from an 8–30km Antarctica mesh (which took 22:17 on 4 Perlmutter nodes):

trhille · 2026-05-05T18:00:52Z

For Greenland (3–30km), here's the convex hull approach (top) from 8c1dc31 versus the dilation (i.e., buffered boundary) approach (bottom) from 69d8b8e:

Use a rasterize-dilate-contour approach rather than a convex hull to mask SCRIP files for source data sets. This results in a much tighter-fitting mask around the MALI domain and masks out many more cells from the source files, which will further reduce the memory footprint of ESMF_RegridWeightGen significantly.
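
As a rough illustration of the rasterize-and-dilate idea, the sketch below bins MALI cell centers onto a coarse raster, grows the occupied bins by the buffer width, and keeps only source cells that fall in a grown bin. It is a pure-Python sketch under stated assumptions, not the code in 69d8b8e: the function names, the square dilation kernel, and the 10 km raster spacing are all assumptions, and it skips the contour step by testing source points against the raster directly.

```python
import math
from itertools import product


def to_bin(x, y, cell_size):
    """Map planar coordinates (m) to integer raster indices."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def rasterize(points, cell_size):
    """Return the set of raster bins that contain at least one point."""
    return {to_bin(x, y, cell_size) for x, y in points}


def dilate(bins, n_bins):
    """Grow a set of bins by n_bins in every direction (square kernel)."""
    offsets = range(-n_bins, n_bins + 1)
    return {(i + di, j + dj)
            for i, j in bins
            for di, dj in product(offsets, repeat=2)}


def imask_from_mesh(source_points, mesh_points, cell_size=10_000.0,
                    buffer_width=50_000.0):
    """Return 1 for source points inside the dilated mesh footprint, else 0.

    cell_size and buffer_width are assumed values in meters.
    """
    n_bins = math.ceil(buffer_width / cell_size)
    footprint = dilate(rasterize(mesh_points, cell_size), n_bins)
    return [int(to_bin(x, y, cell_size) in footprint)
            for x, y in source_points]


# Hypothetical planar coordinates in meters.
mesh = [(0.0, 0.0), (10_000.0, 0.0)]
source = [(0.0, 0.0), (55_000.0, 0.0), (200_000.0, 0.0),
          (-45_000.0, -45_000.0)]
imask = imask_from_mesh(source, mesh)
print(imask)
```

A square kernel over-reaches the requested buffer along the diagonals, so a real implementation might prefer a circular structuring element.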

trhille · 2026-05-05T19:55:10Z

With the changes in 69d8b8e I get active source cells after masking: 62723040 / 177768889 (compared to 70964025 / 177768889 when using the convex hull approach) for the 4–20km Antarctic mesh, which took 42:24 on 4 Perlmutter CPU nodes:

matthewhoffman

@trhille, this looks like a really nice performance optimization. I read all the test results and skimmed all the changes. The general idea and the specific implementation make sense. I have requested some changes to avoid code duplication and simplify organization. I may be missing reasons why things are the way they are, though, so if these suggestions aren't practical, feel free to say so.

matthewhoffman · 2026-05-06T03:01:58Z

+    projections = {
+        'greenland': (
+            '+proj=stere +lat_ts=70.0 +lat_0=90 +lon_0=315.0 +k_0=1.0 '
+            '+x_0=0.0 +y_0=0.0 +ellps=WGS84'
+        ),
+        'antarctica': (
+            '+proj=stere +lat_ts=-71.0 +lat_0=-90 +lon_0=0.0 +k_0=1.0 '
+            '+x_0=0.0 +y_0=0.0 +ellps=WGS84'
+        ),
+    }
+    projections['gis-gimp'] = projections['greenland']
+    projections['ais-bedmap2'] = projections['antarctica']


This repeats the definition in the function above. Let's keep these in a single location in a common data structure. I could imagine them being used in other contexts. (I know we already have these defined in MPAS-Tools, but we don't need to worry about eliminating that redundancy.)
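
One way to act on this suggestion is a single module-level mapping that every caller imports. The sketch below reuses the projection strings from the diff; the names `PROJECTIONS` and `get_projection`, and the idea of a shared module, are assumptions and not existing compass API.

```python
# Hypothetical shared module holding the projection strings in one place.
_GREENLAND = (
    '+proj=stere +lat_ts=70.0 +lat_0=90 +lon_0=315.0 +k_0=1.0 '
    '+x_0=0.0 +y_0=0.0 +ellps=WGS84'
)
_ANTARCTICA = (
    '+proj=stere +lat_ts=-71.0 +lat_0=-90 +lon_0=0.0 +k_0=1.0 '
    '+x_0=0.0 +y_0=0.0 +ellps=WGS84'
)

# Aliases point at the same string objects, as in the original diff.
PROJECTIONS = {
    'greenland': _GREENLAND,
    'antarctica': _ANTARCTICA,
    'gis-gimp': _GREENLAND,
    'ais-bedmap2': _ANTARCTICA,
}


def get_projection(name):
    """Return the PROJ string for a known projection name."""
    try:
        return PROJECTIONS[name]
    except KeyError:
        known = ', '.join(sorted(PROJECTIONS))
        raise ValueError(
            f'unknown projection {name!r}; expected one of: {known}'
        ) from None
```

Callers would then use `get_projection('gis-gimp')` instead of rebuilding the dictionary locally.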

matthewhoffman · 2026-05-06T03:02:35Z

+    def _maybe_deg(lon, lat):
+        if (np.nanmax(np.abs(lon)) <= 2.0 * np.pi + 1.0e-6 and
+                np.nanmax(np.abs(lat)) <= 0.5 * np.pi + 1.0e-6):
+            lon = np.rad2deg(lon)
+            lat = np.rad2deg(lat)
+        return lon, lat


This is also duplicated and could be moved to a common location.

matthewhoffman · 2026-05-06T03:05:57Z

+    transformer : pyproj.Transformer
+        Transformer from lon/lat to planar coordinates.
+
+    _maybe_deg : callable


I don't understand why a function would be passed here. As per comment above, if _maybe_deg is moved to a single common function, it could just be called directly and there would be no need to pass it in here.

matthewhoffman · 2026-05-06T03:08:36Z

+        # active source cells — ice-sheet extent (tab:blue)
+        xs, ys = _subsample(active_xc, active_yc)


I don't think these lines are used - xs,ys get replaced on the next line.

matthewhoffman · 2026-05-06T03:10:50Z

+            projections = {
+                'greenland': (
+                    '+proj=stere +lat_ts=70.0 +lat_0=90 +lon_0=315.0'
+                    ' +k_0=1.0 +x_0=0.0 +y_0=0.0 +ellps=WGS84'
+                ),
+                'antarctica': (
+                    '+proj=stere +lat_ts=-71.0 +lat_0=-90 +lon_0=0.0'
+                    ' +k_0=1.0 +x_0=0.0 +y_0=0.0 +ellps=WGS84'
+                ),
+            }
+            projections['gis-gimp'] = projections['greenland']
+            projections['ais-bedmap2'] = projections['antarctica']


another instance that can be eliminated

matthewhoffman · 2026-05-06T03:11:02Z

+            transformer = Transformer.from_crs(
+                'EPSG:4326', mesh_crs, always_xy=True)
+
+            def _maybe_deg_plot(lon, lat):


matthewhoffman · 2026-05-06T03:11:48Z

+            # Try to find the masked SCRIP that was written to workdir
+            import re as _re
+            match = _re.search(r'(^.*[_-]v\d*[_-])+', bm_stem)
+            if match:
+                scrip_stem = bm_stem[:match.end() - 1]
+            else:
+                scrip_stem = bm_stem
+            masked_scrip = f'{scrip_stem}.scrip_masked.nc'


Is there a cleaner way to do this?
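
One possibly tidier form, assuming the intent is to truncate the stem just after its version tag (e.g. `-v5`), is a precompiled pattern with a lookahead, so that no manual `match.end() - 1` slicing is needed. The helper names and the example stems are hypothetical.

```python
import re

# Greedy prefix up to the last "<sep>v<digits>" that is followed by a
# separator; the lookahead keeps that trailing separator out of the match.
_VERSION_TAG = re.compile(r'^.*[_-]v\d*(?=[_-])')


def scrip_stem_for(stem):
    """Return the stem truncated after its version tag, if it has one."""
    match = _VERSION_TAG.match(stem)
    return match.group(0) if match else stem


def masked_scrip_name(stem):
    """Build the masked SCRIP file name used in the diff above."""
    return f'{scrip_stem_for(stem)}.scrip_masked.nc'


# Hypothetical stems for illustration only.
print(masked_scrip_name('dataset-v5_edits_extrap'))
print(masked_scrip_name('dataset_no_version'))
```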

matthewhoffman · 2026-05-06T03:14:35Z

+        # Diagnostic plot: show hull, MALI domain, and bounding boxes for
+        # all source datasets that were interpolated.


Would it make sense to move the rest of these changes into the plotting function? I had the sense run_optional_interpolation was meant to be a high-level driver calling functions for chunks of work, but the remainder of the additions below this point are in the weeds of setting up the plot. And given that the plotting function is only called in one place, conceptually all these details could be pushed into that function instead of living here.

trhille added 4 commits May 4, 2026 14:41

Mask scrip file before ESMF_RegridWeightGen

2bafef9

Mask scrip file to only use cells overlapping with MALI mesh before ESMF_RegridWeightGen. This can lead to a considerable decrease in computational cost when interpolating to large meshes.

Update docs to describe masking SCRIP file before ESMF_RegridWeightGen

6051374

Change default nProcs from 2048 to 1024 for 1-10km greenland mesh

011e292

Change default nProcs from 2048 to 1024 for 1-10km greenland mesh, which is now possible thanks to masked scrip file that reduces the memory footprint of ESMF_RegridWeightGen.

Add comment detailing potential optimization

b04bc64

Add comment detailing potential optimization, in which the convex hull of the MALI mesh is cached and reused by multiple source datasets when calculating masks before ESMF_RegridWeightGen.

trhille marked this pull request as draft May 5, 2026 03:18

Reuse convex hull of MALI mesh for multiple source data sets.

9064a8c

Reuse convex hull of MALI mesh to avoid redundant computation when interpolating from multiple source data sets (e.g., BedMachine and MEaSUREs).
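
The caching idea can be sketched as follows: compute the hull of the MALI cell centers once, then reuse it for each source data set. This is a pure-Python illustration, not the code in 9064a8c; the function names, the use of `functools.lru_cache`, and the toy coordinates are assumptions, and it expects at least three non-collinear mesh points.

```python
from functools import lru_cache


def _cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


@lru_cache(maxsize=None)
def convex_hull(points):
    """Andrew's monotone chain on a tuple of (x, y) tuples.

    Returns the hull vertices in counter-clockwise order. The result is
    cached, so repeated calls with the same points skip the computation.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return tuple(pts)
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return tuple(lower[:-1] + upper[:-1])


def inside_hull(hull, point):
    """True if point lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    return all(_cross(hull[i], hull[(i + 1) % n], point) >= 0
               for i in range(n))


def mask_dataset(mesh_points, source_points):
    """Return 1 for source points inside the mesh hull, else 0."""
    hull = convex_hull(mesh_points)  # cached after the first call
    return [int(inside_hull(hull, p)) for p in source_points]


# Toy mesh and two hypothetical source data sets.
mesh = ((0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0), (2.0, 2.0))
mask_a = mask_dataset(mesh, [(1.0, 1.0), (5.0, 5.0)])
mask_b = mask_dataset(mesh, [(4.0, 4.0), (-1.0, 2.0)])
print(mask_a, mask_b, convex_hull.cache_info())
```

Hashing a very large tuple of points has its own cost, so for a real mesh it may be simpler to compute the hull once in the driver and pass the resulting object to each masking call.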

trhille added 2 commits May 5, 2026 10:10

Plot convex hull of MALI mesh

05d6850

Plot the convex hull of the MALI mesh as well as the active cells in the SCRIP file after masking and the source data set bounds. Save plot to png.

Remove scatterplot of source grid cells

8c1dc31

Remove scatterplot of source grid cells, which is redundant with the convex hull contour.

trhille marked this pull request as ready for review May 5, 2026 19:56

trhille requested review from andrewdnolan and matthewhoffman May 5, 2026 19:56

matthewhoffman requested changes May 6, 2026
