feat: add ERA5 CDS ↔ CF variable name mapping utility #3922

Open
Akash-paluvai wants to merge 2 commits into PecanProject:develop from Akash-paluvai:GH-3605-era5-cf-varname-map

@Akash-paluvai
Contributor

Description

This PR introduces an internal ERA5 CDS to CF variable name translation utility and updates the AmeriFlux coverage workflow to use it.

Previously, CDS variable names (used for ERA5 download) and CF standard names (used in NetCDF files) were not aligned. This mismatch caused silent failures in downstream coalescing, where missing values were not filled even when ERA5 fallback was required.

This PR adds an explicit mapping layer and ensures both naming conventions are available and correctly used within the workflow.


Motivation and Context

Fixes silent failure in ERA5 fallback preparation (see #3605).

Previously:

  • Coverage checks returned only CDS variable names
  • Downstream coalescing expects CF standard names
  • Result: variables were not matched correctly and no filling occurred

This PR ensures:

  • CDS names are used for ERA5 data download
  • CF names are used for NetCDF variable matching during coalescing
  • Both naming conventions are consistently propagated through the pipeline

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • My name is in the list of CITATION.cff
  • I agree that PEcAn Project may distribute my contribution under any or all of
    • the same license as the existing code,
    • and/or the BSD 3-clause license.
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Member

@dlebauer left a comment

@Akash-paluvai thank you for this contribution. It occurs to me that this would be a good chance to make a general mapping function; see comments.

paste(unknown, collapse = ", "),
"returning NA for those entries"
)
warning(msg, call. = FALSE)
Member

Only the PEcAn.logger call is required; it passes the message to the appropriate function (warning in this case) and handles the outer paste internally (the inner one with collapse = may need to stay).
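
To illustrate the suggestion, here is a minimal sketch in base R. `logger_warn_standin()` is a hypothetical stand-in that only mimics the behaviour described in this comment (join the message pieces, then raise a warning); it is not the PEcAn.logger implementation, and `unknown` holds made-up variable names.

```r
# Stand-in for PEcAn.logger::logger.warn(); NOT the real implementation.
# It only mimics the behaviour described in the review comment:
# join the message pieces with spaces and hand them to warning().
logger_warn_standin <- function(...) {
  warning(paste(...), call. = FALSE)
}

unknown <- c("foo", "bar")  # hypothetical unrecognised variable names

# The message pieces are passed as separate arguments, so no outer paste()
# is needed; only the inner paste() with collapse = is kept.
caught <- tryCatch(
  logger_warn_standin(
    "Unknown variable name(s):",
    paste(unknown, collapse = ", "),
    "returning NA for those entries"
  ),
  warning = function(w) conditionMessage(w)
)

print(caught)
```

With the real logger, the call site would presumably shrink to a single `PEcAn.logger::logger.warn(...)` call with the same three arguments.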

#' included. Variables not listed here are not handled by this pipeline.
#'
#' @noRd
era5_cds_to_cf_varnames <- c(
Member

Please let me know if this doesn't make sense, but it seems like it would be simpler and more generally useful to

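translate_met_varnames <- function(vars, from, to, table = pecan_standard_met_table) {
  # Keep only the two naming columns and drop rows with no `from` name
  lookup <- table |>
    dplyr::select(dplyr::all_of(c(from, to))) |>
    dplyr::filter(!is.na(.data[[from]]), .data[[from]] != "")

  # Named vector: names are the `from` names, values are the `to` names
  result <- lookup[[to]]
  names(result) <- lookup[[from]]

  # Look up by name; names not found in the table come back as NA
  result[vars]
}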

Then, if it is useful (not clear if it is, but simple enough), developers can create <from>_to_<to>_varnames() functions as needed.
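
A quick way to see how such a lookup behaves is to run it against a toy table. The sketch below is a base-R variant of the suggestion (no dplyr dependency) and, unlike the version above, it drops the names from the result. `toy_met_table` is a hypothetical two-row stand-in built from the table rows visible in this PR, not the real pecan_standard_met_table.

```r
# Toy stand-in for pecan_standard_met_table: two rows taken from the table
# excerpt in this PR. The real table has more rows and columns.
toy_met_table <- data.frame(
  cf_standard_name = c("northward_wind",
                       "volume_fraction_of_condensed_water_in_soil"),
  era5 = c("v10", "swvl1"),
  stringsAsFactors = FALSE
)

# Base-R variant of the suggested function.
translate_met_varnames <- function(vars, from, to, table = toy_met_table) {
  # Drop rows that have no usable name in the `from` column
  keep <- !is.na(table[[from]]) & table[[from]] != ""
  # Named vector: names are `from` names, values are `to` names
  lookup <- stats::setNames(table[[to]][keep], table[[from]][keep])
  # Names missing from the table come back as NA
  unname(lookup[vars])
}

known <- translate_met_varnames("swvl1", from = "era5", to = "cf_standard_name")
mixed <- translate_met_varnames(c("v10", "not_a_var"),
                                from = "era5", to = "cf_standard_name")

print(known)
print(mixed)
```

Here `known` is the CF name for `swvl1`, while `mixed` has an `NA` in the position of the unrecognised name, so callers need to check for `NA` rather than rely on an error.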

@Akash-paluvai
Contributor Author

> @Akash-paluvai thank you for this contribution. It occurs to me that this would be a good chance to make a general mapping function; see comments.

Thanks @dlebauer. I’ve updated the PR based on your suggestions. Moving this into pecan_standard_met_table makes the design much cleaner.

Changes made

  • Added an era5_cds column to pecan_standard_met_table for long-form CDS API variable names (distinct from the existing era5 GRIB short names).
  • Implemented a generic translate_met_varnames(vars, from, to, table) utility for translating between naming conventions using the table as the source of truth.
  • Rewrote cds_to_cf_varnames() and cf_to_cds_varnames() as thin wrappers over this function.
  • Removed the standalone mapping file (era5_cf_varname_map.R) to avoid duplication.
  • Updated check_met_coverage_for_fallback() to return both fill_vars_cds and fill_vars_cf.
  • Adjusted tests accordingly to validate translation behaviour and return structure.

Notes

  • Only the variables currently used in the fallback pipeline have non-NA values in era5_cds; others remain NA until verified.
  • Warning behaviour is handled via PEcAn.logger::logger.warn(), and tests now validate correct NA handling for unknown inputs.

@Akash-paluvai
Contributor Author

translate_met_varnames() is currently exported as a generic utility for translating between naming conventions using pecan_standard_met_table.

I’m happy to keep it internal instead if you’d prefer to limit the public API surface for now.

@Akash-paluvai requested a review from dlebauer on April 20, 2026 09:10
@Akash-paluvai
Contributor Author

@dlebauer I’ve made the changes. If you’re free, please have a look. If everything looks good, I’ll proceed to the next step to fix #3605.

"northward_wind" , "m s-1" , TRUE, "northward_wind" , NA , NA , NA , "CALC(WS+WD)" , "v10" ,
"volume_fraction_of_condensed_water_in_soil" , "1" , FALSE, "soilM" , NA , NA , NA , "SWC_1" , "swvl1"
)
~`cf_standard_name` , ~units , ~is_required, ~bety , ~isimip , ~cruncep , ~narr , ~ameriflux , ~era5 , ~era5_cds,
Member

, ~era5 , ~era5_cds,

What's the difference between the existing era5 column and the new era5_cds column?

Contributor Author

The era5 and era5_cds columns represent two different stages of the ERA5 workflow and are not interchangeable.

  • era5: Short variable names (e.g., swvl1, ssrd) as stored in downloaded NetCDF files. Used by extract.nc.ERA5() to read variables from disk.
  • era5_cds: Full variable names required by the Copernicus CDS API (e.g., volumetric_soil_water_layer_1). Used by download.ERA5_cds() via the variables= argument when requesting data.

The CDS API requires the long-form names, while NetCDF files store variables using short names, so both are needed.

Most entries in era5_cds are intentionally NA. Only variables explicitly requested by the fallback pipeline via download.ERA5_cds() require a value.

Variables such as t2m or strd have era5 short names because they can be extracted from existing files, but they are not directly requested by the fallback pipeline. Their CDS names are therefore omitted to avoid implying otherwise.

Member

I really do not think we want to be adding a separate column to this table for every different source of the same dataset.

Member

@dlebauer Apr 27, 2026

@infotroph how do you suggest handling different versions of the same dataset that use different naming conventions? What are the advantages and disadvantages of each approach? More generally, what is your suggested fix for this bug?

Member

Given we have 6 columns in this lookup table and 11 met2CF functions, it seems clear the de facto system is already for each function to handle its own name conversions -- and any unit conversions, which this table doesn't cover -- for itself.

Having all the PEcAn standard output names and units in a lookup table is definitely useful. I just don't think it's useful to make every input be listed here when all the logic for it already needs to be present in the relevant met2CF function.

@Akash-paluvai
Contributor Author

@infotroph @dlebauer based on your feedback, here is the direction I think makes sense. Please correct me if I'm wrong.

The root issue is that check_met_coverage_for_fallback was returning CDS API names when it should have been returning CF names all along. CF is the internal currency of PEcAn. Fixing that removes the need for the era5_cds column entirely.

The CDS API long-form name (e.g., surface_solar_radiation_downwards) is not a property of the met variable; it is a request-layer detail specific to one HTTP endpoint. It does not belong in pecan_standard_met_table alongside CF names, BETY names, and model-specific names that describe the variable itself.

Proposed fix

  • check_met_coverage_for_fallback returns CF names in fill_vars directly: no dual keys, no translation at this layer.
  • prepare_era5_fallback_cf accepts CF names and holds a 2-entry named vector scoped to that file alone, translating CF → CDS long-form names at the one boundary that actually calls download.ERA5_cds().
  • Drop the era5_cds column from pecan_standard_met_table.
  • translate_met_varnames survives untouched as a genuinely generic utility for column pairs that do belong in the table.

Is this the right direction?

This aligns with @infotroph's point that each function already handles its own name conversions: the CDS translation lives in prepare_era5_fallback_cf, the same way existing met2CF functions handle their own input conversions.
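
As a rough sketch of that boundary, assuming the two mapped variables are the ones used as examples in this thread (the actual pair in the fallback pipeline may differ): `cf_to_cds_request_names()` and the `coverage` list are hypothetical names for illustration, and the CF name used for solar radiation is an assumption not shown in this PR.

```r
# Hypothetical boundary-level helper: CF standard name -> CDS API long name.
# The mapping is local to this function, so nothing outside the download
# step needs to know about CDS names.
cf_to_cds_request_names <- function(cf_vars) {
  # Assumed entries: they reuse variables mentioned in this thread; the
  # solar radiation CF name is an assumption.
  cf_to_cds <- c(
    surface_downwelling_shortwave_flux_in_air = "surface_solar_radiation_downwards",
    volume_fraction_of_condensed_water_in_soil = "volumetric_soil_water_layer_1"
  )

  # Fail loudly on anything unmapped rather than silently skipping it
  unmapped <- setdiff(cf_vars, names(cf_to_cds))
  if (length(unmapped) > 0) {
    stop("No CDS name known for: ", paste(unmapped, collapse = ", "),
         call. = FALSE)
  }

  unname(cf_to_cds[cf_vars])
}

# Stand-in for the coverage check result: CF names only, no dual keys.
coverage <- list(fill_vars = c("volume_fraction_of_condensed_water_in_soil"))

# Translation happens once, right before the (not shown) download call.
cds_request <- cf_to_cds_request_names(coverage$fill_vars)
print(cds_request)
```

Stopping on unmapped names is a design choice made for this sketch: since the original bug was a silent failure, an error at the request boundary seems preferable to returning `NA`, but a logged warning would also fit the pattern discussed earlier in the review.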
