feat: add ERA5 CDS ↔ CF variable name mapping utility #3922
Akash-paluvai wants to merge 2 commits into
dlebauer left a comment
@Akash-paluvai thank you for this contribution. It occurs to me that this would be a good chance to make a general mapping function; see comments.
| paste(unknown, collapse = ", "), | ||
| "returning NA for those entries" | ||
| ) | ||
| warning(msg, call. = FALSE) |
Only the PEcAn.logger call is required; it passes the message to the appropriate function (warning in this case) and handles the outer paste internally (the inner one with collapse = may need to stay).
| #' included. Variables not listed here are not handled by this pipeline. | ||
| #' | ||
| #' @noRd | ||
| era5_cds_to_cf_varnames <- c( |
Please let me know if this doesn't make sense, but it seems like it would be simpler and more generally useful to
- add a new col era5_cds to https://github.com/PecanProject/pecan/blob/develop/modules/data.atmosphere/R/pecan_standard_met_table.R
- create a generic lookup function (see below)
- then update the code as needed?
translate_met_varnames <- function(vars, from, to, table = pecan_standard_met_table) {
  lookup <- table |>
    dplyr::select(dplyr::all_of(c(from, to))) |>
    dplyr::filter(!is.na(.data[[from]]), .data[[from]] != "")
  result <- lookup[[to]]
  names(result) <- lookup[[from]]
  result[vars]
}
Then, if it is useful (not clear if it is, but simple enough), developers can create <from>_to_<to>_varnames functions as needed.
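To make the suggestion concrete, here is a minimal base-R sketch of the same lookup run against a toy table. The table `toy_met_table`, its third row, and the added warning are assumptions for illustration only; the real `pecan_standard_met_table` has more rows and columns, and the dplyr version above may behave differently at the edges.

```r
# Toy stand-in for pecan_standard_met_table (assumption: only two columns shown).
toy_met_table <- data.frame(
  cf_standard_name = c("northward_wind",
                       "volume_fraction_of_condensed_water_in_soil",
                       "hypothetical_unmapped_variable"),
  era5 = c("v10", "swvl1", NA),
  stringsAsFactors = FALSE
)

# Base-R sketch of the proposed generic lookup: build a named vector keyed by
# the `from` column, then index it with the requested names.
translate_met_varnames <- function(vars, from, to, table = toy_met_table) {
  keep <- !is.na(table[[from]]) & table[[from]] != ""
  lookup <- stats::setNames(table[[to]][keep], table[[from]][keep])
  result <- unname(lookup[vars])
  unknown <- vars[is.na(result)]
  if (length(unknown) > 0) {
    # Added for illustration: surface unknown names instead of returning NA quietly.
    warning("No '", to, "' name for: ", paste(unknown, collapse = ", "),
            "; returning NA for those entries", call. = FALSE)
  }
  stats::setNames(result, vars)
}

cf_names <- translate_met_varnames(c("swvl1", "v10"),
                                   from = "era5", to = "cf_standard_name")
```

Because `from` and `to` are plain column names, the same function covers both directions of any pair of columns without a dedicated wrapper per source.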
…table-based era5_cds column
Thanks @dlebauer, I’ve updated the PR based on your suggestions. Moving this into
Changes made
Notes
I’m happy to keep it internal instead if you’d prefer to limit the public API surface for now.
| "northward_wind" , "m s-1" , TRUE, "northward_wind" , NA , NA , NA , "CALC(WS+WD)" , "v10" , | ||
| "volume_fraction_of_condensed_water_in_soil" , "1" , FALSE, "soilM" , NA , NA , NA , "SWC_1" , "swvl1" | ||
| ) | ||
| ~`cf_standard_name` , ~units , ~is_required, ~bety , ~isimip , ~cruncep , ~narr , ~ameriflux , ~era5 , ~era5_cds, |
, ~era5 , ~era5_cds,
What's the difference between the existing era5 column and the new era5_cds column?
There was a problem hiding this comment.
The era5 and era5_cds columns represent two different stages of the ERA5 workflow and are not interchangeable.
- era5: Short variable names (e.g., swvl1, ssrd) as stored in downloaded NetCDF files. Used by extract.nc.ERA5() to read variables from disk.
- era5_cds: Full variable names required by the Copernicus CDS API (e.g., volumetric_soil_water_layer_1). Used by download.ERA5_cds() via the variables= argument when requesting data.
The CDS API requires the long-form names, while NetCDF files store variables using short names, so both are needed.
Most entries in era5_cds are intentionally NA. Only variables explicitly requested by the fallback pipeline via download.ERA5_cds() require a value.
Variables such as t2m or strd have era5 short names because they can be extracted from existing files, but they are not directly requested by the fallback pipeline. Their CDS names are therefore omitted to avoid implying otherwise.
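As a rough illustration of the two stages described above, the sketch below pulls the download-time and read-time names out of a toy two-row table. The table itself is an assumption built from the examples in this comment; it is not the real `pecan_standard_met_table`, and the real functions are not called here.

```r
# Toy table (assumption): one variable requested by the fallback pipeline,
# one that is only ever read from existing files.
toy_table <- data.frame(
  cf_standard_name = c("volume_fraction_of_condensed_water_in_soil",
                       "air_temperature"),
  era5     = c("swvl1", "t2m"),
  era5_cds = c("volumetric_soil_water_layer_1", NA),
  stringsAsFactors = FALSE
)

# Stage 1 (download): long-form names that would go into the CDS request.
cds_request_vars <- toy_table$era5_cds[!is.na(toy_table$era5_cds)]

# Stage 2 (extraction): short names that would be read from the NetCDF file.
nc_read_vars <- toy_table$era5[!is.na(toy_table$era5)]
```

Here `cds_request_vars` holds only the soil water name, while `nc_read_vars` holds both short names, which mirrors why most era5_cds entries are NA.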
There was a problem hiding this comment.
I really do not think we want to be adding a separate column to this table for every different source of the same dataset.
There was a problem hiding this comment.
@infotroph how do you suggest handling different versions of the same dataset that use different naming conventions? What are the advantages and disadvantages of each approach? More generally, what is your suggested fix for this bug?
There was a problem hiding this comment.
Given we have 6 columns in this lookup table and 11 met2CF functions, it seems clear the de facto system is already for each function to handle its own name conversions -- and any unit conversions, which this table doesn't cover -- for itself.
Having all the PEcAn standard output names and units in a lookup table is definitely useful. I just don't think it's useful to make every input be listed here when all the logic for it already needs to be present in the relevant met2CF function.
@infotroph @dlebauer based on your feedback, here is the direction I think makes sense. Please correct me if I'm wrong. The root issue is that The CDS API long-form name (e.g., fix
Is this the right direction? This aligns with @infotroph's point that each function already handles its own name conversions: the CDS translation lives in prepare_era5_fallback_cf, the same way existing met2CF functions handle their own input conversions.
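A minimal sketch of what a function-local translation could look like, assuming the fallback data arrives as a list named by CDS variable names. The helper name `rename_cds_to_cf` and the two-entry mapping are hypothetical; the actual body of prepare_era5_fallback_cf is not shown in this thread.

```r
# Hypothetical helper: the mapping is private to the function that needs it,
# so nothing is added to the shared standard met table.
rename_cds_to_cf <- function(x) {
  cds_to_cf <- c(
    volumetric_soil_water_layer_1 = "volume_fraction_of_condensed_water_in_soil",
    surface_solar_radiation_downwards = "surface_downwelling_shortwave_flux_in_air"
  )
  known <- names(x) %in% names(cds_to_cf)
  if (any(!known)) {
    # Fail loudly rather than leave an untranslated name in place.
    warning("No CF name for: ", paste(names(x)[!known], collapse = ", "),
            call. = FALSE)
  }
  names(x)[known] <- unname(cds_to_cf[names(x)[known]])
  x
}

era5_cf <- rename_cds_to_cf(list(volumetric_soil_water_layer_1 = c(0.31, 0.29)))
```

Unit conversions, which the lookup table does not cover, would sit next to this mapping in the same function.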
Description
This PR introduces an internal ERA5 CDS to CF variable name translation utility and updates the AmeriFlux coverage workflow to use it.
Previously, CDS variable names (used for ERA5 download) and CF standard names (used in NetCDF files) were not aligned. This mismatch caused silent failures in downstream coalescing, where missing values were not filled even when ERA5 fallback was required.
This PR adds an explicit mapping layer and ensures both naming conventions are available and correctly used within the workflow.
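The failure mode can be reproduced in a few lines. The sketch below is not the workflow's actual coalescing code; `coalesce_by_name` and the sample values are hypothetical, and it only assumes that gaps are filled by matching variable names between the primary and fallback data.

```r
# Hypothetical name-based gap filling: a fallback variable is used only when
# its name matches the primary variable's name.
coalesce_by_name <- function(primary, fallback) {
  for (v in names(primary)) {
    fb <- fallback[[v]]
    if (is.null(fb)) next  # name not found: skipped without any message
    gaps <- is.na(primary[[v]])
    primary[[v]][gaps] <- fb[gaps]
  }
  primary
}

cf_var <- "volume_fraction_of_condensed_water_in_soil"
tower <- setNames(list(c(0.30, NA, 0.28)), cf_var)

# Fallback keyed by the CDS request name versus the CF standard name.
era5_cds_named <- list(volumetric_soil_water_layer_1 = c(0.31, 0.29, 0.27))
era5_cf_named  <- setNames(era5_cds_named, cf_var)

unfilled <- coalesce_by_name(tower, era5_cds_named)  # gap remains, no error raised
filled   <- coalesce_by_name(tower, era5_cf_named)   # gap filled from ERA5
```

With the CDS-named fallback the NA survives and nothing is reported, which is the silent failure described above; once the names are translated to CF, the same call fills the gap.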
Motivation and Context
Fixes silent failure in ERA5 fallback preparation (see #3605).
Previously:
This PR ensures: