Skip to content

fix(db): return data unchanged instead of NULL when no covariates found#4000

Open
anshul23102 wants to merge 3 commits into
PecanProject:developfrom
anshul23102:fix/covariate-functions-null-return
Open

fix(db): return data unchanged instead of NULL when no covariates found#4000
anshul23102 wants to merge 3 commits into
PecanProject:developfrom
anshul23102:fix/covariate-functions-null-return

Conversation

@anshul23102
Copy link
Copy Markdown
Contributor

Bug

arrhenius.scaling.traits() and filter_sunleaf_traits() both set data <- NULL when no matching covariates are found, contradicting their own documentation.

arrhenius.scaling.traits()

The docstring says:

Note that data with no matching covariates will be unchanged.

But the else-branch does:

} else {
  data <- NULL   # crashes the caller
}
return(data)

This function is called in query.trait.data() for Vcmax, root_respiration_rate, leaf_respiration_rate_m2, and stem_respiration_rate. When no temperature covariate is recorded in BETYdb for a species, data becomes NULL, and the very next line if (nrow(result) == 0) throws:

Error in nrow(result) : argument is of length zero

The entire trait query pipeline crashes silently. The missing.temp = 25 parameter was designed to handle exactly this case (assume 25°C → scaling is a no-op), but it is never reached.

filter_sunleaf_traits()

The same data <- NULL pattern: when no canopy_layer covariate exists, all observations should pass through unfiltered. Instead they are all discarded.

Fix

Remove data <- NULL from both else-branches so the functions fall through to return(data) unchanged, as documented.

Impact

Any PEcAn meta-analysis run that queries a temperature-sensitive trait for a species with no temperature covariate in BETYdb will crash at query.trait.data(). This is a common situation for less-studied species or older database entries.

@github-actions github-actions Bot added the base label May 18, 2026
arrhenius.scaling.traits() and filter_sunleaf_traits() both set
data <- NULL in their else-branch, contradicting the documented
behaviour ("data with no matching covariates will be unchanged").

When no temperature covariates exist for any observation, the correct
response is to return the input data unchanged (equivalent to assuming
all measurements were taken at missing.temp = 25 degC, making the
Arrhenius scaling a no-op). Returning NULL instead caused a hard crash
in query.trait.data() at nrow(NULL) whenever Vcmax, root_respiration_rate,
leaf_respiration_rate_m2, or stem_respiration_rate were queried for
species with no temperature covariate recorded in BETYdb.

filter_sunleaf_traits() has the identical bug: when no canopy_layer
covariate is available, all measurements should pass through unfiltered,
not be silently discarded.
@anshul23102 anshul23102 force-pushed the fix/covariate-functions-null-return branch from 6ed1fc0 to 823ed4f Compare May 18, 2026 12:34
@mdietze
Copy link
Copy Markdown
Member

mdietze commented May 18, 2026

@dlebauer since I think you originally wrote these bits many many years ago, I was looking for your feedback on which is wrong here, the code or the documentation (i.e., do we want to remove these traits if they lack covariates, or assume they correspond to defaults?). If the documentation is wrong, sounds like there are other fixed downstream that are needed

@dlebauer
Copy link
Copy Markdown
Member

@mdietze

When I implemented this, the default of 25C for missing measurement temperature seemed more defensible, given the dataset that we had.

But now, for a temperature-dependent trait with unknown measurement or reference temperature, I think it is safer to either filter those rows out or error rather than silently assume 25C; leave it to the user to explicitly make the decision to fill in missing temperature covariate data.

I think that the sunleaf case is different because the method of taking these measurements from top of canopy seems more standardized.

@anshul23102
Copy link
Copy Markdown
Contributor Author

Thanks for the context @dlebauer. That distinction makes sense.
For filter_sunleaf_traits, happy to keep the fix as-is (return data unchanged when no canopy_layer covariate found).
For arrhenius.scaling.traits, I agree silent assumption of 25°C is risky. Would you prefer:

  1. Warn + proceed - emit a logger.warn() when falling back to missing.temp, so the user sees it but the run doesn't crash
  2. Filter - drop rows that have no temperature covariate rather than assigning the default
  3. Error - stop with an informative message telling the user to provide temperature covariates

I can update the PR whichever way you'd like. The original crash (argument is of length zero in nrow()) still needs addressing regardless of which policy we pick.

@mdietze
Copy link
Copy Markdown
Member

mdietze commented May 20, 2026

I vote for 2 + warning: drop rows missing covariates, but send the user a warning letting them know that you've done so (e.g. "X rows of data on Y are dropped due to missing covariate date")

@dlebauer
Copy link
Copy Markdown
Member

I agree with @mdietze that there values should be dropped with warning. Thanks @anshul23102 !

@anshul23102
Copy link
Copy Markdown
Contributor Author

Thanks for the decision. Here's my plan before I update the code:

  • arrhenius.scaling.traits: drop rows with no temperature covariate and emit a logger.warn() with the count (e.g. "X rows dropped due to missing temperature covariate"). Returns an empty data frame instead of NULL, which also fixes the nrow() crash downstream.
  • filter_sunleaf_traits: keep returning data unchanged when no canopy_layer covariate is found, based on @dlebauer's note that the sunleaf measurement method is more standardized, so passing all observations through seems correct here.

Does this sound right to both of you, or should filter_sunleaf_traits follow the same drop + warn approach?

@dlebauer
Copy link
Copy Markdown
Member

Looks good to me except:

Returns an empty data frame instead of NULL, which also fixes the nrow() crash downstream.

Returns empty data frame iff no input rows have covariates, else returns with rows that had covariates.

And please make sure to update docs accordingly and add tests - these don't have to comprehensively cover all inputs, just the key functionality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants