Summary
Provide utilities for handling missing forecasts before summarising scores, so that downstream analyses (pairwise comparisons, model confidence sets, ensemble importance) operate on a consistent and well-defined set of scores.
Motivation
When models submit forecasts for different subsets of targets or dates, summarisation and comparison functions can produce misleading results.
For example, a model that only forecasts easy targets will appear to perform better than one that forecasts everything.
Several downstream features depend on principled handling of missingness:
- `get_pairwise_comparisons()` already restricts to overlapping forecasts per pair, but this is pairwise rather than global
- Model confidence sets (#1055) need a consistent comparison set across all models
- Ensemble importance (#1121) needs complete forecast sets for ensemble construction
Kim et al.'s modelimportance package handles this in `model_importance_summary()` with three strategies: drop, worst-case imputation, and average imputation.
In my opinion, these are general-purpose strategies that belong in scoringutils rather than being reimplemented in each downstream feature.
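As a rough illustration of what such a utility might look like in scoringutils (hypothetical function and column names; this is not the modelimportance API), the three strategies could be applied to a long-format score table. The sketch assumes higher scores are worse, as for WIS or CRPS:

```r
library(data.table)

# Hypothetical sketch: apply drop / worst-case / average handling to a
# score table with columns model, target, score.
handle_missing_scores <- function(scores, strategy = c("drop", "worst", "average")) {
  strategy <- match.arg(strategy)
  n_models <- uniqueN(scores$model)
  # Expand to the full model x target grid so missing combinations become NA
  full <- CJ(model = unique(scores$model), target = unique(scores$target))
  scores <- scores[full, on = c("model", "target")]
  if (strategy == "drop") {
    # Keep only targets scored by every model
    complete_targets <- scores[, .(n = sum(!is.na(score))), by = target][n == n_models, target]
    return(scores[target %in% complete_targets])
  }
  fill_value <- switch(strategy,
    # Worst-case: the highest observed score for that target
    # (assumes higher = worse, e.g. WIS or CRPS)
    worst   = function(x) max(x, na.rm = TRUE),
    # Average: the mean score across models that did forecast the target
    average = function(x) mean(x, na.rm = TRUE)
  )
  scores[, score := fifelse(is.na(score), fill_value(score), score), by = target]
  scores[]
}
```

The `drop` branch corresponds to the filter-to-common-targets approach below; the other two branches require the imputation assumptions discussed there.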
Possible approaches
- Filter to common targets: restrict to the intersection of targets covered by all (or a minimum number of) models. Simple and transparent.
- Impute missing scores: fill in missing scores with worst-case, average, or custom values. More flexible but requires assumptions.
- Flag and report: identify and report missingness without modifying data, leaving the choice to the user.
These are not mutually exclusive. A utility that detects and reports missingness, plus helpers for filtering or imputing, would cover most use cases.
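A detect-and-report helper could be as simple as an anti-join against the full model-by-target grid (hypothetical name and columns, for illustration only):

```r
library(data.table)

# Hypothetical sketch of "flag and report": identify missing model/target
# combinations without modifying the score data.
report_missing_forecasts <- function(scores) {
  full <- CJ(model = unique(scores$model), target = unique(scores$target))
  missing <- full[!scores, on = c("model", "target")]  # anti-join
  if (nrow(missing) > 0) {
    message(nrow(missing), " missing model/target combination(s)")
  }
  missing[]
}
```

Returning the missing combinations as a table (rather than just warning) would let filtering and imputation helpers build directly on the same report.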
Connection to existing functionality
- `get_pairwise_comparisons()` handles missingness per pair via overlapping forecast sets
- `summarise_scores()` silently summarises whatever is present
- `validate_forecast()` checks structure but not completeness
See also #1055 (model confidence sets) and #1121 (ensemble importance).
This was opened by a bot. Please ping @seabbs for any questions.