Description
Background
When building the calibration matrix, not all target groups should go to the optimizer. Some are redundant after hierarchical uprating (e.g., state-level SNAP Household Count is redundant when district-level SNAP Household Count targets have already been reconciled to sum to state totals). Others may need to be temporarily included or excluded as the modeller experiments.
Currently (PR #531), group exclusion is done in the diagnostic notebook via drop_target_groups(), which takes human-readable (label_substring, geo_level) tuples:
    GROUPS_TO_DROP = [
        ("SNAP Household Count", "State"),
    ]

This works for interactive use but has limitations:
- Not portable between notebooks and the CLI pipeline (unified_calibration.py)
- No way to persist a selection config across runs
- No validation that the specified groups actually exist in the current database
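The missing validation could look roughly like the sketch below. It is not the notebook's drop_target_groups(). The function name drop_groups_checked and the representation of each group as a (label, geo_level) pair are assumptions made for illustration.

```python
def drop_groups_checked(groups, to_drop):
    """Drop groups matching (label_substring, geo_level) specs.

    `groups` is a list of (label, geo_level) pairs (hypothetical shape).
    Raises ValueError if any spec matches no group, so typos and stale
    specs fail loudly instead of being silently ignored.
    """
    kept, dropped = [], []
    matched = set()  # indices of specs that matched at least one group
    for label, geo_level in groups:
        hits = [
            i
            for i, (substring, level) in enumerate(to_drop)
            if substring in label and level == geo_level
        ]
        matched.update(hits)
        (dropped if hits else kept).append((label, geo_level))

    unmatched = [spec for i, spec in enumerate(to_drop) if i not in matched]
    if unmatched:
        raise ValueError(f"specs matched no target group: {unmatched}")
    return kept, dropped


# Made-up group list for illustration only.
groups = [
    ("SNAP Household Count", "State"),
    ("SNAP Household Count", "District"),
]
kept, dropped = drop_groups_checked(groups, [("SNAP Household Count", "State")])
```

With this check, a spec such as ("SNAP Houshold Count", "State") (note the typo) raises an error rather than silently dropping nothing.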
Desired behavior
A target selection config that:
- Is declarative: a YAML/JSON file or Python dict specifying which groups to include or exclude
- Uses semantic identifiers: variable names and geo levels, not numeric group IDs (which change when targets are added/removed)
- Works in both notebook and CLI contexts: unified_calibration.py --target-config config.yaml
- Echoes clearly: always prints what was kept and dropped so the modeller can audit
- Supports both include and exclude modes: sometimes it's easier to list what you want than what you don't
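One way these requirements could fit together is sketched below, using a plain Python dict as the config. Everything here is hypothetical: the function name select_targets, the shape of each group as a dict with keys such as "variable" and "geo_level", and the exact-equality matching rule are assumptions for illustration, not an existing API.

```python
def select_targets(groups, config):
    """Split `groups` into (kept, dropped) according to `config`.

    Assumed shapes (hypothetical, for illustration only):
      groups: list of dicts such as {"variable": ..., "geo_level": ...}
      config: {"exclude": [rule, ...]} or {"include": [rule, ...]},
              where each rule maps a field name to its required value.
    """
    modes = [m for m in ("include", "exclude") if m in config]
    if len(modes) != 1:
        raise ValueError("config must contain exactly one of 'include' or 'exclude'")
    mode = modes[0]
    rules = config[mode]

    def matches(group, rule):
        # 'reason' is documentation only; all other fields must match exactly.
        fields = {k: v for k, v in rule.items() if k != "reason"}
        return bool(fields) and all(group.get(k) == v for k, v in fields.items())

    kept, dropped = [], []
    for group in groups:
        hit = any(matches(group, rule) for rule in rules)
        keep = hit if mode == "include" else not hit
        (kept if keep else dropped).append(group)

    # Echo the outcome so the selection can be audited.
    print(f"mode={mode}: kept {len(kept)}, dropped {len(dropped)}")
    for group in kept:
        print("  KEEP", group)
    for group in dropped:
        print("  DROP", group)
    return kept, dropped


# Made-up groups for illustration only.
groups = [
    {"variable": "SNAP Household Count", "geo_level": "State"},
    {"variable": "SNAP Household Count", "geo_level": "District"},
]
config = {
    "exclude": [
        {
            "variable": "SNAP Household Count",
            "geo_level": "State",
            "reason": "Redundant after hierarchical uprating of district targets",
        }
    ]
}
kept, dropped = select_targets(groups, config)
```

Because the function takes a dict, the same code path could serve a notebook (dict literal) and the CLI (dict parsed from a YAML or JSON file).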
Possible format
    # Exclude mode: start with everything, drop these
    exclude:
      - variable: "SNAP Household Count"
        geo_level: "State"
        reason: "Redundant after hierarchical uprating of district targets"

    # OR include mode: only use these
    # include:
    #   - variable: "snap"
    #     geo_level: "State"
    #   - domain: "aca_ptc"
    #     geo_level: "District"

Non-goals for this issue
- This is NOT about changing build_matrix or _query_targets: those control which targets are computed
- This is purely about the filter step between matrix construction and optimization
- The current drop_target_groups() function in the notebook is adequate for short-term use
Current workaround
The drop_target_groups() function in docs/calibration_matrix.ipynb (from PR #531) matches groups by label substring + geo level name. It's readable and explicit but notebook-local.
Ref: #531