Target selection config for calibration optimizer input #533

@baogorek

Background

When building the calibration matrix, not all target groups should go to the optimizer. Some are redundant after hierarchical uprating (e.g., state-level SNAP Household Count is redundant when district-level SNAP Household Count targets have already been reconciled to sum to state totals). Others may need to be temporarily included or excluded as the modeller experiments.

Currently (PR #531), group exclusion is done in the diagnostic notebook via drop_target_groups(), which takes human-readable (label_substring, geo_level) tuples:

GROUPS_TO_DROP = [
    ("SNAP Household Count", "State"),
]

This works for interactive use but has limitations:

  • Not portable between notebooks and the CLI pipeline (unified_calibration.py)
  • No way to persist a selection config across runs
  • No validation that the specified groups actually exist in the current database

Desired behavior

A target selection config that:

  1. Is declarative — a YAML/JSON file or Python dict specifying which groups to include or exclude
  2. Uses semantic identifiers — variable names and geo levels, not numeric group IDs (which change when targets are added/removed)
  3. Works in both notebook and CLI contexts, e.g. unified_calibration.py --target-config config.yaml
  4. Echoes clearly — always prints what was kept and dropped so the modeller can audit
  5. Supports both include and exclude modes — sometimes it's easier to list what you want than what you don't
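The requirements above could be met by a small filter function along these lines. This is only a sketch: TargetGroup and select_target_groups are hypothetical names, not existing code in the repository, and exact matching on variable name plus geo level is an assumption about how rules would be compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetGroup:
    """Hypothetical stand-in for one target group in the calibration matrix."""
    variable: str
    geo_level: str


def select_target_groups(groups, rules, mode="exclude"):
    """Split groups into (kept, dropped) and echo the outcome.

    rules is a list of dicts with "variable" and "geo_level" keys.
    mode is "exclude" (drop matches) or "include" (keep only matches).
    """
    if mode not in ("include", "exclude"):
        raise ValueError(f"unknown mode: {mode!r}")

    def matches(group):
        return any(
            rule["variable"] == group.variable
            and rule["geo_level"] == group.geo_level
            for rule in rules
        )

    kept, dropped = [], []
    for group in groups:
        hit = matches(group)
        keep = hit if mode == "include" else not hit
        (kept if keep else dropped).append(group)

    # Always echo what happened so the selection can be audited.
    for label, bucket in (("KEPT", kept), ("DROPPED", dropped)):
        for group in bucket:
            print(f"{label:8s}{group.variable} ({group.geo_level})")
    return kept, dropped


groups = [
    TargetGroup("SNAP Household Count", "State"),
    TargetGroup("SNAP Household Count", "District"),
]
rules = [{"variable": "SNAP Household Count", "geo_level": "State"}]
kept, dropped = select_target_groups(groups, rules, mode="exclude")
```

Returning both lists, rather than only the kept groups, lets a notebook or the CLI log the dropped set alongside the optimizer input.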

Possible format

# Exclude mode: start with everything, drop these
exclude:
  - variable: "SNAP Household Count"
    geo_level: "State"
    reason: "Redundant after hierarchical uprating of district targets"

# OR include mode: only use these
# include:
#   - variable: "snap"
#     geo_level: "State"
#   - domain: "aca_ptc"
#     geo_level: "District"
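A config in this shape could be checked before it reaches the filter step. The sketch below works on a plain Python dict, which is roughly what a YAML or JSON loader would return for the file above. The function names, the rule that exactly one of include/exclude must be set, and the requirement that each rule carries geo_level plus variable or domain are all assumptions for illustration, not decisions made in this issue.

```python
SELECTOR_KEYS = ("variable", "domain", "geo_level")


def parse_target_config(config):
    """Return (mode, rules) from a dict shaped like the proposed config."""
    modes = [m for m in ("include", "exclude") if config.get(m)]
    if len(modes) != 1:
        raise ValueError("config must set exactly one of 'include' or 'exclude'")
    mode = modes[0]
    rules = config[mode]
    for rule in rules:
        # Assumed minimum: a geo level plus a variable or a domain.
        if "geo_level" not in rule or not (rule.keys() & {"variable", "domain"}):
            raise ValueError(f"incomplete rule: {rule}")
    return mode, rules


def rule_matches(rule, group):
    """True if every selector key present in the rule equals the group's value."""
    return all(group.get(k) == rule[k] for k in SELECTOR_KEYS if k in rule)


def unmatched_rules(rules, groups):
    """Return rules that match no existing group (likely typos or stale entries)."""
    return [r for r in rules if not any(rule_matches(r, g) for g in groups)]


config = {
    "exclude": [
        {
            "variable": "SNAP Household Count",
            "geo_level": "State",
            "reason": "Redundant after hierarchical uprating of district targets",
        }
    ]
}
# Hypothetical description of the groups present in the current database.
groups = [
    {"variable": "SNAP Household Count", "geo_level": "State"},
    {"variable": "SNAP Household Count", "geo_level": "District"},
]

mode, rules = parse_target_config(config)
missing = unmatched_rules(rules, groups)
```

Reporting unmatched rules addresses the validation gap noted earlier: a rule that matches nothing in the current database is surfaced instead of being silently ignored. The reason field is carried through untouched so it can be echoed in the audit output.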

Non-goals for this issue

  • This is NOT about changing build_matrix or _query_targets — those control which targets are computed
  • This is purely about the filter step between matrix construction and optimization
  • The current drop_target_groups() function in the notebook is adequate for short-term use

Current workaround

The drop_target_groups() function in docs/calibration_matrix.ipynb (from PR #531) matches groups by label substring + geo level name. It's readable and explicit but notebook-local.
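For readers without the notebook at hand, the matching rule described here (label substring plus geo level name) can be illustrated as follows. This is not the notebook's implementation: drop_groups_by_label, its signature, and the dict representation of a group are hypothetical and chosen only to show the matching logic.

```python
GROUPS_TO_DROP = [
    ("SNAP Household Count", "State"),
]


def drop_groups_by_label(groups, groups_to_drop):
    """Return the groups that match none of the (label_substring, geo_level) pairs."""
    kept = []
    for group in groups:
        drop = any(
            substring in group["label"] and geo_level == group["geo_level"]
            for substring, geo_level in groups_to_drop
        )
        if not drop:
            kept.append(group)
    return kept


# Hypothetical group records; the real notebook may store these differently.
example_groups = [
    {"label": "SNAP Household Count", "geo_level": "State"},
    {"label": "SNAP Household Count", "geo_level": "District"},
]
remaining = drop_groups_by_label(example_groups, GROUPS_TO_DROP)
```

Because the match is by substring, a short label fragment can drop more groups than intended, which is one reason the proposal above favours explicit semantic identifiers and an echo of what was dropped.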

Ref: #531
