Add census-block-first calibration pipeline (from PR #516) by baogorek · Pull Request #531 · PolicyEngine/policyengine-us-data

baogorek · 2026-02-12T23:12:04Z

Summary

Ports the core census-block calibration architecture from Max Ghenis's PR #516 onto current main. Since main evolved significantly while #516 was in development (new SQL views, schema changes, constraint validation, source field), this PR re-implements the pipeline on top of the current codebase rather than rebasing 36 commits across 40+ conflicting files.

New: `policyengine_us_data/calibration/` package

clone_and_assign.py — Population-weighted census block sampling. Assigns each CPS record N geographic clones via census block distribution.
unified_matrix_builder.py — Clone-by-clone simulation with COO caching and disk-based resume. Queries targets via target_overview view, applies hierarchical uprating (CPI/population + ACA PTC state multipliers), builds sparse calibration matrix.
unified_calibration.py — L0 optimization pipeline with CLI. Includes seeded takeup re-randomization (block-level geographic variation in program participation).

Removed

sparse_matrix_builder.py, fit_calibration_weights.py, matrix_tracer.py — replaced by unified pipeline
8 old tests dependent on SparseMatrixBuilder — replaced by 28 new tests in tests/test_calibration/

Key improvements over #516

Uses target_overview SQL view (no hardcoded variable lists or stratum_group_id)
Preserves hierarchical uprating with ACA PTC state multipliers
Count target detection via endswith("_count") pattern
Seeded takeup re-randomization for geographic variation in program participation
Clean COO cache resume for interrupted runs

Follow-up PRs

Source imputation with state predictor
PUF imputation reordering
Category-specific takeup re-randomization (EITC, WIC)

Test plan

All 28 new tests pass (pytest policyengine_us_data/tests/test_calibration/)
All 60 remaining old tests pass (pytest policyengine_us_data/tests/test_local_area_calibration/)
Import verification: from policyengine_us_data.calibration import unified_calibration
Formatting: black . -l 79 --check

🤖 Generated with Claude Code

Port core census-block architecture from Max Ghenis's PR #516 onto current main. New calibration/ package with: - clone_and_assign: population-weighted census block sampling - unified_matrix_builder: clone-by-clone simulation with COO caching, target_overview-based querying, hierarchical uprating - unified_calibration: L0 optimization pipeline with takeup re-randomization Remove old SparseMatrixBuilder/MatrixTracer/fit_calibration_weights and their associated tests. Simplify conftest.py for remaining tests. Co-Authored-By: Max Ghenis <max@policyengine.org> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Max Ghenis <max@policyengine.org> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Rewrite local_area_calibration_setup.ipynb for clone-based pipeline - Update calibration_matrix.ipynb for UnifiedMatrixBuilder API - Update hierarchical_uprating.ipynb - Fix Jupyter import error in unified_calibration.py (OutStream.reconfigure) - Add drop_target_groups utility to calibration_utils.py - Update changelog Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

PavelMakarchuk

PR Review: Census-Block-First Calibration Pipeline

Thanks for the substantial work porting the census-block architecture from PR #516 onto current main. The 3-stage pipeline (clone_and_assign → unified_matrix_builder → unified_calibration) is well-architected, and CI is fully green. Below is a thorough review organized by severity.

CRITICAL — Must Fix Before Merge

1. `modal_app/remote_calibration_runner.py` references deleted file

remote_calibration_runner.py:57-58 still hardcodes the path to the now-deleted fit_calibration_weights.py:

script_path = (
    "policyengine_us_data/datasets/cps/"
    "local_area_calibration/fit_calibration_weights.py"
)

Any Modal-based calibration run after this PR merges will fail with FileNotFoundError. This needs to be updated to invoke the new pipeline, e.g.:

script_path = "policyengine_us_data/calibration/unified_calibration.py"

The CLI argument interface also changed (--dataset-path → --dataset, new args --n-clones, --preset, etc.), so the subprocess.run call at lines 60-74 needs corresponding updates.

SIGNIFICANT — Strongly Recommend Fixing

2. SQL injection risk in `_query_targets` (`unified_matrix_builder.py:159-172`)

Target filter values are interpolated directly into SQL via f-strings:

ph = ",".join(f"'{dv}'" for dv in dvs)
or_conditions.append(f"tv.domain_variable IN ({ph})")

While this is an internal tool (not web-facing), values containing single quotes will cause query errors or unintended behavior. Recommend using SQLAlchemy parameterized queries or at minimum escaping the values.

3. `_coo_parts` stale state bug (`unified_matrix_builder.py:869`)

When cache_dir=None, COO data accumulates in self._coo_parts via a lazy hasattr check:

if not hasattr(self, "_coo_parts"):
    self._coo_parts = ([], [], [])
self._coo_parts[0].append(cr)

If build_matrix fails midway (e.g., simulation error on clone 5 of 10), a subsequent call will find stale partial data and append to it, producing a corrupted matrix. Fix: initialize self._coo_parts = ([], [], []) at the top of build_matrix (around line 642), not lazily inside the clone loop.

4. No integration test for `build_matrix`

The core method that assembles the sparse calibration matrix — the centerpiece of this PR — has zero test coverage. test_unified_matrix_builder.py only tests _query_targets, uprating, and hierarchical uprating. Even a minimal test with a small mock simulation (checking matrix shape, sparsity, and that column ordering matches clone_idx * n_records + record_idx) would significantly increase confidence. The old pipeline had extensive matrix-level tests (test_column_indexing.py, test_matrix_national_variation.py, etc.) that are now deleted without equivalent replacements.

5. Hardcoded `voluntary_filing` rate (`unified_calibration.py:168`)

if rate_key == "voluntary_filing":
    rate_or_dict = 0.05

All other takeup rates load from YAML parameter files in policyengine_us_data/parameters/take_up/. This single hardcoded value breaks the pattern. Should be moved to a voluntary_filing.yaml file for consistency, discoverability, and maintainability.

MINOR — Non-Blocking

6. `builtins.print` monkey-patching (`unified_calibration.py:329-352`)

fit_l0_weights temporarily replaces builtins.print to add stdout flushing. The finally block restores it correctly, but this is fragile — any concurrent code calling print during L0 fitting gets the patched version. Consider functools.partial(print, flush=True) passed as a callback, or simply setting PYTHONUNBUFFERED=1.

7. Private function imported cross-module (`unified_matrix_builder.py:28`)

from ...calibration_utils import (
    get_calculated_variables,
    apply_op,
    _get_geo_level,  # underscore = private by convention
)

_get_geo_level is now a cross-module dependency. Consider renaming to get_geo_level to signal it's part of the public API.

8. `drop_target_groups` added to `calibration_utils.py` without tests

New ~60-line utility function with non-trivial logic (label matching, geo-level filtering, sparse matrix row deletion). Used in calibration_matrix.ipynb but has no test coverage.

9. `sys.stderr.reconfigure` / `sys.stdout.reconfigure` (`unified_calibration.py:33-38`)

The try/except AttributeError handles Jupyter's OutStream (which lacks reconfigure), but calling reconfigure on non-TTY streams at module import time is unusual. Consider moving this into main() where it's actually needed, so importing the module for library use doesn't trigger side effects.

Architecture — Positive Observations

Clean pipeline separation: clone_and_assign → unified_matrix_builder → unified_calibration. Each module has a clear single responsibility.
COO caching with disk-based resume is a strong improvement for long-running calibration jobs that may be interrupted.
Seeded RNG for geographic assignment and takeup re-randomization ensures full reproducibility.
target_overview SQL view integration is cleaner than the old hardcoded variable lists and stratum_group_id lookups.
Tests are well-structured: mock CSV data, in-memory SQLite, no dependency on real data files.
Notebooks correctly updated — no stale references to deleted modules in either .ipynb file.
Changelog entry is comprehensive and well-formatted.

Summary

Severity	Issue	Location
CRITICAL	`modal_app/remote_calibration_runner.py` references deleted `fit_calibration_weights.py` — will break Modal runs	`remote_calibration_runner.py:57`
Significant	SQL injection via f-string interpolation	`unified_matrix_builder.py:159-172`
Significant	`_coo_parts` stale state on build failure/re-call	`unified_matrix_builder.py:869`
Significant	No test for core `build_matrix` method	`test_unified_matrix_builder.py`
Significant	Hardcoded `voluntary_filing` rate breaks parameter pattern	`unified_calibration.py:168`
Minor	`builtins.print` monkey-patching	`unified_calibration.py:329`
Minor	Private `_get_geo_level` imported cross-module	`unified_matrix_builder.py:28`
Minor	`drop_target_groups` added without tests	`calibration_utils.py`
Minor	Module-level `reconfigure` side effect	`unified_calibration.py:33-38`

Verdict: Request changes — the modal_app breakage is a production risk that must be fixed. The _coo_parts state bug and lack of build_matrix tests are also strongly recommended before merge.

- Fix modal_app/remote_calibration_runner.py to use unified_calibration.py - Fix _coo_parts stale state bug (initialize at top of build_matrix) - Add build_matrix geographic masking integration test (national/state/CD) - Add drop_target_groups unit tests - Add voluntary_filing.yaml, remove hardcoded 0.05 rate - Rename _get_geo_level to get_geo_level (cross-module public API) - Move sys.stderr.reconfigure from module level to main() - Add OUTPUT_PATH/LOG_PATH markers for Modal runner parsing - Update notebook example household to record_idx=8629 (high SNAP) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

baogorek · 2026-02-13T22:39:12Z

@PavelMakarchuk thanks for the (quite helpful) review. Please see the points especially that I did not address, and see if you're okay with them.

Point 1 - Fixed
Two changes made:
  1. unified_calibration.py: Added OUTPUT_PATH: and LOG_PATH: print markers so the Modal runner can parse the output paths (matching the old contract)
  2. remote_calibration_runner.py: Updated script path and changed --dataset-path → --dataset

Point 2 - Didn't fix
So I tried out the fix and it led to more lines of code that were just ugly as sin. There's virtually zero risk here. If someone injects a drop table, they can just download the database again from Hugging Face. Are you okay if I skip this?

Point 3 - Fixed
Net change: +1 line at the top of build_matrix, -2 lines removing the hasattr guard. Clean fix.

Point 4 - Implemented an integration test but not a true mock
  1. Matrix shape — covered by TestMatrixShape
  2. Sparsity — covered
  3. Column ordering matches clone_idx * n_records + record_idx — implicitly verified.
  4. Equivalent to deleted tests — test_column_indexing, test_geo_masking, test_cross_state, test_same_state are all spiritually replaced by the
  state/district masking tests.

   Pavel suggested a "small mock simulation" that's self-contained and fast. Ours requires the real dataset. That's a
  tradeoff: stronger assertions but can't run without data.

  Pavel, it's going to be a lift to get a true mock, although I agree we should have one, so requesting to push that out.

Point 5 - Fixed
Created the YAML policyengine_us_data/parameters/take_up/voluntary_filing.yaml

Point 6 - Didn't fix
We would need to fix the L0 package to get the streaming to work, from my conversation with Claude. Skipping for now.

Point 7 - fixed

Point 8 - Fixed.
Test added

Point 9 - Fixed
reconfigure moved to main

PavelMakarchuk

Re-Review After Fix Commit (`689949f`)

Thanks for the thorough fix commit. Most of my original concerns are well-addressed. Here's the updated assessment.

Original Issues — Status

#	Issue	Status
1	CRITICAL: `modal_app/remote_calibration_runner.py` references deleted file	Fixed — now points to `unified_calibration.py`, CLI args updated (`--dataset-path` → `--dataset`), `OUTPUT_PATH:`/`LOG_PATH:` markers added for stdout parsing
2	SQL injection via f-string in `_query_targets`	Not addressed (see below)
3	`_coo_parts` stale state on build failure	Fixed — initialized at top of `build_matrix` (line 645)
4	No integration test for `build_matrix`	Fixed — `test_build_matrix_masking.py` added with thorough geographic masking tests (national/state/CD visibility, 7 test cases)
5	Hardcoded `voluntary_filing` rate	Fixed — `voluntary_filing.yaml` added, hardcoded `0.05` removed
6	`builtins.print` monkey-patching	Not addressed (minor, acceptable)
7	`_get_geo_level` private function imported cross-module	Partially fixed — renamed to `get_geo_level` in `calibration_utils.py` and `unified_matrix_builder.py`, but not in `calibration_matrix.ipynb` (see new issue below)
8	`drop_target_groups` untested	Fixed — `test_drop_target_groups.py` added with 7 test cases including edge cases
9	Module-level `sys.stderr.reconfigure` side effect	Fixed — moved into `main()`

NEW ISSUE — `calibration_matrix.ipynb` still uses `_get_geo_level`

The function was renamed from _get_geo_level to get_geo_level in calibration_utils.py, and unified_matrix_builder.py correctly imports get_geo_level. However, docs/calibration_matrix.ipynb still references the old name in 3 places:

Cell 2 (import): _get_geo_level,
Cell 5: geo_levels = targets_df["geographic_id"].apply(_get_geo_level)
Cell 20: geo_levels = targets_df["geographic_id"].apply(_get_geo_level)

This will cause a NameError when anyone runs the notebook. CI doesn't catch this because notebooks aren't executed during tests.

Remaining Minor: SQL injection in `_query_targets`

Lines 159-163 of unified_matrix_builder.py still construct SQL via f-string:

ph = ",".join(f"'{dv}'" for dv in dvs)
or_conditions.append(f"tv.domain_variable IN ({ph})")

This is low-risk since it's internal-only and the inputs come from code (not user-facing), so I won't block on it. But it's worth a follow-up to use parameterized queries, especially if this code is ever reused.

New Code Quality Assessment

test_build_matrix_masking.py — Well-designed integration test:

Uses a specific high-SNAP household (RECORD_IDX=8629) for deterministic assertions
Tests national visibility (both clones contribute), state masking (cross-state zeroes), and district masking (same-state-other-CD zeroes)
Properly skips when calibration data isn't available or when both clones land in same state
Uses scope="module" fixture to avoid rebuilding the matrix per test

test_drop_target_groups.py — Good coverage:

Tests matching, non-matching, drop-all, column preservation, and case-insensitive matching
Uses create_target_groups to get realistic group IDs rather than hardcoding

voluntary_filing.yaml — Clean, consistent with other parameter files.

Modal runner update — Correctly updated CLI args, added OUTPUT_PATH/LOG_PATH markers. Note: LOG_PATH now points to unified_diagnostics.csv (a diagnostics summary) rather than the old training log CSV. This is a minor behavior change for Modal users who might expect the old format — worth documenting in the changelog or Modal README if anyone consumes this file programmatically.

Verdict

The notebook _get_geo_level → get_geo_level rename is a one-line-per-cell fix. Once that's done, this looks good to merge. Changing to approve once that's addressed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

baogorek · 2026-02-17T14:14:09Z

Thanks for the thorough reviews, @PavelMakarchuk . I've pushed a fix (9bcdc0f) that renames _get_geo_level to get_geo_level in the 3 places in calibration_matrix.ipynb: the import in cell 2 and the two .apply() calls in cells 5 and 20.

Regarding the SQL injection point in _query_targets: since the inputs come from internal code rather than user-facing input, I'm leaving that as-is for now. Happy to revisit in a follow-up if it becomes a concern.

baogorek and others added 2 commits February 12, 2026 18:11

Add changelog entry for census-block calibration pipeline

1f5966e

Co-Authored-By: Max Ghenis <max@policyengine.org> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This was referenced Feb 13, 2026

Add category-dependent takeup re-randomization (EITC, WIC) #532

Open

Target selection config for calibration optimizer input #533

Open

baogorek requested a review from PavelMakarchuk February 13, 2026 17:12

PavelMakarchuk requested changes Feb 13, 2026

View reviewed changes

baogorek mentioned this pull request Feb 13, 2026

Add calibration package checkpoint to unified_calibration pipeline #534

Open

baogorek requested a review from PavelMakarchuk February 13, 2026 22:39

baogorek mentioned this pull request Feb 16, 2026

Add PUF + source impute modules, fix AGI ceiling (issue #530) #537

Merged

3 tasks

baogorek requested a review from MaxGhenis February 17, 2026 03:46

PavelMakarchuk requested changes Feb 17, 2026

View reviewed changes

Rename _get_geo_level to get_geo_level in calibration_matrix notebook

9bcdc0f

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

baogorek requested a review from PavelMakarchuk February 17, 2026 14:13

PavelMakarchuk approved these changes Feb 17, 2026

View reviewed changes

baogorek merged commit dc57eb7 into main Feb 17, 2026
7 checks passed

baogorek deleted the census-block-calibration branch February 17, 2026 15:25

juaristi22 mentioned this pull request Feb 18, 2026

Category takeup rerandomization #540

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add census-block-first calibration pipeline (from PR #516)#531

Add census-block-first calibration pipeline (from PR #516)#531
baogorek merged 5 commits intomainfrom
census-block-calibration

baogorek commented Feb 12, 2026

Uh oh!

PavelMakarchuk left a comment

Uh oh!

baogorek commented Feb 13, 2026 •

edited

Loading

Uh oh!

PavelMakarchuk left a comment

Uh oh!

baogorek commented Feb 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

Conversation

baogorek commented Feb 12, 2026

Summary

New: policyengine_us_data/calibration/ package

Removed

Key improvements over #516

Follow-up PRs

Test plan

Uh oh!

PavelMakarchuk left a comment

Choose a reason for hiding this comment

PR Review: Census-Block-First Calibration Pipeline

CRITICAL — Must Fix Before Merge

1. modal_app/remote_calibration_runner.py references deleted file

SIGNIFICANT — Strongly Recommend Fixing

2. SQL injection risk in _query_targets (unified_matrix_builder.py:159-172)

3. _coo_parts stale state bug (unified_matrix_builder.py:869)

4. No integration test for build_matrix

5. Hardcoded voluntary_filing rate (unified_calibration.py:168)

MINOR — Non-Blocking

6. builtins.print monkey-patching (unified_calibration.py:329-352)

7. Private function imported cross-module (unified_matrix_builder.py:28)

8. drop_target_groups added to calibration_utils.py without tests

9. sys.stderr.reconfigure / sys.stdout.reconfigure (unified_calibration.py:33-38)

Architecture — Positive Observations

Summary

Uh oh!

baogorek commented Feb 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

PavelMakarchuk left a comment

Choose a reason for hiding this comment

Re-Review After Fix Commit (689949f)

Original Issues — Status

NEW ISSUE — calibration_matrix.ipynb still uses _get_geo_level

Remaining Minor: SQL injection in _query_targets

New Code Quality Assessment

Verdict

Uh oh!

baogorek commented Feb 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

New: `policyengine_us_data/calibration/` package

1. `modal_app/remote_calibration_runner.py` references deleted file

2. SQL injection risk in `_query_targets` (`unified_matrix_builder.py:159-172`)

3. `_coo_parts` stale state bug (`unified_matrix_builder.py:869`)

4. No integration test for `build_matrix`

5. Hardcoded `voluntary_filing` rate (`unified_calibration.py:168`)

6. `builtins.print` monkey-patching (`unified_calibration.py:329-352`)

7. Private function imported cross-module (`unified_matrix_builder.py:28`)

8. `drop_target_groups` added to `calibration_utils.py` without tests

9. `sys.stderr.reconfigure` / `sys.stdout.reconfigure` (`unified_calibration.py:33-38`)

baogorek commented Feb 13, 2026 •

edited

Loading

Re-Review After Fix Commit (`689949f`)

NEW ISSUE — `calibration_matrix.ipynb` still uses `_get_geo_level`

Remaining Minor: SQL injection in `_query_targets`