Skip to content

Add census-block-first calibration pipeline (from PR #516)#531

Merged
baogorek merged 5 commits intomainfrom
census-block-calibration
Feb 17, 2026
Merged

Add census-block-first calibration pipeline (from PR #516)#531
baogorek merged 5 commits intomainfrom
census-block-calibration

Conversation

@baogorek
Copy link
Collaborator

Summary

Ports the core census-block calibration architecture from Max Ghenis's PR #516 onto current main. Since main evolved significantly while #516 was in development (new SQL views, schema changes, constraint validation, source field), this PR re-implements the pipeline on top of the current codebase rather than rebasing 36 commits across 40+ conflicting files.

New: policyengine_us_data/calibration/ package

  • clone_and_assign.py — Population-weighted census block sampling. Assigns each CPS record N geographic clones via census block distribution.
  • unified_matrix_builder.py — Clone-by-clone simulation with COO caching and disk-based resume. Queries targets via target_overview view, applies hierarchical uprating (CPI/population + ACA PTC state multipliers), builds sparse calibration matrix.
  • unified_calibration.py — L0 optimization pipeline with CLI. Includes seeded takeup re-randomization (block-level geographic variation in program participation).

Removed

  • sparse_matrix_builder.py, fit_calibration_weights.py, matrix_tracer.py — replaced by unified pipeline
  • 8 old tests dependent on SparseMatrixBuilder — replaced by 28 new tests in tests/test_calibration/

Key improvements over #516

  • Uses target_overview SQL view (no hardcoded variable lists or stratum_group_id)
  • Preserves hierarchical uprating with ACA PTC state multipliers
  • Count target detection via endswith("_count") pattern
  • Seeded takeup re-randomization for geographic variation in program participation
  • Clean COO cache resume for interrupted runs

Follow-up PRs

  • Source imputation with state predictor
  • PUF imputation reordering
  • Category-specific takeup re-randomization (EITC, WIC)

Test plan

  • All 28 new tests pass (pytest policyengine_us_data/tests/test_calibration/)
  • All 60 remaining old tests pass (pytest policyengine_us_data/tests/test_local_area_calibration/)
  • Import verification: from policyengine_us_data.calibration import unified_calibration
  • Formatting: black . -l 79 --check

🤖 Generated with Claude Code

baogorek and others added 2 commits February 12, 2026 18:11
Port core census-block architecture from Max Ghenis's PR #516 onto
current main. New calibration/ package with:
- clone_and_assign: population-weighted census block sampling
- unified_matrix_builder: clone-by-clone simulation with COO caching,
  target_overview-based querying, hierarchical uprating
- unified_calibration: L0 optimization pipeline with takeup
  re-randomization

Remove old SparseMatrixBuilder/MatrixTracer/fit_calibration_weights and
their associated tests. Simplify conftest.py for remaining tests.

Co-Authored-By: Max Ghenis <max@policyengine.org>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Max Ghenis <max@policyengine.org>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rewrite local_area_calibration_setup.ipynb for clone-based pipeline
- Update calibration_matrix.ipynb for UnifiedMatrixBuilder API
- Update hierarchical_uprating.ipynb
- Fix Jupyter import error in unified_calibration.py (OutStream.reconfigure)
- Add drop_target_groups utility to calibration_utils.py
- Update changelog

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Collaborator

@PavelMakarchuk PavelMakarchuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Census-Block-First Calibration Pipeline

Thanks for the substantial work porting the census-block architecture from PR #516 onto current main. The 3-stage pipeline (clone_and_assignunified_matrix_builderunified_calibration) is well-architected, and CI is fully green. Below is a thorough review organized by severity.


CRITICAL — Must Fix Before Merge

1. modal_app/remote_calibration_runner.py references deleted file

remote_calibration_runner.py:57-58 still hardcodes the path to the now-deleted fit_calibration_weights.py:

script_path = (
    "policyengine_us_data/datasets/cps/"
    "local_area_calibration/fit_calibration_weights.py"
)

Any Modal-based calibration run after this PR merges will fail with FileNotFoundError. This needs to be updated to invoke the new pipeline, e.g.:

script_path = "policyengine_us_data/calibration/unified_calibration.py"

The CLI argument interface also changed (--dataset-path--dataset, new args --n-clones, --preset, etc.), so the subprocess.run call at lines 60-74 needs corresponding updates.


SIGNIFICANT — Strongly Recommend Fixing

2. SQL injection risk in _query_targets (unified_matrix_builder.py:159-172)

Target filter values are interpolated directly into SQL via f-strings:

ph = ",".join(f"'{dv}'" for dv in dvs)
or_conditions.append(f"tv.domain_variable IN ({ph})")

While this is an internal tool (not web-facing), values containing single quotes will cause query errors or unintended behavior. Recommend using SQLAlchemy parameterized queries or at minimum escaping the values.

3. _coo_parts stale state bug (unified_matrix_builder.py:869)

When cache_dir=None, COO data accumulates in self._coo_parts via a lazy hasattr check:

if not hasattr(self, "_coo_parts"):
    self._coo_parts = ([], [], [])
self._coo_parts[0].append(cr)

If build_matrix fails midway (e.g., simulation error on clone 5 of 10), a subsequent call will find stale partial data and append to it, producing a corrupted matrix. Fix: initialize self._coo_parts = ([], [], []) at the top of build_matrix (around line 642), not lazily inside the clone loop.

4. No integration test for build_matrix

The core method that assembles the sparse calibration matrix — the centerpiece of this PR — has zero test coverage. test_unified_matrix_builder.py only tests _query_targets, uprating, and hierarchical uprating. Even a minimal test with a small mock simulation (checking matrix shape, sparsity, and that column ordering matches clone_idx * n_records + record_idx) would significantly increase confidence. The old pipeline had extensive matrix-level tests (test_column_indexing.py, test_matrix_national_variation.py, etc.) that are now deleted without equivalent replacements.

5. Hardcoded voluntary_filing rate (unified_calibration.py:168)

if rate_key == "voluntary_filing":
    rate_or_dict = 0.05

All other takeup rates load from YAML parameter files in policyengine_us_data/parameters/take_up/. This single hardcoded value breaks the pattern. Should be moved to a voluntary_filing.yaml file for consistency, discoverability, and maintainability.


MINOR — Non-Blocking

6. builtins.print monkey-patching (unified_calibration.py:329-352)

fit_l0_weights temporarily replaces builtins.print to add stdout flushing. The finally block restores it correctly, but this is fragile — any concurrent code calling print during L0 fitting gets the patched version. Consider functools.partial(print, flush=True) passed as a callback, or simply setting PYTHONUNBUFFERED=1.

7. Private function imported cross-module (unified_matrix_builder.py:28)

from ...calibration_utils import (
    get_calculated_variables,
    apply_op,
    _get_geo_level,  # underscore = private by convention
)

_get_geo_level is now a cross-module dependency. Consider renaming to get_geo_level to signal it's part of the public API.

8. drop_target_groups added to calibration_utils.py without tests

New ~60-line utility function with non-trivial logic (label matching, geo-level filtering, sparse matrix row deletion). Used in calibration_matrix.ipynb but has no test coverage.

9. sys.stderr.reconfigure / sys.stdout.reconfigure (unified_calibration.py:33-38)

The try/except AttributeError handles Jupyter's OutStream (which lacks reconfigure), but calling reconfigure on non-TTY streams at module import time is unusual. Consider moving this into main() where it's actually needed, so importing the module for library use doesn't trigger side effects.


Architecture — Positive Observations

  • Clean pipeline separation: clone_and_assign → unified_matrix_builder → unified_calibration. Each module has a clear single responsibility.
  • COO caching with disk-based resume is a strong improvement for long-running calibration jobs that may be interrupted.
  • Seeded RNG for geographic assignment and takeup re-randomization ensures full reproducibility.
  • target_overview SQL view integration is cleaner than the old hardcoded variable lists and stratum_group_id lookups.
  • Tests are well-structured: mock CSV data, in-memory SQLite, no dependency on real data files.
  • Notebooks correctly updated — no stale references to deleted modules in either .ipynb file.
  • Changelog entry is comprehensive and well-formatted.

Summary

Severity Issue Location
CRITICAL modal_app/remote_calibration_runner.py references deleted fit_calibration_weights.py — will break Modal runs remote_calibration_runner.py:57
Significant SQL injection via f-string interpolation unified_matrix_builder.py:159-172
Significant _coo_parts stale state on build failure/re-call unified_matrix_builder.py:869
Significant No test for core build_matrix method test_unified_matrix_builder.py
Significant Hardcoded voluntary_filing rate breaks parameter pattern unified_calibration.py:168
Minor builtins.print monkey-patching unified_calibration.py:329
Minor Private _get_geo_level imported cross-module unified_matrix_builder.py:28
Minor drop_target_groups added without tests calibration_utils.py
Minor Module-level reconfigure side effect unified_calibration.py:33-38

Verdict: Request changes — the modal_app breakage is a production risk that must be fixed. The _coo_parts state bug and lack of build_matrix tests are also strongly recommended before merge.

- Fix modal_app/remote_calibration_runner.py to use unified_calibration.py
- Fix _coo_parts stale state bug (initialize at top of build_matrix)
- Add build_matrix geographic masking integration test (national/state/CD)
- Add drop_target_groups unit tests
- Add voluntary_filing.yaml, remove hardcoded 0.05 rate
- Rename _get_geo_level to get_geo_level (cross-module public API)
- Move sys.stderr.reconfigure from module level to main()
- Add OUTPUT_PATH/LOG_PATH markers for Modal runner parsing
- Update notebook example household to record_idx=8629 (high SNAP)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@baogorek
Copy link
Collaborator Author

baogorek commented Feb 13, 2026

@PavelMakarchuk thanks for the (quite helpful) review. Please see the points especially that I did not address, and see if you're okay with them.

Point 1 - Fixed
Two changes made:
  1. unified_calibration.py: Added OUTPUT_PATH: and LOG_PATH: print markers so the Modal runner can parse the output paths (matching the old contract)
  2. remote_calibration_runner.py: Updated script path and changed --dataset-path → --dataset

Point 2 - Didn't fix
So I tried out the fix and it led to more lines of code that were just ugly as sin. There's virtually zero risk here. If someone injects a drop table, they can just download the database again from Hugging Face. Are you okay if I skip this?

Point 3 - Fixed
Net change: +1 line at the top of build_matrix, -2 lines removing the hasattr guard. Clean fix.

Point 4 - Implemented an integration test but not a true mock
  1. Matrix shape — covered by TestMatrixShape
  2. Sparsity — covered
  3. Column ordering matches clone_idx * n_records + record_idx — implicitly verified.
  4. Equivalent to deleted tests — test_column_indexing, test_geo_masking, test_cross_state, test_same_state are all spiritually replaced by the
  state/district masking tests.

   Pavel suggested a "small mock simulation" that's self-contained and fast. Ours requires the real dataset. That's a
  tradeoff: stronger assertions but can't run without data.

  Pavel, it's going to be a lift to get a true mock, although I agree we should have one, so requesting to push that out.

Point 5 - Fixed
Created the YAML policyengine_us_data/parameters/take_up/voluntary_filing.yaml

Point 6 - Didn't fix
We would need to fix the L0 package to get the streaming to work, from my conversation with Claude. Skipping for now.

Point 7 - fixed

Point 8 - Fixed.
Test added

Point 9 - Fixed
reconfigure moved to main

Copy link
Collaborator

@PavelMakarchuk PavelMakarchuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-Review After Fix Commit (689949f)

Thanks for the thorough fix commit. Most of my original concerns are well-addressed. Here's the updated assessment.


Original Issues — Status

# Issue Status
1 CRITICAL: modal_app/remote_calibration_runner.py references deleted file Fixed — now points to unified_calibration.py, CLI args updated (--dataset-path--dataset), OUTPUT_PATH:/LOG_PATH: markers added for stdout parsing
2 SQL injection via f-string in _query_targets Not addressed (see below)
3 _coo_parts stale state on build failure Fixed — initialized at top of build_matrix (line 645)
4 No integration test for build_matrix Fixedtest_build_matrix_masking.py added with thorough geographic masking tests (national/state/CD visibility, 7 test cases)
5 Hardcoded voluntary_filing rate Fixedvoluntary_filing.yaml added, hardcoded 0.05 removed
6 builtins.print monkey-patching Not addressed (minor, acceptable)
7 _get_geo_level private function imported cross-module Partially fixed — renamed to get_geo_level in calibration_utils.py and unified_matrix_builder.py, but not in calibration_matrix.ipynb (see new issue below)
8 drop_target_groups untested Fixedtest_drop_target_groups.py added with 7 test cases including edge cases
9 Module-level sys.stderr.reconfigure side effect Fixed — moved into main()

NEW ISSUE — calibration_matrix.ipynb still uses _get_geo_level

The function was renamed from _get_geo_level to get_geo_level in calibration_utils.py, and unified_matrix_builder.py correctly imports get_geo_level. However, docs/calibration_matrix.ipynb still references the old name in 3 places:

  • Cell 2 (import): _get_geo_level,
  • Cell 5: geo_levels = targets_df["geographic_id"].apply(_get_geo_level)
  • Cell 20: geo_levels = targets_df["geographic_id"].apply(_get_geo_level)

This will cause a NameError when anyone runs the notebook. CI doesn't catch this because notebooks aren't executed during tests.


Remaining Minor: SQL injection in _query_targets

Lines 159-163 of unified_matrix_builder.py still construct SQL via f-string:

ph = ",".join(f"'{dv}'" for dv in dvs)
or_conditions.append(f"tv.domain_variable IN ({ph})")

This is low-risk since it's internal-only and the inputs come from code (not user-facing), so I won't block on it. But it's worth a follow-up to use parameterized queries, especially if this code is ever reused.


New Code Quality Assessment

test_build_matrix_masking.py — Well-designed integration test:

  • Uses a specific high-SNAP household (RECORD_IDX=8629) for deterministic assertions
  • Tests national visibility (both clones contribute), state masking (cross-state zeroes), and district masking (same-state-other-CD zeroes)
  • Properly skips when calibration data isn't available or when both clones land in same state
  • Uses scope="module" fixture to avoid rebuilding the matrix per test

test_drop_target_groups.py — Good coverage:

  • Tests matching, non-matching, drop-all, column preservation, and case-insensitive matching
  • Uses create_target_groups to get realistic group IDs rather than hardcoding

voluntary_filing.yaml — Clean, consistent with other parameter files.

Modal runner update — Correctly updated CLI args, added OUTPUT_PATH/LOG_PATH markers. Note: LOG_PATH now points to unified_diagnostics.csv (a diagnostics summary) rather than the old training log CSV. This is a minor behavior change for Modal users who might expect the old format — worth documenting in the changelog or Modal README if anyone consumes this file programmatically.


Verdict

The notebook _get_geo_levelget_geo_level rename is a one-line-per-cell fix. Once that's done, this looks good to merge. Changing to approve once that's addressed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@baogorek
Copy link
Collaborator Author

Thanks for the thorough reviews, @PavelMakarchuk . I've pushed a fix (9bcdc0f) that renames _get_geo_level to get_geo_level in the 3 places in calibration_matrix.ipynb: the import in cell 2 and the two .apply() calls in cells 5 and 20.

Regarding the SQL injection point in _query_targets: since the inputs come from internal code rather than user-facing input, I'm leaving that as-is for now. Happy to revisit in a follow-up if it becomes a concern.

@baogorek baogorek merged commit dc57eb7 into main Feb 17, 2026
7 checks passed
@baogorek baogorek deleted the census-block-calibration branch February 17, 2026 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments