Add census-block-first calibration pipeline (from PR #516)#531
Add census-block-first calibration pipeline (from PR #516)#531
Conversation
Port core census-block architecture from Max Ghenis's PR #516 onto current main. New calibration/ package with: - clone_and_assign: population-weighted census block sampling - unified_matrix_builder: clone-by-clone simulation with COO caching, target_overview-based querying, hierarchical uprating - unified_calibration: L0 optimization pipeline with takeup re-randomization Remove old SparseMatrixBuilder/MatrixTracer/fit_calibration_weights and their associated tests. Simplify conftest.py for remaining tests. Co-Authored-By: Max Ghenis <max@policyengine.org> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Max Ghenis <max@policyengine.org> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rewrite local_area_calibration_setup.ipynb for clone-based pipeline - Update calibration_matrix.ipynb for UnifiedMatrixBuilder API - Update hierarchical_uprating.ipynb - Fix Jupyter import error in unified_calibration.py (OutStream.reconfigure) - Add drop_target_groups utility to calibration_utils.py - Update changelog Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PavelMakarchuk
left a comment
There was a problem hiding this comment.
PR Review: Census-Block-First Calibration Pipeline
Thanks for the substantial work porting the census-block architecture from PR #516 onto current main. The 3-stage pipeline (clone_and_assign → unified_matrix_builder → unified_calibration) is well-architected, and CI is fully green. Below is a thorough review organized by severity.
CRITICAL — Must Fix Before Merge
1. modal_app/remote_calibration_runner.py references deleted file
remote_calibration_runner.py:57-58 still hardcodes the path to the now-deleted fit_calibration_weights.py:
script_path = (
"policyengine_us_data/datasets/cps/"
"local_area_calibration/fit_calibration_weights.py"
)Any Modal-based calibration run after this PR merges will fail with FileNotFoundError. This needs to be updated to invoke the new pipeline, e.g.:
script_path = "policyengine_us_data/calibration/unified_calibration.py"The CLI argument interface also changed (--dataset-path → --dataset, new args --n-clones, --preset, etc.), so the subprocess.run call at lines 60-74 needs corresponding updates.
SIGNIFICANT — Strongly Recommend Fixing
2. SQL injection risk in _query_targets (unified_matrix_builder.py:159-172)
Target filter values are interpolated directly into SQL via f-strings:
ph = ",".join(f"'{dv}'" for dv in dvs)
or_conditions.append(f"tv.domain_variable IN ({ph})")While this is an internal tool (not web-facing), values containing single quotes will cause query errors or unintended behavior. Recommend using SQLAlchemy parameterized queries or at minimum escaping the values.
3. _coo_parts stale state bug (unified_matrix_builder.py:869)
When cache_dir=None, COO data accumulates in self._coo_parts via a lazy hasattr check:
if not hasattr(self, "_coo_parts"):
self._coo_parts = ([], [], [])
self._coo_parts[0].append(cr)If build_matrix fails midway (e.g., simulation error on clone 5 of 10), a subsequent call will find stale partial data and append to it, producing a corrupted matrix. Fix: initialize self._coo_parts = ([], [], []) at the top of build_matrix (around line 642), not lazily inside the clone loop.
4. No integration test for build_matrix
The core method that assembles the sparse calibration matrix — the centerpiece of this PR — has zero test coverage. test_unified_matrix_builder.py only tests _query_targets, uprating, and hierarchical uprating. Even a minimal test with a small mock simulation (checking matrix shape, sparsity, and that column ordering matches clone_idx * n_records + record_idx) would significantly increase confidence. The old pipeline had extensive matrix-level tests (test_column_indexing.py, test_matrix_national_variation.py, etc.) that are now deleted without equivalent replacements.
5. Hardcoded voluntary_filing rate (unified_calibration.py:168)
if rate_key == "voluntary_filing":
rate_or_dict = 0.05All other takeup rates load from YAML parameter files in policyengine_us_data/parameters/take_up/. This single hardcoded value breaks the pattern. Should be moved to a voluntary_filing.yaml file for consistency, discoverability, and maintainability.
MINOR — Non-Blocking
6. builtins.print monkey-patching (unified_calibration.py:329-352)
fit_l0_weights temporarily replaces builtins.print to add stdout flushing. The finally block restores it correctly, but this is fragile — any concurrent code calling print during L0 fitting gets the patched version. Consider functools.partial(print, flush=True) passed as a callback, or simply setting PYTHONUNBUFFERED=1.
7. Private function imported cross-module (unified_matrix_builder.py:28)
from ...calibration_utils import (
get_calculated_variables,
apply_op,
_get_geo_level, # underscore = private by convention
)_get_geo_level is now a cross-module dependency. Consider renaming to get_geo_level to signal it's part of the public API.
8. drop_target_groups added to calibration_utils.py without tests
New ~60-line utility function with non-trivial logic (label matching, geo-level filtering, sparse matrix row deletion). Used in calibration_matrix.ipynb but has no test coverage.
9. sys.stderr.reconfigure / sys.stdout.reconfigure (unified_calibration.py:33-38)
The try/except AttributeError handles Jupyter's OutStream (which lacks reconfigure), but calling reconfigure on non-TTY streams at module import time is unusual. Consider moving this into main() where it's actually needed, so importing the module for library use doesn't trigger side effects.
Architecture — Positive Observations
- Clean pipeline separation: clone_and_assign → unified_matrix_builder → unified_calibration. Each module has a clear single responsibility.
- COO caching with disk-based resume is a strong improvement for long-running calibration jobs that may be interrupted.
- Seeded RNG for geographic assignment and takeup re-randomization ensures full reproducibility.
target_overviewSQL view integration is cleaner than the old hardcoded variable lists and stratum_group_id lookups.- Tests are well-structured: mock CSV data, in-memory SQLite, no dependency on real data files.
- Notebooks correctly updated — no stale references to deleted modules in either
.ipynbfile. - Changelog entry is comprehensive and well-formatted.
Summary
| Severity | Issue | Location |
|---|---|---|
| CRITICAL | modal_app/remote_calibration_runner.py references deleted fit_calibration_weights.py — will break Modal runs |
remote_calibration_runner.py:57 |
| Significant | SQL injection via f-string interpolation | unified_matrix_builder.py:159-172 |
| Significant | _coo_parts stale state on build failure/re-call |
unified_matrix_builder.py:869 |
| Significant | No test for core build_matrix method |
test_unified_matrix_builder.py |
| Significant | Hardcoded voluntary_filing rate breaks parameter pattern |
unified_calibration.py:168 |
| Minor | builtins.print monkey-patching |
unified_calibration.py:329 |
| Minor | Private _get_geo_level imported cross-module |
unified_matrix_builder.py:28 |
| Minor | drop_target_groups added without tests |
calibration_utils.py |
| Minor | Module-level reconfigure side effect |
unified_calibration.py:33-38 |
Verdict: Request changes — the modal_app breakage is a production risk that must be fixed. The _coo_parts state bug and lack of build_matrix tests are also strongly recommended before merge.
- Fix modal_app/remote_calibration_runner.py to use unified_calibration.py - Fix _coo_parts stale state bug (initialize at top of build_matrix) - Add build_matrix geographic masking integration test (national/state/CD) - Add drop_target_groups unit tests - Add voluntary_filing.yaml, remove hardcoded 0.05 rate - Rename _get_geo_level to get_geo_level (cross-module public API) - Move sys.stderr.reconfigure from module level to main() - Add OUTPUT_PATH/LOG_PATH markers for Modal runner parsing - Update notebook example household to record_idx=8629 (high SNAP) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@PavelMakarchuk thanks for the (quite helpful) review. Please see the points especially that I did not address, and see if you're okay with them. |
PavelMakarchuk
left a comment
There was a problem hiding this comment.
Re-Review After Fix Commit (689949f)
Thanks for the thorough fix commit. Most of my original concerns are well-addressed. Here's the updated assessment.
Original Issues — Status
| # | Issue | Status |
|---|---|---|
| 1 | CRITICAL: modal_app/remote_calibration_runner.py references deleted file |
Fixed — now points to unified_calibration.py, CLI args updated (--dataset-path → --dataset), OUTPUT_PATH:/LOG_PATH: markers added for stdout parsing |
| 2 | SQL injection via f-string in _query_targets |
Not addressed (see below) |
| 3 | _coo_parts stale state on build failure |
Fixed — initialized at top of build_matrix (line 645) |
| 4 | No integration test for build_matrix |
Fixed — test_build_matrix_masking.py added with thorough geographic masking tests (national/state/CD visibility, 7 test cases) |
| 5 | Hardcoded voluntary_filing rate |
Fixed — voluntary_filing.yaml added, hardcoded 0.05 removed |
| 6 | builtins.print monkey-patching |
Not addressed (minor, acceptable) |
| 7 | _get_geo_level private function imported cross-module |
Partially fixed — renamed to get_geo_level in calibration_utils.py and unified_matrix_builder.py, but not in calibration_matrix.ipynb (see new issue below) |
| 8 | drop_target_groups untested |
Fixed — test_drop_target_groups.py added with 7 test cases including edge cases |
| 9 | Module-level sys.stderr.reconfigure side effect |
Fixed — moved into main() |
NEW ISSUE — calibration_matrix.ipynb still uses _get_geo_level
The function was renamed from _get_geo_level to get_geo_level in calibration_utils.py, and unified_matrix_builder.py correctly imports get_geo_level. However, docs/calibration_matrix.ipynb still references the old name in 3 places:
- Cell 2 (import):
_get_geo_level, - Cell 5:
geo_levels = targets_df["geographic_id"].apply(_get_geo_level) - Cell 20:
geo_levels = targets_df["geographic_id"].apply(_get_geo_level)
This will cause a NameError when anyone runs the notebook. CI doesn't catch this because notebooks aren't executed during tests.
Remaining Minor: SQL injection in _query_targets
Lines 159-163 of unified_matrix_builder.py still construct SQL via f-string:
ph = ",".join(f"'{dv}'" for dv in dvs)
or_conditions.append(f"tv.domain_variable IN ({ph})")This is low-risk since it's internal-only and the inputs come from code (not user-facing), so I won't block on it. But it's worth a follow-up to use parameterized queries, especially if this code is ever reused.
New Code Quality Assessment
test_build_matrix_masking.py — Well-designed integration test:
- Uses a specific high-SNAP household (
RECORD_IDX=8629) for deterministic assertions - Tests national visibility (both clones contribute), state masking (cross-state zeroes), and district masking (same-state-other-CD zeroes)
- Properly skips when calibration data isn't available or when both clones land in same state
- Uses
scope="module"fixture to avoid rebuilding the matrix per test
test_drop_target_groups.py — Good coverage:
- Tests matching, non-matching, drop-all, column preservation, and case-insensitive matching
- Uses
create_target_groupsto get realistic group IDs rather than hardcoding
voluntary_filing.yaml — Clean, consistent with other parameter files.
Modal runner update — Correctly updated CLI args, added OUTPUT_PATH/LOG_PATH markers. Note: LOG_PATH now points to unified_diagnostics.csv (a diagnostics summary) rather than the old training log CSV. This is a minor behavior change for Modal users who might expect the old format — worth documenting in the changelog or Modal README if anyone consumes this file programmatically.
Verdict
The notebook _get_geo_level → get_geo_level rename is a one-line-per-cell fix. Once that's done, this looks good to merge. Changing to approve once that's addressed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Thanks for the thorough reviews, @PavelMakarchuk . I've pushed a fix (9bcdc0f) that renames _get_geo_level to get_geo_level in the 3 places in calibration_matrix.ipynb: the import in cell 2 and the two .apply() calls in cells 5 and 20. Regarding the SQL injection point in _query_targets: since the inputs come from internal code rather than user-facing input, I'm leaving that as-is for now. Happy to revisit in a follow-up if it becomes a concern. |
Summary
Ports the core census-block calibration architecture from Max Ghenis's PR #516 onto current main. Since main evolved significantly while #516 was in development (new SQL views, schema changes, constraint validation,
sourcefield), this PR re-implements the pipeline on top of the current codebase rather than rebasing 36 commits across 40+ conflicting files.New:
policyengine_us_data/calibration/packageclone_and_assign.py— Population-weighted census block sampling. Assigns each CPS record N geographic clones via census block distribution.unified_matrix_builder.py— Clone-by-clone simulation with COO caching and disk-based resume. Queries targets viatarget_overviewview, applies hierarchical uprating (CPI/population + ACA PTC state multipliers), builds sparse calibration matrix.unified_calibration.py— L0 optimization pipeline with CLI. Includes seeded takeup re-randomization (block-level geographic variation in program participation).Removed
sparse_matrix_builder.py,fit_calibration_weights.py,matrix_tracer.py— replaced by unified pipelinetests/test_calibration/Key improvements over #516
target_overviewSQL view (no hardcoded variable lists or stratum_group_id)endswith("_count")patternFollow-up PRs
Test plan
pytest policyengine_us_data/tests/test_calibration/)pytest policyengine_us_data/tests/test_local_area_calibration/)from policyengine_us_data.calibration import unified_calibrationblack . -l 79 --check🤖 Generated with Claude Code