Problem
test_microsim_runs[2025-hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5] fails with an entity-size mismatch:
ValueError: Input [0.000000e+00 9.374027e+04 0.000000e+00 ... 7.718095e-01 0.000000e+00
0.000000e+00] is not a valid value for the entity household (size = 41314 != 669 = count)
The failure is pre-existing on main: it reproduces when the test is run against an unmodified checkout of main (no PR changes). The size 41314 looks like a tax-unit-level array count; 669 is the household count in the test's subsampled-1000 dataset. Somewhere in the household_net_income dependency chain, a TaxUnit-shaped array is being handed to the household entity population without projection.
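To make the suspected failure mode concrete, here is a minimal sketch in plain Python. It is not the policyengine-core implementation: check_size and project_to_household are hypothetical stand-ins, and the toy data (5 tax units over 3 households) is made up. It only shows why a tax-unit-length array fails a household size check, and how aggregating to households first avoids it.

```python
# Hypothetical stand-ins for illustration; not the real policyengine-core code.

def check_size(values, entity_name, count):
    """Raise if the array length does not match the entity count."""
    if len(values) != count:
        raise ValueError(
            f"size = {len(values)} != {count} = count for entity {entity_name}"
        )
    return values


def project_to_household(tax_unit_values, tax_unit_household_index, household_count):
    """Sum tax-unit values into the household each tax unit belongs to."""
    totals = [0.0] * household_count
    for value, household in zip(tax_unit_values, tax_unit_household_index):
        totals[household] += value
    return totals


# Toy data (assumed): 5 tax units spread over 3 households.
tax_unit_values = [100.0, 50.0, 0.0, 25.0, 75.0]
tax_unit_household_index = [0, 0, 1, 2, 2]
household_count = 3

# Handing the tax-unit array straight to the household entity fails the check.
try:
    check_size(tax_unit_values, "household", household_count)
    mismatch = None
except ValueError as error:
    mismatch = str(error)

# Projecting to households first yields an array of the right length.
household_values = check_size(
    project_to_household(tax_unit_values, tax_unit_household_index, household_count),
    "household",
    household_count,
)
```

In the real failure the two sizes are 41314 and 669 rather than 5 and 3, but the shape of the problem would be the same if this hypothesis is right.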
How it surfaced
Because PolicyEngine CI runs selective tests by changed-file area, this failure only triggers on PRs that touch the SPM / net-income chain. It was exposed on #8090 (which modified spm_unit_spm_expenses.py), but the root cause pre-dates that PR.
Repro
cd policyengine-us
git checkout main
./.venv/bin/python -m pytest "policyengine_us/tests/microsimulation/test_microsim.py::test_microsim_runs[2025-hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5]" -v
The other three parameter combos (2024-cps_2023, 2024-enhanced_cps_2024, 2025-cps_2023) all pass.
Possible causes
- A variable in the household_net_income dependency chain is caching or broadcasting a TaxUnit-level array at the household entity when the simulation year is later than the dataset year in the enhanced-CPS data preparation.
- The cause could be in policyengine-us-data (how enhanced_cps_2024.h5 is built) rather than in the model.
Suggested next step
Trace the household_net_income dependency chain with the enhanced CPS dataset at period=2025 and find which variable's array has 41314 entries (e.g., call breakpoint() just before check_array_compatible_with_entity raises; the PYTHONBREAKPOINT environment variable selects which debugger that opens).
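Once stopped in the debugger, the inspection could look roughly like the sketch below. It assumes you have collected the computed arrays into a dict keyed by variable name; find_mismatched and the variable names in the toy dict are hypothetical, not part of any PolicyEngine API.

```python
def find_mismatched(arrays_by_variable, expected_count):
    """Return {variable name: array length} for arrays of the wrong length."""
    return {
        name: len(values)
        for name, values in arrays_by_variable.items()
        if len(values) != expected_count
    }


# Toy data (assumed): 3 households, one array left at tax-unit length (5).
arrays_by_variable = {
    "household_level_variable": [1.0, 2.0, 3.0],
    "suspect_variable": [1.0, 2.0, 3.0, 4.0, 5.0],
}

suspects = find_mismatched(arrays_by_variable, expected_count=3)
```

With the real dataset, expected_count would be 669 and the suspect entry would report 41314.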