Summary
An audit of the Enhanced CPS state datasets reveals a hard top-coding ceiling at $6,263,051 AGI: none of the 51 state datasets contains a tax unit above that value. No state has a single observation above $10M, and every state has only sparse records between $5M and $6.3M (2 to 304 raw records per state). This creates significant data quality issues for modeling policies that target high-income brackets (e.g., NY A05435, which raises rates on the $5M-$25M and $25M+ brackets).
Impact
- Bills targeting income above $10M or $25M have zero observations to model against
- Revenue estimates for high-bracket policies are systematically understated
- Example: for NY A05435, the model estimates $120M in revenue from the $5M+ bracket change but captures zero impact from the $25M+ bracket change, so the real revenue would be substantially higher
Audit methodology
For each state, loaded the Enhanced CPS dataset via Microsimulation, calculated adjusted_gross_income at the tax unit level, and counted raw records and weighted totals above $1M, $5M, $10M, and $25M thresholds.
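```python
from huggingface_hub import hf_hub_download
from policyengine_us import Microsimulation

state = "NY"  # example value; the audit repeated these steps for each of the 51 datasets

# Download the state's Enhanced CPS file from the Hugging Face Hub.
dataset_path = hf_hub_download(
    repo_id="policyengine/policyengine-us-data",
    filename=f"states/{state}.h5",
    repo_type="model",
)

# Calculate AGI and weights at the tax unit level for 2026.
sim = Microsimulation(dataset=dataset_path)
agi = sim.calculate("adjusted_gross_income", 2026).values
weight = sim.calculate("tax_unit_weight", 2026).values
```
Full results by state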
| State | $1M+ raw | $1M+ wtd | $1M+ avg wt | $5M+ raw | $5M+ wtd | $5M+ avg wt | $10M+ raw | $10M+ wtd | $25M+ raw | $25M+ wtd | Max AGI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AL | 513 | 19,593 | 38 | 22 | 76 | 3 | 0 | 0 | 0 | 0 | $6,263,051 |
| AK | 67 | 3,263 | 49 | 3 | 136 | 46 | 0 | 0 | 0 | 0 | $5,575,013 |
| AZ | 864 | 40,390 | 47 | 31 | 758 | 24 | 0 | 0 | 0 | 0 | $6,263,051 |
| AR | 325 | 11,072 | 34 | 12 | 366 | 31 | 0 | 0 | 0 | 0 | $6,263,051 |
| CA | 10,353 | 344,096 | 33 | 304 | 5,552 | 18 | 0 | 0 | 0 | 0 | $6,263,051 |
| CO | 1,251 | 45,100 | 36 | 37 | 866 | 23 | 0 | 0 | 0 | 0 | $6,263,051 |
| CT | 839 | 42,203 | 50 | 26 | 463 | 18 | 0 | 0 | 0 | 0 | $6,263,051 |
| DE | 97 | 5,182 | 53 | 2 | 71 | 36 | 0 | 0 | 0 | 0 | $5,571,654 |
| DC | 311 | 9,054 | 29 | 9 | 240 | 27 | 0 | 0 | 0 | 0 | $5,793,470 |
| FL | 1,971 | 208,750 | 106 | 89 | 8,724 | 98 | 0 | 0 | 0 | 0 | $6,263,051 |
| GA | 1,620 | 58,722 | 36 | 42 | 611 | 15 | 0 | 0 | 0 | 0 | $6,263,051 |
| HI | 286 | 5,341 | 19 | 10 | 81 | 8 | 0 | 0 | 0 | 0 | $5,793,470 |
| ID | 178 | 9,561 | 54 | 5 | 112 | 22 | 0 | 0 | 0 | 0 | $5,607,726 |
| IL | 1,818 | 93,636 | 52 | 57 | 1,313 | 23 | 0 | 0 | 0 | 0 | $6,263,051 |
| IN | 781 | 31,170 | 40 | 24 | 181 | 8 | 0 | 0 | 0 | 0 | $6,263,051 |
| IA | 269 | 15,641 | 58 | 7 | 106 | 15 | 0 | 0 | 0 | 0 | $6,263,051 |
| KS | 427 | 14,788 | 35 | 11 | 142 | 13 | 0 | 0 | 0 | 0 | $6,263,051 |
| KY | 475 | 15,236 | 32 | 21 | 133 | 6 | 0 | 0 | 0 | 0 | $6,263,051 |
| LA | 444 | 19,520 | 44 | 9 | 97 | 11 | 0 | 0 | 0 | 0 | $6,263,051 |
| ME | 158 | 6,247 | 40 | 4 | 125 | 31 | 0 | 0 | 0 | 0 | $5,699,046 |
| MD | 1,528 | 36,862 | 24 | 48 | 358 | 7 | 0 | 0 | 0 | 0 | $6,263,051 |
| MA | 1,839 | 78,200 | 43 | 53 | 1,321 | 25 | 0 | 0 | 0 | 0 | $6,263,051 |
| MI | 944 | 52,221 | 55 | 28 | 287 | 10 | 0 | 0 | 0 | 0 | $6,263,051 |
| MN | 1,116 | 35,256 | 32 | 33 | 403 | 12 | 0 | 0 | 0 | 0 | $6,263,051 |
| MS | 263 | 8,166 | 31 | 7 | 13 | 2 | 0 | 0 | 0 | 0 | $5,793,470 |
| MO | 604 | 30,117 | 50 | 23 | 157 | 7 | 0 | 0 | 0 | 0 | $6,263,051 |
| MT | 224 | 6,538 | 29 | 3 | 172 | 57 | 0 | 0 | 0 | 0 | $5,571,654 |
| NE | 189 | 11,803 | 62 | 9 | 62 | 7 | 0 | 0 | 0 | 0 | $6,263,051 |
| NV | 274 | 23,761 | 87 | 10 | 577 | 58 | 0 | 0 | 0 | 0 | $5,699,046 |
| NH | 156 | 11,986 | 77 | 5 | 144 | 29 | 0 | 0 | 0 | 0 | $5,586,008 |
| NJ | 2,275 | 88,779 | 39 | 64 | 958 | 15 | 0 | 0 | 0 | 0 | $6,263,051 |
| NM | 196 | 7,310 | 37 | 5 | 20 | 4 | 0 | 0 | 0 | 0 | $6,263,051 |
| NY | 4,730 | 187,147 | 40 | 153 | 4,335 | 28 | 0 | 0 | 0 | 0 | $6,263,051 |
| NC | 1,498 | 55,637 | 37 | 46 | 385 | 8 | 0 | 0 | 0 | 0 | $6,263,051 |
| ND | 81 | 5,731 | 71 | 5 | 38 | 8 | 0 | 0 | 0 | 0 | $6,263,051 |
| OH | 1,088 | 59,179 | 54 | 30 | 285 | 10 | 0 | 0 | 0 | 0 | $6,263,051 |
| OK | 342 | 16,165 | 47 | 6 | 208 | 35 | 0 | 0 | 0 | 0 | $5,686,040 |
| OR | 942 | 20,904 | 22 | 32 | 398 | 12 | 0 | 0 | 0 | 0 | $6,263,051 |
| PA | 1,598 | 81,179 | 51 | 46 | 815 | 18 | 0 | 0 | 0 | 0 | $6,263,051 |
| RI | 175 | 6,691 | 38 | 3 | 58 | 19 | 0 | 0 | 0 | 0 | $5,675,536 |
| SC | 641 | 23,356 | 36 | 18 | 275 | 15 | 0 | 0 | 0 | 0 | $6,263,051 |
| SD | 48 | 6,585 | 137 | 3 | 116 | 39 | 0 | 0 | 0 | 0 | $5,686,040 |
| TN | 518 | 39,970 | 77 | 18 | 166 | 9 | 0 | 0 | 0 | 0 | $6,263,051 |
| TX | 2,242 | 199,803 | 89 | 98 | 4,544 | 46 | 0 | 0 | 0 | 0 | $6,263,051 |
| UT | 618 | 18,401 | 30 | 18 | 89 | 5 | 0 | 0 | 0 | 0 | $6,263,051 |
| VT | 86 | 3,199 | 37 | 2 | 24 | 12 | 0 | 0 | 0 | 0 | $5,586,008 |
| VA | 1,481 | 56,697 | 38 | 42 | 621 | 15 | 0 | 0 | 0 | 0 | $6,263,051 |
| WA | 631 | 72,514 | 115 | 26 | 2,636 | 101 | 0 | 0 | 0 | 0 | $6,263,051 |
| WV | 114 | 4,967 | 44 | 3 | 11 | 4 | 0 | 0 | 0 | 0 | $5,607,726 |
| WI | 619 | 32,345 | 52 | 20 | 302 | 15 | 0 | 0 | 0 | 0 | $6,263,051 |
| WY | 63 | 4,636 | 74 | 3 | 357 | 119 | 0 | 0 | 0 | 0 | $5,572,205 |
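The per-state figures above could be produced from the `agi` and `weight` arrays along the following lines. This is a minimal sketch, not the script used for the audit: the function name `tail_counts` is hypothetical, "above" is assumed to mean strictly greater than the threshold, and the average weight is assumed to be the weighted total divided by the raw count.

```python
import numpy as np

THRESHOLDS = (1_000_000, 5_000_000, 10_000_000, 25_000_000)


def tail_counts(agi, weight, thresholds=THRESHOLDS):
    """Return per-threshold raw/weighted counts and the maximum AGI.

    Hypothetical helper: assumes `agi` and `weight` are aligned
    tax-unit-level arrays, as in the loading snippet.
    """
    agi = np.asarray(agi, dtype=float)
    weight = np.asarray(weight, dtype=float)
    rows = {}
    for t in thresholds:
        mask = agi > t  # assumption: strictly above the threshold
        raw = int(mask.sum())
        weighted = float(weight[mask].sum())
        rows[t] = {
            "raw": raw,
            "weighted": weighted,
            # Assumed definition of the "avg wt" columns.
            "avg_weight": weighted / raw if raw else 0.0,
        }
    return rows, float(agi.max())


# Synthetic example values, not taken from any dataset.
rows, max_agi = tail_counts(
    agi=[500_000, 2_000_000, 6_000_000, 6_263_051],
    weight=[100, 40, 3, 5],
)
```

Running this once per state and collecting `rows` and `max_agi` would give one table row per dataset.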
Key observations
- Hard ceiling at $6,263,051: this is the CPS top-coding/swapping cap. 35 of the 51 datasets hit this exact number; the other 16 top out between $5.57M and $5.79M.
- Zero observations above $10M in every state: policies targeting $10M+ or $25M+ brackets cannot be modeled.
- Sparse $5M+ data: even large states have very few records (NY: 153 raw / 4,335 weighted; CA: 304 raw / 5,552 weighted). Many of these have tiny weights (1-5), so the weighted totals rest on a few higher-weight records, any one of which can disproportionately drive results.
- Weight instability: average weights for $5M+ records vary wildly (FL: 98, WA: 101 vs MS: 2, KY: 6), suggesting calibration noise at the tail.
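One way to quantify how much a few records dominate the $5M+ tail is to compute an effective sample size and the largest single-record share of the weighted total. The sketch below uses Kish's formula, (Σw)² / Σw², which was not part of the original audit; the function name `tail_weight_diagnostics` is hypothetical.

```python
import numpy as np


def tail_weight_diagnostics(agi, weight, threshold=5_000_000):
    """Summarize weight concentration among records above `threshold`.

    Hypothetical helper. `effective_n` is Kish's effective sample size;
    `max_share` is the largest single weight over the tail's total weight.
    """
    agi = np.asarray(agi, dtype=float)
    w = np.asarray(weight, dtype=float)[agi > threshold]
    total = w.sum()
    if total == 0:
        return {"n": int(w.size), "effective_n": 0.0, "max_share": 0.0}
    return {
        "n": int(w.size),
        "effective_n": float(total**2 / (w**2).sum()),
        "max_share": float(w.max() / total),
    }


# Synthetic example: four tail records, one carrying almost all the weight.
skewed = tail_weight_diagnostics(
    agi=[6_000_000] * 4, weight=[1, 1, 1, 97]
)
# Same number of records with equal weights, for comparison.
even = tail_weight_diagnostics(
    agi=[6_000_000] * 4, weight=[10, 10, 10, 10]
)
```

An `effective_n` far below `n`, or a `max_share` close to 1, would indicate that estimates for that bracket hinge on very few records.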
Practical consequence
For the state legislative tracker, any bill targeting income above $5M will have unreliable estimates, and bills targeting $10M+ or $25M+ will show zero impact from those brackets. This needs to be disclosed as a data limitation or addressed via tail imputation (e.g., Pareto extrapolation from IRS SOI data).
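To illustrate the Pareto extrapolation idea, the sketch below derives a tail shape parameter from counts above two thresholds and projects counts above higher ones, using the relation N(>x) = N(>x₀)·(x₀/x)^α. The function names are hypothetical, and the inputs are the NY weighted counts from the table, used only to show the mechanics. A real imputation would presumably take the counts from IRS SOI data, and these outputs should not be read as estimates.

```python
import math


def pareto_alpha(n_low, n_high, x_low, x_high):
    """Shape parameter implied by counts above two thresholds."""
    return math.log(n_low / n_high) / math.log(x_high / x_low)


def pareto_count_above(n_ref, x_ref, x, alpha):
    """Projected count above `x`, given `n_ref` units above `x_ref`."""
    return n_ref * (x_ref / x) ** alpha


def pareto_mean_above(x, alpha):
    """Mean income of units above `x` under a Pareto tail (needs alpha > 1)."""
    if alpha <= 1:
        raise ValueError("mean is undefined for alpha <= 1")
    return x * alpha / (alpha - 1)


# Illustration only: NY weighted counts above $1M and $5M from the table.
n_1m, n_5m = 187_147, 4_335
alpha = pareto_alpha(n_1m, n_5m, 1_000_000, 5_000_000)
est_10m = pareto_count_above(n_5m, 5_000_000, 10_000_000, alpha)
est_25m = pareto_count_above(n_5m, 5_000_000, 25_000_000, alpha)
print(f"alpha={alpha:.2f}, units above $10M={est_10m:.0f}, above $25M={est_25m:.0f}")
```

The projected counts, combined with `pareto_mean_above`, would give a rough weighted income total for each bracket that the current datasets leave empty.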