Open
Labels
bug (Something isn't working)
Description
Summary
The tip income construction in sipp.py sums all columns matching *TXAMT*, which inadvertently includes both the actual tip dollar amounts (TJB*_TXAMT) and Census allocation flags (AJB*_TXAMT). The allocation flags are small integers (0, 1, 2) indicating whether Census imputed the value, not dollar amounts.
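The over-match can be reproduced without the SIPP data. The following is a minimal sketch using only the standard-library `re` module: pandas' `str.contains` searches anywhere in the string (like `re.search`), while `str.match` anchors at the start (like `re.match`). The column list is a hypothetical subset, assuming the names follow the `TJB<n>_TXAMT` / `AJB<n>_TXAMT` pattern described above.

```python
import re

# Hypothetical subset of SIPP column names, assumed for illustration only.
columns = ["TJB1_TXAMT", "TJB2_TXAMT", "AJB1_TXAMT", "AJB2_TXAMT", "TJB1_MSUM"]

# str.contains("TXAMT") behaves like re.search: it matches anywhere in the name,
# so the AJB*_TXAMT allocation flags are selected too.
broad = [c for c in columns if re.search("TXAMT", c)]

# str.match(r"TJB\d_TXAMT") behaves like re.match: it is anchored at the start,
# so only the tip amount columns are selected.
narrow = [c for c in columns if re.match(r"TJB\d_TXAMT", c)]

print(broad)
print(narrow)
```

Note that `str.match` does not anchor at the end of the string; `str.fullmatch` would be the stricter choice if other columns could share the `TJB<n>_TXAMT` prefix.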
Current code
policyengine_us_data/datasets/sipp/sipp.py, lines ~69-72:
df["tip_income"] = (
    df[df.columns[df.columns.str.contains("TXAMT")]].fillna(0).sum(axis=1)
    * 12
)
Fix
Filter to only the actual tip amount columns:
# Keep only the tip dollar amounts (TJB*_TXAMT); skip the AJB*_TXAMT allocation flags.
df["tip_income"] = (
    df[df.columns[df.columns.str.match(r"TJB\d_TXAMT")]].fillna(0).sum(axis=1)
    * 12
)
Impact
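Likely minor: given the flag values listed above (0, 1, 2), each AJB*_TXAMT column adds at most 2 × 12 = $24 per year to a record's annualized tip income, which is small next to the dollar amounts. The result is still incorrect and should be fixed.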
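The size of the overstatement can be checked on toy data. The sketch below is illustrative only: the values are made up, and `annual_tips` is a hypothetical helper that mirrors the sum-and-annualize step from the snippets above rather than code from sipp.py.

```python
import pandas as pd

# Toy data (hypothetical values): monthly tip amounts and allocation flags.
df = pd.DataFrame(
    {
        "TJB1_TXAMT": [100.0, None, 250.0],
        "TJB2_TXAMT": [None, 40.0, 0.0],
        "AJB1_TXAMT": [1, 0, 2],
        "AJB2_TXAMT": [0, 2, 2],
    }
)


def annual_tips(frame, mask):
    """Sum the selected monthly columns per row and annualize (x12)."""
    return frame.loc[:, mask].fillna(0).sum(axis=1) * 12


# Current behavior: every column containing "TXAMT", flags included.
current = annual_tips(df, df.columns.str.contains("TXAMT"))

# Proposed behavior: only the TJB<n>_TXAMT dollar amounts.
fixed = annual_tips(df, df.columns.str.match(r"TJB\d_TXAMT"))

# Per-record overstatement equals 12 times the sum of the flag values.
overstatement = current - fixed
print(overstatement.tolist())
```

A comparison like this could also serve as a regression test for the column filter once the fix is in place.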
Context
This was identified while comparing PolicyEngine's tip income deduction revenue estimate ($4.7B) against JCT's score ($10.0B for FY2026). See related issues for other improvements to close this gap.