Bug: SIPP tip income imputation includes allocation flag columns #524

@MaxGhenis

Summary

The tip income construction in sipp.py sums every column whose name contains TXAMT. That pattern picks up both the actual tip dollar amounts (TJB*_TXAMT) and the Census allocation flags (AJB*_TXAMT). The allocation flags are small integers (0, 1, 2) that indicate whether Census imputed the value; they are not dollar amounts, so they should not be added to tip income.
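To make the overlap concrete, here is a minimal sketch of which column names each pattern selects. The column list is illustrative only (it is not taken from the actual SIPP extract); it just follows the TJB*/AJB* naming described above.

```python
import pandas as pd

# Illustrative column names only; the real SIPP extract has many more.
columns = pd.Index(
    ["SSUID", "TJB1_TXAMT", "TJB2_TXAMT", "AJB1_TXAMT", "AJB2_TXAMT", "TJB1_MSUM"]
)

# Substring match, as in the current code: amounts and allocation flags alike.
broad = columns[columns.str.contains("TXAMT")].tolist()

# Pattern anchored at the start of the name: tip amounts only.
amounts_only = columns[columns.str.match(r"TJB\d_TXAMT")].tolist()

print(broad)
print(amounts_only)
```

The substring match returns four columns here, while the anchored pattern returns only the two TJB columns.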

Current code

policyengine_us_data/datasets/sipp/sipp.py, lines ~69-72:

df["tip_income"] = (
    df[df.columns[df.columns.str.contains("TXAMT")]].fillna(0).sum(axis=1)
    * 12
)

Fix

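Filter to only the actual tip amount columns. str.match anchors the pattern at the start of the column name, so the AJB*_TXAMT flags no longer match:

# Sum only the monthly tip amounts (TJB<n>_TXAMT), then annualize.
df["tip_income"] = (
    df[df.columns[df.columns.str.match(r"TJB\d_TXAMT")]].fillna(0).sum(axis=1)
    * 12
)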
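As a rough check of the fix, the sketch below runs both versions on a small made-up frame. The helper name annual_tip_income and the toy values are assumptions for illustration; they are not part of sipp.py.

```python
import pandas as pd


def annual_tip_income(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: annualize monthly tips from TJB<n>_TXAMT columns only."""
    tip_cols = df.columns[df.columns.str.match(r"TJB\d_TXAMT")]
    return df[tip_cols].fillna(0).sum(axis=1) * 12


# Made-up data: two tip amount columns and their two allocation flags.
toy = pd.DataFrame(
    {
        "TJB1_TXAMT": [100.0, None, 0.0],
        "TJB2_TXAMT": [50.0, 20.0, None],
        "AJB1_TXAMT": [1, 0, 2],
        "AJB2_TXAMT": [0, 2, 1],
    }
)

fixed = annual_tip_income(toy)

# Current behavior, reproduced for comparison: flags are summed with amounts.
buggy = toy[toy.columns[toy.columns.str.contains("TXAMT")]].fillna(0).sum(axis=1) * 12

print(pd.DataFrame({"buggy": buggy, "fixed": fixed}))
```

In this toy frame the third row has no tips at all, yet the current code assigns it a positive tip income. That could matter if any downstream step keys on tip_income > 0.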

Impact

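Likely minor: with flag values of 0 to 2, each allocation flag column adds at most 2 to the monthly sum, or at most $24 per job after multiplying by 12, which is small next to typical tip amounts. It is still incorrect and should be fixed.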
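One way to size the effect on a given extract is to annualize the flag columns on their own. The sketch below does that on made-up data. The helper flag_overcount and the constant MAX_FLAG_VALUE are hypothetical names, and the maximum of 2 is assumed from the flag values listed in the summary.

```python
import pandas as pd

MAX_FLAG_VALUE = 2  # assumption: flags take the values 0, 1, 2 as stated above


def flag_overcount(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: annualized amount the AJB<n>_TXAMT flags add per row."""
    flag_cols = df.columns[df.columns.str.match(r"AJB\d_TXAMT")]
    return df[flag_cols].fillna(0).sum(axis=1) * 12


# Made-up data with one tip amount column and two flag columns.
toy = pd.DataFrame(
    {
        "TJB1_TXAMT": [400.0, 0.0],
        "AJB1_TXAMT": [1, 2],
        "AJB2_TXAMT": [0, 2],
    }
)

overcount = flag_overcount(toy)

# Per-row ceiling: every flag column at its maximum value, annualized.
n_flag_cols = int(toy.columns.str.match(r"AJB\d_TXAMT").sum())
upper_bound = MAX_FLAG_VALUE * n_flag_cols * 12

print(overcount.tolist(), upper_bound)
```

Running the same helper on the real frame, ideally with survey weights applied, would give a direct estimate of how much the flags inflate aggregate tip income.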

Context

This was identified while comparing PolicyEngine's tip income deduction revenue estimate ($4.7B) against JCT's score ($10.0B for FY2026). See related issues for other improvements to close this gap.

Labels: bug