Load prediction inflated ~1.5x due to fill_load_from_power double-counting on sparse data #3545

@mgazza

Description

Summary

When entity history data is recorded at intervals greater than 1 minute (e.g. 5-minute intervals), fill_load_from_power in fetch.py double-counts energy, inflating load predictions by approximately 1.3-1.6x actual consumption.

This results in:

  • predbat.load_energy predicting significantly more than actual consumption
  • Inflated savings_yesterday_predbat values
  • Cumulative savings_total_predbat growing too fast

Root Cause

Data Sparsity

When history data is recorded at 5-minute intervals (rather than HA's sub-second state change resolution), clean_incrementing_reverse produces a dict with entries at only ~9% of minute positions. The remaining ~91% of minutes return 0 from dict.get(minute, 0).
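A minimal illustration of the effect (values invented; in the real code the dict comes from clean_incrementing_reverse):

```python
# Cumulative load sampled every 5 minutes instead of every minute
# (invented values; the real dict comes from clean_incrementing_reverse).
sparse = {minute: minute * 0.01 for minute in range(0, 1200, 5)}

covered = sum(1 for minute in range(1200) if minute in sparse)
print(f"coverage: {covered / 1200:.0%}")  # coverage: 20%

# Every minute between samples silently reads as zero:
print(sparse.get(7, 0))  # 0
```

With even sparser recording (as in the report, ~9% coverage) the proportion of minutes defaulting to zero grows accordingly.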

Double-Counting Mechanism

Phase 1 of fill_load_from_power (fetch.py:246-305):

  1. Iterates minute-by-minute looking for "zero periods" (consecutive minutes with value 0)
  2. The gap minutes between real data points appear as zero-value sequences
  3. Any gap ≥ gap_size (default 5 minutes) is detected as a "zero period"
  4. Phase 1 integrates power data for these gaps and ADDS it to the cumulative values (line 302)
  5. It then bumps up ALL more-recent minutes (lines 304-305), inflating the cumulative curve
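A simplified stand-in for that Phase 1 scan (not the actual fetch.py code) shows how missing minutes masquerade as zero periods:

```python
def find_zero_periods(load, num_minutes, gap_size=5):
    """Simplified stand-in for Phase 1's scan (not the real fetch.py code):
    flag any run of >= gap_size consecutive minutes that read as zero."""
    periods = []
    run_start = None
    for minute in range(num_minutes):
        if load.get(minute, 0) == 0:  # missing minutes also read as 0 here
            if run_start is None:
                run_start = minute
        elif run_start is not None:
            if minute - run_start >= gap_size:
                periods.append((run_start, minute))
            run_start = None
    return periods

# Dense data (a value every minute): nothing to misdetect.
dense = {m: 1.0 + m for m in range(30)}
print(find_zero_periods(dense, 30))   # []

# Sparse data (a value every 10 minutes): inter-sample gaps look like zero periods.
sparse = {m: 1.0 + m for m in range(0, 30, 10)}
print(find_zero_periods(sparse, 30))  # [(1, 10), (11, 20)]
```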

Phase 2 (fetch.py:307-357):

  1. Divides data into 30-minute windows
  2. Reads cumulative values at start/end: load_total = load_at_start - load_at_end
  3. load_at_start is already inflated by Phase 1
  4. Scales power data to match the inflated total → energy counted twice
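A back-of-the-envelope sketch of the double count (numbers invented, purely illustrative of the mechanism described above):

```python
# Invented numbers, illustrating the mechanism only.
true_energy_in_gap = 0.5  # kWh actually consumed during one misdetected gap

# The cumulative sensor already includes this energy at the next real sample.
# Phase 1 integrates the power data for the gap and ADDS it on top,
# bumping every more-recent cumulative value:
phase1_added = true_energy_in_gap

# Phase 2 then reads the cumulative delta across the window (already
# inflated by Phase 1) and scales the power data to match it:
cumulative_delta = true_energy_in_gap + phase1_added
print(cumulative_delta / true_energy_in_gap)  # 2.0 -> gap energy counted twice
```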

Why Standard HA Installs Are Unaffected

HA records state changes at sub-second resolution, producing dense minute-level data. fill_load_from_power finds no significant "zero periods" to misdetect, so Phase 2 works correctly.

Who Is Affected

Any deployment where entity history is stored at intervals greater than ~5 minutes (e.g. external databases with downsampled data, custom history backends).

Evidence

Example user on March 10, 2026:

  • consumption_today: 111 data points over 1200 minutes (9.25% coverage)
  • consumption_power: 193 data points over 1200 minutes
  • Yesterday's actual consumption: 19.7 kWh
  • PredBat predicted load: 51.6 kWh / 48h = 25.8 kWh/day (1.31x actual)
  • load_energy_actual tracking at 12.9 kWh at 19:50 (extrapolated ~15.6 kWh/day)
  • Energy balance verified: GivEnergy API consumption field correctly excludes battery charging
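The inflation factor quoted above follows directly from the reported numbers:

```python
# Figures taken from the report above.
predicted_48h = 51.6   # kWh predicted over 48 hours
actual_daily = 19.7    # kWh actually consumed yesterday

predicted_daily = predicted_48h / 2
print(predicted_daily)                           # 25.8
print(round(predicted_daily / actual_daily, 2))  # 1.31
```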

Suggested Fix

After clean_incrementing_reverse produces sparse cumulative data, linearly interpolate between known data points to fill every minute index before passing to fill_load_from_power. This prevents Phase 1 from misdetecting inter-sample gaps as zero periods.
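A sketch of that interpolation step (hypothetical helper, not an existing predbat function):

```python
def interpolate_minutes(sparse, num_minutes):
    """Fill every minute index by linear interpolation between known
    samples. Hypothetical helper sketching the suggested fix."""
    known = sorted(sparse)
    if not known:
        return {}
    dense = {}
    for i in range(len(known) - 1):
        m0, m1 = known[i], known[i + 1]
        v0, v1 = sparse[m0], sparse[m1]
        for m in range(m0, m1):
            dense[m] = v0 + (v1 - v0) * (m - m0) / (m1 - m0)
    # Hold the edges flat outside the sampled range.
    for m in range(0, known[0]):
        dense[m] = sparse[known[0]]
    for m in range(known[-1], num_minutes):
        dense[m] = sparse[known[-1]]
    return dense

sparse = {0: 0.0, 5: 1.0, 10: 3.0}
dense = interpolate_minutes(sparse, 12)
print(dense[2])   # 0.4
print(dense[7])   # 1.8
```

Because every minute now has a nonzero (interpolated) value between samples, Phase 1 sees no spurious zero runs.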

Alternatively, fill_load_from_power Phase 1 could be made aware of data sparsity by only treating a period as "zero" if there are actual data points with value 0, rather than relying on dict.get(minute, 0) defaults.
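That sparsity-aware check could look like this (hypothetical helper name, operating on the raw samples before any dict.get defaulting):

```python
def is_real_zero(raw_samples, minute):
    """Hypothetical helper: treat a minute as zero only if a recorded
    sample actually said 0 there, not because the minute is missing."""
    return minute in raw_samples and raw_samples[minute] == 0

raw = {0: 2.0, 5: 0.0, 10: 1.5}   # 5-minute samples; minute 5 really read zero
print(is_real_zero(raw, 5))   # True  -> genuine zero reading
print(is_real_zero(raw, 3))   # False -> just a missing sample
```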

Affected Files

  • apps/predbat/fetch.py: fill_load_from_power(), minute_data_load()
  • apps/predbat/utils.py: minute_data(), clean_incrementing_reverse()
