fix: interpolate sparse data to prevent load inflation #3546
…#3545)

SaaS instances record entity_history at 5-minute intervals, producing sparse cumulative dicts from clean_incrementing_reverse. When fill_load_from_power processes this sparse data, it treats the gaps between known data points as "zero periods" and fills them with power-integrated values, causing ~1.3-1.6x load energy inflation.

Add interpolate_sparse_data() to linearly interpolate between known data points before fill_load_from_power runs, filling every minute index so no false gaps are detected. Midnight resets (>50% value drops) are handled by carrying forward instead of interpolating.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
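The behaviour described in this commit message can be illustrated with a minimal sketch. It is not the PR's code: the signature, the `reset_fraction` parameter, the helper `count_false_gaps`, and the assumption that keys ascend in time order with a rising counter are all hypothetical, chosen only to show why a `dict.get(minute, 0)` style check sees false gaps in sparse data and none after interpolation.

```python
def interpolate_sparse_data(data, reset_fraction=0.5):
    """Fill every integer minute between the known keys of a sparse dict.

    Hypothetical sketch: name reused from the PR, body and signature assumed.
    """
    if len(data) < 2:
        return dict(data)

    known = sorted(data)
    filled = dict(data)
    for left, right in zip(known, known[1:]):
        v_left, v_right = data[left], data[right]
        # Assumed reset rule: a drop of more than reset_fraction between two
        # known points is treated as a counter reset, not as real consumption.
        is_reset = v_left > 0 and v_right < v_left * (1 - reset_fraction)
        span = right - left
        for minute in range(left + 1, right):
            if is_reset:
                filled[minute] = v_left  # carry forward across the reset
            else:
                filled[minute] = v_left + (v_right - v_left) * (minute - left) / span
    return filled


def count_false_gaps(data, total_minutes):
    """Count minutes that a dict.get(minute, 0) == 0 check would call a gap."""
    return sum(1 for m in range(total_minutes) if data.get(m, 0) == 0)


# 5-minute samples of a counter that rises 0.1 kWh per sample from 1.0 kWh.
sparse = {m: 1.0 + 0.1 * (m // 5) for m in range(0, 61, 5)}
dense = interpolate_sparse_data(sparse)

print(count_false_gaps(sparse, 61))  # minutes with no key look like zeros
print(count_false_gaps(dense, 61))   # every minute now has a non-zero value
```

In this toy data, the 48 minutes without a key all read as zero before interpolation, which is the condition the commit message blames for the inflated load.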
Code review
Found 1 issue:
batpred/apps/predbat/tests/test_fill_load_from_power.py, lines 307 to 379 in 7e8f631
🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎.
Test 7 previously had no assert statements and would always pass. Add assertion to verify sparse data produces measurable distortion. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
springfall2008
left a comment
Analysis: interpolate_sparse_data before fill_load_from_power
I investigated whether the interpolate_sparse_data(self.load_minutes) call (added before fill_load_from_power) actually changes anything compared to minute_data with smoothing=True.
Finding: it has no effect in practice.
minute_data (called with smoothing=True, clean_increment=True, backwards=True) always produces a fully dense dict — every key from 0 to days*24*60 is populated. This happens unconditionally at the end of the function via two passes:
- "Fill from last sample until now" — forward-fill from minute 0
- "Fill gaps in the middle" — carry-forward through every remaining gap
In addition, clean_incrementing_reverse() iterates range(max(data)+1), writing every index.
So by the time self.load_minutes reaches interpolate_sparse_data, there are no gaps to fill. The function returns unchanged data (0 minutes changed, confirmed empirically).
=== With smoothing=True (production code) ===
Missing minutes: []
Minutes changed by interpolate_sparse_data: 0
=== With smoothing=False ===
Missing minutes: []
Minutes changed by interpolate_sparse_data: 0
The original motivation for interpolate_sparse_data (preventing fill_load_from_power from doing wrong boundary lookups at period_end + 1) is real and correct, but the fix is already provided by minute_data's own gap-filling. The interpolate_sparse_data call is dead code in this path and can be removed without any behavioural change.
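An empirical check of the kind reported above could look roughly like the following sketch. The helper names `density_report` and `carry_forward_fill` are hypothetical, and `carry_forward_fill` is only a stand-in for whatever gap-filling function is under test; the point is the two measurements, missing minute keys and minutes whose value changes after filling.

```python
def carry_forward_fill(data, total_minutes):
    """Stand-in gap filler (assumed): copy the last seen value into each gap."""
    filled = {}
    last = 0
    for m in range(total_minutes):
        last = data.get(m, last)
        filled[m] = last
    return filled


def density_report(data, total_minutes, fill):
    """Return (missing minute keys, number of minutes the filler changed)."""
    missing = [m for m in range(total_minutes) if m not in data]
    after = fill(data, total_minutes)
    changed = sum(1 for m in range(total_minutes) if after.get(m) != data.get(m))
    return missing, changed


# A dict that is already dense: nothing is missing and nothing changes.
dense = {m: m * 10 for m in range(60)}
print(density_report(dense, 60, carry_forward_fill))

# A 5-minute sparse dict: the filler has real work to do.
sparse = {m: m * 10 for m in range(0, 60, 5)}
missing, changed = density_report(sparse, 60, carry_forward_fill)
print(len(missing), changed)
```

If the dict reaching the filler is already dense, as the analysis above found for the minute_data output, both measurements come back empty and the call is a no-op.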
So the question is: can we reproduce this issue?
The gap detector in previous_days_modal_filter() checks if consecutive values are equal (data[m] == data[m+5]) to find missing data. After clean_incrementing_reverse(), zero-consumption overnight periods have equal consecutive values, triggering false gap detection and injecting phantom load (~6 kWh/night for a 24 kWh/day average).

Track sensor data point provenance during minute_data() processing via a new data_point_minutes set parameter. In the gap detector, check whether the sensor was actively reporting during each gap period. If the sensor was online (≥1 data point/hour), skip filling. If offline, fill as before.

Supersedes #3546 which attempted to fix the symptom via interpolation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
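The provenance idea in this commit message can be sketched as follows. This is an illustration under assumptions, not the code in #3554: the helper names `find_flat_periods` and `periods_to_fill`, the constant `MIN_POINTS_PER_HOUR`, and the way the "≥1 data point/hour" rule is computed are all hypothetical. Only the `data[m] == data[m+5]` comparison and the `data_point_minutes` set come from the description.

```python
MIN_POINTS_PER_HOUR = 1  # assumed reading of the ">=1 data point/hour" rule
STEP = 5                 # compare data[m] with data[m + 5], as described


def find_flat_periods(data, total_minutes, step=STEP):
    """Return (start, end) ranges where consecutive sampled values are equal."""
    periods = []
    start = None
    last_checked = None
    for m in range(0, total_minutes - step, step):
        last_checked = m + step
        flat = data.get(m) == data.get(m + step)
        if flat and start is None:
            start = m
        elif not flat and start is not None:
            periods.append((start, m))
            start = None
    if start is not None:
        periods.append((start, last_checked))
    return periods


def periods_to_fill(data, data_point_minutes, total_minutes):
    """Keep only flat periods in which the sensor was not reporting."""
    to_fill = []
    for start, end in find_flat_periods(data, total_minutes):
        points = sum(1 for m in data_point_minutes if start <= m < end)
        online = points * 60 >= (end - start) * MIN_POINTS_PER_HOUR
        if not online:
            to_fill.append((start, end))
    return to_fill


def counter(m):
    """Toy Wh counter: rising, flat (online), rising, flat (offline)."""
    if m < 60:
        return 10 * m
    if m < 120:
        return 600
    if m < 180:
        return 600 + 10 * (m - 120)
    return 1200


data = {m: counter(m) for m in range(240)}
# The sensor reported every 5 minutes for the first 3 hours, then went silent.
data_point_minutes = set(range(0, 180, 5))

print(find_flat_periods(data, 240))
print(periods_to_fill(data, data_point_minutes, 240))
```

Both flat stretches look identical to a value-equality check; only the set of real data point minutes separates "sensor online, zero consumption" from "sensor offline, no data".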
Superseded by #3554 which fixes the root cause (false-positive gap detection) rather than the symptom (sparse data interpolation). The new approach tracks sensor data point provenance to distinguish 'sensor online, zero consumption' from 'sensor offline, no data'.
Summary
- Fixes load inflation when fill_load_from_power processes sparse entity history data (e.g. SaaS 5-minute intervals)
- Adds interpolate_sparse_data() in utils.py to linearly interpolate cumulative data between known points before gap-filling runs
Closes #3545
Root Cause
SaaS instances record entity_history at 5-minute intervals, producing sparse cumulative dicts (only ~9% of minute indices populated). fill_load_from_power Phase 1 uses dict.get(minute, 0) to check for gaps: sparse minutes return 0, get classified as "zero periods", and are filled with power-integrated data. Phase 2 then scales to match the now-inflated cumulative totals, double-counting energy.
Changes
- utils.py: New interpolate_sparse_data() function (50 lines)
- fetch.py: Call interpolation at both ge_cloud_data branches before fill_load_from_power
- test_interpolate_sparse_data.py: 12 new tests (edge cases, energy preservation, midnight resets, full-day simulation)
- test_fill_load_from_power.py: 4 new regression tests proving sparse data inflates and interpolated data doesn't
- unit_test.py: Registered new tests in test registry
Test plan
- interpolate_sparse_data unit tests passing
- fill_load_from_power regression tests passing
- fill_load_from_power tests still pass (6/6)
🤖 Generated with Claude Code