Adding an outlier check for dc capacity/power #223
qnguyen345 wants to merge 5 commits into pvlib:main
Conversation
pvanalytics/quality/outliers.py
Outdated
    return deviation > max_deviation * mad


def run_pvwatts_data_checks(power_series, nsrdb_weather_df):
Add a leading underscore to the name, as this is a private function.
pvanalytics/quality/outliers.py
Outdated
azimuth : Float
    Azimuth angle of site in degrees.
dc_capacity : Float
    DC capacity of the site.
pvanalytics/quality/outliers.py
Outdated
    return power_series


def run_pvwatts_model(tilt, azimuth, dc_capacity, dc_inverter_limit,
pvanalytics/quality/outliers.py
Outdated
    Percent difference threshold for flagging data as anomalies.
    Defaulted to 50.
dc_capacity : None or Float
    DC capacity of the site. If the inverter dc capacity is not
pvanalytics/quality/outliers.py
Outdated
Returns
-------
master_df : Pandas dataframe with datetime index
Rename master_df; the name is too generic.
Return a pandas Series of percent difference, and add a new function that determines whether each value is anomalous, returning a boolean Series with a datetime index.
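A minimal sketch of the split suggested in this review comment, assuming daily measured and modeled values are already available as pandas Series on the same datetime index. The function names `percent_difference` and `flag_anomalies` are hypothetical and are not part of the PR or of pvanalytics; the default threshold of 50 follows the docstring quoted above.

```python
import pandas as pd


def percent_difference(measured, modeled):
    """Percent difference of measured relative to modeled (hypothetical helper)."""
    # Mask zero modeled values so they yield NaN rather than a division by zero.
    modeled = modeled.where(modeled != 0)
    return 100 * (measured - modeled) / modeled


def flag_anomalies(percent_diff, threshold=50):
    """Boolean Series, True where |percent difference| exceeds the threshold."""
    # NaN compares as False, so days without a valid model value are not flagged.
    return percent_diff.abs() > threshold


# Synthetic daily values (made up for illustration only).
index = pd.date_range("2023-06-01", periods=4, freq="D")
measured = pd.Series([95.0, 40.0, 100.0, 160.0], index=index)
modeled = pd.Series([100.0, 100.0, 100.0, 100.0], index=index)

pct = percent_difference(measured, modeled)   # float Series, datetime index
flags = flag_anomalies(pct)                   # boolean Series, datetime index
```

Keeping the two steps separate lets a caller inspect or plot the percent difference and choose a threshold afterwards, without rerunning the model.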
|
My reaction is that
+1 to @kperrynrel's comments about the output of |
|
Hey @cwhanse, Quyen put this together on our end as this was a specific request from @williamhobbs. Southern wants to run an outlier check for "abnormal" daily behavior based on expected PVWatts output (they're using a lot of the PVAnalytics routines already). If you don't think it's a good fit, we could send him the code directly? Can you think of another open source repo where it may be more appropriate? |
|
Would the example be sufficient for @williamhobbs? The prepackaged PVWatts model could be a function in the example, although then it's not importable. For identifying the outliers from a percent absolute difference in daily values, only |
|
(I think this issue is almost 100% relevant to my comment below: #143.) Here's my summary of our in-person conversation, @cwhanse. Hopefully this captures everything (with new sketches!):

We talked about a more general function, or set of functions, to flag deviations in a signal (like power or back-of-module temperature) from a reference. The reference could come from a physically adjacent piece of hardware (like an inverter or Tbom sensor) or from a simulated signal, which is what I'm most interested in. It would be up to the user to provide the reference time series. Anomalies could be flagged if the deviation (absolute value?) exceeds some time-based threshold, e.g., off by 20% for 1 hr or 10% for one day. The threshold could be a curve based on a function with one or two parameters, or maybe a piecewise function based on a table. See the sketch below.

There could be a second support function that you feed historical "good" data to, and it returns the threshold curve at some confidence level (e.g., 95% or 99% of historical deviations were below this curve). I could see this being very useful; otherwise there could be a lot of trial and error for users. I imagine these concepts already exist somewhere. My quick web searching turns up network traffic anomaly detection, but it seems to be based only on past trends, not on an independent reference "expectation".
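One way the idea in this comment could be sketched, under several assumptions that are not from the discussion: the data are regularly sampled, the threshold "curve" is a small table mapping window length (in samples) to a fractional limit, and the deviation is averaged over each window. The names `relative_deviation`, `flag_deviations`, and `fit_thresholds` are hypothetical.

```python
import numpy as np
import pandas as pd


def relative_deviation(signal, reference):
    """Absolute relative deviation; assumes the reference is nonzero."""
    return ((signal - reference) / reference).abs()


def flag_deviations(signal, reference, thresholds):
    """Flag samples where the mean deviation over any full window exceeds its limit.

    thresholds maps a window length in samples to a fractional limit.
    """
    deviation = relative_deviation(signal, reference)
    flags = pd.Series(False, index=signal.index)
    for n_samples, limit in thresholds.items():
        # Integer windows give NaN until the window is full; NaN > limit is False.
        rolling_mean = deviation.rolling(n_samples).mean()
        flags = flags | (rolling_mean > limit)
    return flags


def fit_thresholds(good_signal, reference, windows, quantile=0.95):
    """Per-window threshold: the given quantile of historical 'good' deviations."""
    deviation = relative_deviation(good_signal, reference)
    return {n: deviation.rolling(n).mean().quantile(quantile) for n in windows}


# Hourly synthetic data (made up): flat reference, one spike, one day-long sag.
index = pd.date_range("2023-06-01", periods=72, freq=pd.Timedelta(hours=1))
reference = pd.Series(100.0, index=index)
signal = reference.copy()
signal.iloc[5] = 130.0      # 30% off for a single hour
signal.iloc[24:48] = 87.0   # 13% off for a full day

# "Off by 20% for 1 hr or 10% for one day", expressed in hourly samples.
flags = flag_deviations(signal, reference, {1: 0.20, 24: 0.10})

# Fit a threshold table from noisy but healthy history (seeded for repeatability).
rng = np.random.default_rng(0)
good = pd.Series(100.0 * (1 + rng.normal(0, 0.03, 1000)))
fitted = fit_thresholds(good, 100.0, windows=[1, 24])
```

In this sketch the one-hour spike trips the short-window limit, while the day-long sag stays under 20% at every hour and is caught only by the 24-sample window. Averaging over the window is one design choice among several; requiring every sample in the window to exceed the limit would be a stricter alternative.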
|
@williamhobbs @kperrynrel @qnguyen345 I propose we close this PR and replace with the following development goals:
|
|
@cwhanse - your proposal sounds good to me. Maybe the quantile regression in |
|
@cwhanse @williamhobbs also good with closing this and reopening another PR with the newly recommended logic. Thanks! |


- [ ] Closes #xxx
- [ ] Clearly documented all new API functions with PEP257 and numpydoc compliant docstrings.
- [ ] Added new API functions to docs/api.rst.
[...] in docs/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).

There can be days when the system does not produce the expected power output. We can measure daily performance against a PVWatts model to identify those outlier days. Using the system metadata and NSRDB weather data, we can model the system's expected DC capacity/power output with PVWatts. We can then compare the modeled daily time series to the measured time series to get a percent difference. If the percent difference exceeds a chosen threshold, meaning the system is producing much less or much more than expected, we flag that day as an anomaly.
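The workflow described above could look roughly like the following. This is a sketch and not the code in this PR: it assumes plane-of-array irradiance and cell temperature have already been derived from the weather data, uses the standard PVWatts DC equation written out by hand, and uses synthetic inputs. The name `pvwatts_dc_sketch`, the temperature coefficient of -0.004 per °C, and all input values are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def pvwatts_dc_sketch(poa_effective, temp_cell, pdc0, gamma_pdc=-0.004, temp_ref=25.0):
    """PVWatts DC power: linear in irradiance, with a linear temperature correction."""
    return poa_effective / 1000.0 * pdc0 * (1 + gamma_pdc * (temp_cell - temp_ref))


# Two days of hourly synthetic inputs (made up; not NSRDB data).
index = pd.date_range("2023-06-01", periods=48, freq=pd.Timedelta(hours=1))
hour = index.hour.to_numpy()
daylight = (hour >= 6) & (hour <= 18)
# Half-sine irradiance between 06:00 and 18:00, peaking at 1000 W/m^2 at noon.
poa = pd.Series(np.where(daylight, 1000 * np.sin(np.pi * (hour - 6) / 12), 0.0), index=index)
temp_cell = pd.Series(45.0, index=index)

modeled = pvwatts_dc_sketch(poa, temp_cell, pdc0=5000.0)

# Pretend the real system under-produces on the second day.
measured = modeled.copy()
second_day = index.day == 2
measured[second_day] = measured[second_day] * 0.4

# Aggregate to daily totals, then compare measured against modeled.
daily_modeled = modeled.resample("D").sum()
daily_measured = measured.resample("D").sum()
percent_diff = 100 * (daily_measured - daily_modeled) / daily_modeled
is_anomaly = percent_diff.abs() > 50  # threshold of 50 percent, as in the docstring default
```

With these inputs the first day matches the model and the second day comes out roughly 60% low, so only the second day is flagged.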