Skip to content

Conversation

@malteschaaf
Copy link
Contributor

@malteschaaf malteschaaf commented Jan 14, 2026

This PR introduces two new configuration options to the ResamplerConfig class: closed and label. These options provide greater flexibility in defining the behavior of the resampling process. Additionally, tests have been added to ensure the correctness of these new features.

Changes:

  1. Add closed option to ResamplerConfig

    • Introduced the closed parameter with options "right" (default) and "left".
    • Updated the resampling logic to respect the closed configuration:
      • "right": Includes samples at the end of the resampling window, excludes those at the start.
      • "left": Includes samples at the start of the resampling window, excludes those at the end.
    • Adjusted documentation to explain the behavior of the closed parameter.
  2. Add label option to ResamplerConfig

    • Introduced the label parameter with options "end" (default) and "start".
    • Updated the resampling logic to respect the label configuration:
      • "end": The timestamp of the resampled data corresponds to the end of the resampling window.
      • "start": The timestamp of the resampled data corresponds to the start of the resampling window.
    • Adjusted the logic for setting sample_time to use the label configuration.
    • Updated documentation to explain the behavior of the label parameter.
  3. Add tests for closed and label options

    • Added unit tests to verify the behavior of the closed parameter for both "right" and "left" configurations.
    • Added unit tests to verify the behavior of the label parameter for both "start" and "end" configurations.

It must be a positive time span.
"""

max_data_age_in_periods: float = 3.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this value is leading to unexpected behavior, but this would be a very intrusive change which would result in many more NaN values in deployments. If we want to change it we could make it a required parameter for migration purpose, and later introduce the new default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would even first run some experiments before changing the API here. We can just change the value in the edge app for now, or even make it configurable via env vars (like the resampling period) to allow for easy, per-site, experimentation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, as Christoph pointed out, this will lead to more NaNs. A middle approach, that will avoid the NaNs and degrade the quality only when data is missing is to change the resampling function.

The resampler was designed to be very flexible, so we can keep the 3 windows of data, and check inside the resampling function, if there is data for the last window, if there is, we can limit the function to use only data from the last window, and if there isn't, we can go back in time to find some data.

This can even be a composable function, like use_latest_window_only(average), so we can even exchange the underlying resampling function, basically separating buffer/history management from resampling.

Not sure if this is still useful from the DS point of view, or if it is best to just see a NaN if data is missing. From the control PoV, I think it makes sense to work on slightly old data rather than halting (or entering in emergency mode immediately). But even this is something we might want to discuss, as it wasn't reviewed since we first coded the resampler, and I think now we have a much better understanding of the use cases and requirements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable to me! I removed this change from this PR and we can discuss it further before any changes are made.

)
minimum_relevant_timestamp = timestamp - period * conf.max_data_age_in_periods

min_index = bisect(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the behavior how to resample, i.e. left or right open and the labeling should be config parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make left or right opened configurable with the corresponding label, such as:
right open: [t,t+1) -> labeled as t
left open: (t-1,t] -> labeled as t (the current behavior)
Or do you want to allow the user to additionally do something like:
right open, label in the end: [t,t+1) -> labeled as t+1
left open, label in the beginning:(t-1,t] -> labeled as t-1
Because I think the later could lead to a lot of confusion (not sure if its ever needed)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the latter is also reasonable options (see e.g. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html), but don't see a strong reason to implement this now if not needed. If it's well-documented, the users can also adjust the timestamps trivially. So your proposal sounds good to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, whatever we do, we should probably be much more explicit of how output samples are calculated and structured.

Copy link
Contributor

@llucax llucax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general LGTM, I think this would need quite some testing, and we probably want to introduce this using a config option to change the behavior, specially if configurability is something we want to allow in the future (if we don't see having this level of configurability useful, we can hide it in a feature flag using a env var like we did with the new wall clock timer). As we experimented recently with the changes in the component graph, subtle changes in behavior can go easily unnoticed and cause issues.

It must be a positive time span.
"""

max_data_age_in_periods: float = 3.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would even first run some experiments before changing the API here. We can just change the value in the edge app for now, or even make it configurable via env vars (like the resampling period) to allow for easy, per-site, experimentation.

)
minimum_relevant_timestamp = timestamp - period * conf.max_data_age_in_periods

min_index = bisect(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, whatever we do, we should probably be much more explicit of how output samples are calculated and structured.

It must be a positive time span.
"""

max_data_age_in_periods: float = 3.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, as Christoph pointed out, this will lead to more NaNs. A middle approach, that will avoid the NaNs and degrade the quality only when data is missing is to change the resampling function.

The resampler was designed to be very flexible, so we can keep the 3 windows of data, and check inside the resampling function, if there is data for the last window, if there is, we can limit the function to use only data from the last window, and if there isn't, we can go back in time to find some data.

This can even be a composable function, like use_latest_window_only(average), so we can even exchange the underlying resampling function, basically separating buffer/history management from resampling.

Not sure if this is still useful from the DS point of view, or if it is best to just see a NaN if data is missing. From the control PoV, I think it makes sense to work on slightly old data rather than halting (or entering in emergency mode immediately). But even this is something we might want to discuss, as it wasn't reviewed since we first coded the resampler, and I think now we have a much better understanding of the use cases and requirements.

…havior

- Introduced the `closed` parameter in ResamplerConfig with options `right` (default) and `left`.
- Updated the resampling logic to respect the `closed` configuration:
  - `right`: Includes samples at the end of the window, excludes those at the start.
  - `left`: Includes samples at the start of the window, excludes those at the end.
- Adjusted documentation to reflect the new `closed` parameter and its behavior.

Signed-off-by: Malte Schaaf <malte.schaaf@frequenz.com>
… resampled data

- Introduced the `label` parameter in ResamplerConfig with options `end` (default) and `start`.
- Updated the resampling logic to respect the `label` configuration:
  - `end`: The timestamp of the resampled data corresponds to the end of the resampling window.
  - `start`: The timestamp of the resampled data corresponds to the start of the resampling window.
- Adjusted the logic for setting `sample_time` to use the `label` configuration.
- Updated documentation to reflect the new `label` parameter and its behavior.

Signed-off-by: Malte Schaaf <malte.schaaf@frequenz.com>
@github-actions github-actions bot added the part:tests Affects the unit, integration and performance (benchmarks) tests label Feb 3, 2026
@malteschaaf malteschaaf changed the title WIP: Adjust Resampling Logic and Defaults Enhance ResamplerConfig with closed and label options and add comprehensive tests Feb 3, 2026
@malteschaaf malteschaaf changed the title Enhance ResamplerConfig with closed and label options and add comprehensive tests Enhance ResamplerConfig with closed and label options and add comprehensive tests Feb 3, 2026
@malteschaaf
Copy link
Contributor Author

Ready for Review 🧐

Changes implemented:

  • Removed modifications to the max_data_age_in_periods default value
  • Made the openness of the resampling window configurable via the closed parameter:
    • "right": Includes the end of the window, excludes the start. (Default: same behavior as before)
    • "left": Includes the start of the window, excludes the end.
  • Added configurability for timestamp labeling with the label parameter:
    • "start": Resampled timestamp represents the start of the resampling window.
    • "end": Resampled timestamp represents the end of the resampling window. (Default: same behavior as before)
  • Added comprehensive tests to verify the behavior of the closed and label parameters.

@github-actions github-actions bot added the part:docs Affects the documentation label Feb 3, 2026
@malteschaaf malteschaaf marked this pull request as ready for review February 3, 2026 16:14
@malteschaaf malteschaaf requested a review from a team as a code owner February 3, 2026 16:14
- Enhanced the `test_resampler_closed_option` to include additional samples at 2.5, 3, and 4 seconds.
- Verified the behavior of the `closed` option ("right" and "left") with the extended timeline.
- Added assertions to ensure correct resampling function calls and sink outputs for the new samples.
- Confirmed that source properties and buffer length are updated correctly after processing the additional samples.

Signed-off-by: Malte Schaaf <malte.schaaf@frequenz.com>
…havior

- Introduced `test_resampler_label_option` to validate the `label` configuration in ResamplerConfig.
- Tested both `start` and `end` options to ensure the resampled datas timestamp corresponds to the start or end of the resampling window, respectively.
- Verified sink outputs with the expected timestamp and resampled value.

Signed-off-by: Malte Schaaf <malte.schaaf@frequenz.com>
Signed-off-by: Malte Schaaf <malte.schaaf@frequenz.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

part:data-pipeline Affects the data pipeline part:docs Affects the documentation part:tests Affects the unit, integration and performance (benchmarks) tests

Projects

Status: To do

Development

Successfully merging this pull request may close these issues.

3 participants