fix: deduplicate time values in add_location instead of raising ValueError #1627

Open
beatfactor wants to merge 2 commits into echostack-org:main from OceanStreamIO:oceanstream/fix-add-location-duplicate-times

@beatfactor
Contributor

Fixes #1478. EK80 data commonly has duplicate timestamps in Platform time dimensions (e.g., from multiple NMEA sentences — GGA, GLL, RMC — at the same timestamp). Previously add_location() raised a ValueError, requiring manual deduplication before use.

Now duplicate time values are automatically removed (keeping the first occurrence) with a UserWarning, and interpolation proceeds normally.
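
The behaviour described above can be sketched with NumPy alone. This is only an illustration of "keep the first occurrence and warn", not the PR's implementation: the helper name `drop_duplicate_times` and the sample values are hypothetical, and echopype itself operates on `xarray` objects rather than bare arrays.

```python
import warnings

import numpy as np


def drop_duplicate_times(times, values, time_dim_name="time1"):
    """Keep the first value seen for each timestamp and warn about the rest.

    Hypothetical helper for illustration only; not part of echopype.
    """
    times = np.asarray(times)
    values = np.asarray(values)
    # return_index gives the position of the first occurrence of each unique time.
    _, first_idx = np.unique(times, return_index=True)
    first_idx.sort()  # keep the surviving samples in their original order
    n_dropped = times.size - first_idx.size
    if n_dropped > 0:
        warnings.warn(
            f'Dropped {n_dropped} duplicate value(s) in "{time_dim_name}".',
            UserWarning,
            stacklevel=2,
        )
    return times[first_idx], values[first_idx]


# Made-up data: three NMEA sentences (e.g. GGA, GLL, RMC) share the first timestamp.
times = np.array(
    [
        "2026-01-01T00:00:00",
        "2026-01-01T00:00:00",
        "2026-01-01T00:00:00",
        "2026-01-01T00:00:01",
    ],
    dtype="datetime64[s]",
)
lat = np.array([45.0, 45.1, 45.2, 46.0])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unique_times, unique_lat = drop_duplicate_times(times, lat)
```

With unique timestamps the helper returns its input unchanged and emits no warning, so callers would not need to special-case clean data.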

@codecov-commenter

codecov-commenter commented Mar 12, 2026

Codecov Report

❌ Patch coverage is 90.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.59%. Comparing base (6cf6cee) to head (0d6554c).
⚠️ Report is 13 commits behind head on main.

Files with missing lines       Patch %   Lines
echopype/consolidate/api.py    80.00%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1627      +/-   ##
==========================================
+ Coverage   85.58%   85.59%   +0.01%     
==========================================
  Files          79       79              
  Lines        6998     7031      +33     
==========================================
+ Hits         5989     6018      +29     
- Misses       1009     1013       +4     
Flag          Coverage Δ
integration   80.67% <90.00%> (+0.03%) ⬆️
unit          60.27% <15.00%> (-0.15%) ⬇️
unittests     85.49% <90.00%> (+0.01%) ⬆️

Collaborator

@LOCEANlloydizard LOCEANlloydizard left a comment

Hi @beatfactor, nice! Thanks for the PR! Just a couple of small thoughts:

  • Should we rename the function to something like check_and_drop_loc_time_dim_duplicates() to better reflect its new behavior?

  • For the test, should we also assert that latitude/longitude match the ping_time dimension (e.g. same length)? This would confirm the deduplication path still preserves the expected alignment (although I think it's already tested elsewhere? so not sure)
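
The alignment check in the second point could be phrased roughly as follows. This is a sketch with NumPy stand-ins rather than echopype's real test fixtures: all arrays are made up, and `np.interp` is used in place of the interpolation that `add_location()` performs.

```python
import numpy as np

# Made-up Platform times (seconds) with a duplicate at t=0, plus positions.
nav_time = np.array([0.0, 0.0, 10.0, 20.0])
nav_lat = np.array([45.0, 45.5, 46.0, 47.0])
nav_lon = np.array([-125.0, -125.5, -126.0, -127.0])

# Made-up ping times that the positions should be interpolated onto.
ping_time = np.array([0.0, 5.0, 10.0, 15.0, 20.0])

# Deduplicate, keeping the first occurrence of each timestamp.
_, first_idx = np.unique(nav_time, return_index=True)
nav_time, nav_lat, nav_lon = nav_time[first_idx], nav_lat[first_idx], nav_lon[first_idx]

# Interpolate the deduplicated positions onto the ping times.
latitude = np.interp(ping_time, nav_time, nav_lat)
longitude = np.interp(ping_time, nav_time, nav_lon)

# The assertions under discussion: one position per ping.
assert latitude.shape == ping_time.shape
assert longitude.shape == ping_time.shape
```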

Cheers!

@beatfactor beatfactor force-pushed the oceanstream/fix-add-location-duplicate-times branch from 844e080 to 6ac03b6 Compare March 17, 2026 10:00
@beatfactor
Contributor Author

@LOCEANlloydizard renamed and added the 2 new assertions. Thanks for reviewing!

Collaborator

@LOCEANlloydizard LOCEANlloydizard left a comment

Hi @beatfactor, thanks for the changes!
It looks like the PR accidentally includes a local virtual environment (venv/). Could you remove it from the PR?
There are also pre-commit and RTD errors. I’m not sure whether the RTD one is related to the venv or not. Let's see if it still shows up afterwards! Cheers

@beatfactor beatfactor force-pushed the oceanstream/fix-add-location-duplicate-times branch 2 times, most recently from f6371dc to 623ef15 Compare March 17, 2026 15:21
@beatfactor
Contributor Author

PR updated, sorry for the venv.

@LOCEANlloydizard
Collaborator

Hi @beatfactor, we discussed this with @leewujung and thought it could be useful to make the warning more explicit when deduplicating timestamps, so it’s clear whether the duplicates come from mixed NMEA sentence types or from a single one.

I tried to push this directly to the PR branch but don’t have permission, so sharing here!
So, prior to

    # Select NMEA subset (if applicable) and interpolate location variables and place
    # into `interp_ds`.
    for loc_name, interp_loc_name in [(lat_name, "latitude"), (lon_name, "longitude")]:

I added

# Build contextual warning message for duplicate timestamps.
# In the default NMEA case, multiple sentence types may be mixed,
# which can produce duplicate timestamps due to differing resolution.
extra_msg = ""

if nmea_sentence is None and datagram_type is None:
    sentence_types = np.unique(echodata["Platform"]["sentence_type"].values)
    sentence_types = [str(s) for s in sentence_types]

    if len(sentence_types) > 1:
        extra_msg = (
            f" Multiple NMEA sentence types detected ({', '.join(sentence_types)}), "
            "which may have different resolution and produce duplicate timestamps. "
            "Consider specifying `nmea_sentence` to select a single GPS message type."
        )
    elif len(sentence_types) == 1:
        extra_msg = (
            f" Duplicate timestamps found within NMEA sentence type {sentence_types[0]}."
        )

and that message is passed to the warning:

warnings.warn(
    f'Dropped {n_total - n_unique} duplicate value(s) in "{time_dim_name}".' + extra_msg,
    UserWarning,
    stacklevel=2,
)

That only requires changing the function signature

def check_and_drop_loc_time_dim_duplicates(
    da: xr.DataArray,
    time_dim_name: str,
    extra_msg: str = "",
) -> xr.DataArray:

and calling

loc_var = check_and_drop_loc_time_dim_duplicates(loc_var, time_dim_name, extra_msg)

That's it, just suggestions. Happy to hear your thoughts on this, or if you’d prefer another approach!
And sorry, I would have preferred to push these modifications directly. Cheers!
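
Pulling the pieces of this suggestion together, a self-contained sketch might look like the following. The helper names `build_duplicate_time_msg` and `warn_dropped` are hypothetical, and the first one takes a plain sequence of sentence types instead of reading `echodata["Platform"]["sentence_type"]`, so the snippet runs without echopype.

```python
import warnings


def build_duplicate_time_msg(sentence_types):
    """Return extra warning context based on which NMEA sentence types are present.

    Hypothetical helper mirroring the suggestion above; not part of echopype.
    """
    unique_types = sorted({str(s) for s in sentence_types})
    if len(unique_types) > 1:
        return (
            f" Multiple NMEA sentence types detected ({', '.join(unique_types)}), "
            "which may have different resolution and produce duplicate timestamps. "
            "Consider specifying `nmea_sentence` to select a single GPS message type."
        )
    if len(unique_types) == 1:
        return f" Duplicate timestamps found within NMEA sentence type {unique_types[0]}."
    return ""


def warn_dropped(n_total, n_unique, time_dim_name, extra_msg=""):
    """Emit the deduplication warning only when something was actually dropped."""
    if n_total > n_unique:
        warnings.warn(
            f'Dropped {n_total - n_unique} duplicate value(s) in "{time_dim_name}".'
            + extra_msg,
            UserWarning,
            stacklevel=2,
        )


mixed_msg = build_duplicate_time_msg(["GGA", "RMC", "GGA", "GLL"])
single_msg = build_duplicate_time_msg(["GGA", "GGA"])
```

Keeping the message construction separate from the warning call means the deduplication helper only needs the optional `extra_msg` string, which matches the signature change proposed above.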

@beatfactor
Contributor Author

beatfactor commented Mar 20, 2026

@LOCEANlloydizard No problem, this makes sense. I'll add them soon.

Yeah, it's not possible to push directly to our fork. You'd need to send a PR there instead, and once I merge your change into our branch, it would be reflected in this PR automatically.

Or you can suggest edits directly in the Changes view, as far as I am aware.

@beatfactor beatfactor force-pushed the oceanstream/fix-add-location-duplicate-times branch from 623ef15 to c0c4bd5 Compare March 22, 2026 15:21
@beatfactor
Contributor Author

@LOCEANlloydizard have you had a chance to check the latest? I made the changes.

Development

Successfully merging this pull request may close these issues.

add_location() fails if times aren't unique