
SoftSegmenter and LongYAAL #19

Closed
pe-trik wants to merge 6 commits into hlt-mt:main from pe-trik:longyaal

@pe-trik commented Feb 16, 2026

This pull request introduces a new latency metric implementation, LongYAAL.

New metric implementation:

  • Added the LongYAAL class to simulstream.metrics.scorers.latency.long_yaal.py, implementing the Long-form Yet Another Average Lagging metric.
  • Registered the new scorer under the name "long_yaal" for use in the evaluation framework.
  • Implemented SoftSegmenter, which replaces mWERSegmenter in LongYAAL.

@sarapapi (Contributor) left a comment

The Moses dependency should be added to the eval requirements:

eval = [

@mgaido91 (Contributor) commented

The CI has been fixed in #22. Please pull from the main branch the next time you push so that the CI gets fixed here as well, thanks.

@pe-trik requested a review from mgaido91 on February 26, 2026, 16:24

@mgaido91 (Contributor) left a comment

I have another couple of questions:

  • Regarding the character-level case: here we just split every character on its own, while for the mwersegmenter a dedicated segmenter is used (see 93c51b4). I am not sure how the two differ, but is there a reason for using a different segmentation method?
  • How do you envision the quality scoring part when this latency scorer is used? Do we always score quality metrics with the mwersegmenter, while using this segmenter for the latency? Or shall we introduce this segmenter for the quality scoring as well?

PS: Can we also add a simple unit test with the word and the char case, as done in 93c51b4?

Thanks!
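
For illustration, a test covering the word and char cases could look roughly like the sketch below. `split_units` is a hypothetical stand-in for the splitting step discussed here, not a simulstream function, and dropping whitespace in the character-level case is an assumption rather than confirmed behaviour.

```python
from typing import List


def split_units(text: str, char_level: bool = False) -> List[str]:
    """Hypothetical helper: split a text into the units that get aligned."""
    if char_level:
        # Assumption: every non-whitespace character becomes its own unit.
        return [ch for ch in text if not ch.isspace()]
    # Word-level case: plain whitespace tokenization.
    return text.split()


def test_word_level() -> None:
    assert split_units("hello big world") == ["hello", "big", "world"]


def test_char_level() -> None:
    assert split_units("你好 世界", char_level=True) == ["你", "好", "世", "界"]
```

A real test would call the scorer itself instead of the stand-in and compare the resulting latency values for both settings.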

Args:
args (argparse.Namespace): Parsed command-line arguments.
"""

please avoid unrelated changes

before computing latency, making it more robust for long-form speech translation evaluation.

The key difference from StreamLAAL is the use of SoftSegmenter's more sophisticated
alignment algorithm that handles long-form audio better. Additionally, LongYAAL is considers

Suggested change
- alignment algorithm that handles long-form audio better. Additionally, LongYAAL is considers
+ alignment algorithm that handles long-form audio better. Additionally, LongYAAL considers

"""
...

def _split_delays_by_segmented_text(

this should be removed and inherited from the parent class

... # Compute a custom latency score
... return LatencyScores(...)
"""
def __init__(self, args):

this can also be removed and inherited from the parent class



@dataclass
class ResegmentedLatencyScoringSample:

since we have a segmenter_based_scorer file we can put this there

delay=word.delay,
seq_id=word.seq_id,
elapsed=word.elapsed,
main=main,

rather than main this would be the first?

Comment on lines +456 to +457
for i in range(len(sample.reference)):
new_segmentation[i] = []

this is useless since we have the get at line 466
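
A minimal sketch of why the pre-initialization is redundant, assuming the later read uses `dict.get` with an empty-list default as the comment implies. The function name, its parameters, and the `preinitialize` flag are hypothetical and exist only to compare the two variants.

```python
from typing import Dict, List, Tuple


def group_words(assignments: List[Tuple[int, str]], num_segments: int,
                preinitialize: bool) -> List[List[str]]:
    """Hypothetical helper: group words by the segment index they were assigned to."""
    new_segmentation: Dict[int, List[str]] = {}
    if preinitialize:
        # Equivalent of the two lines under review: one empty list per segment.
        for i in range(num_segments):
            new_segmentation[i] = []
    for seg_idx, word in assignments:
        new_segmentation.setdefault(seg_idx, []).append(word)
    # Reading with a default already covers segments that received no words.
    return [new_segmentation.get(i, []) for i in range(num_segments)]


assignments = [(0, "a"), (2, "b"), (2, "c")]
with_init = group_words(assignments, 3, preinitialize=True)
without_init = group_words(assignments, 3, preinitialize=False)
```

Both calls return the same grouping, with an empty list for the segment that received no words.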

Comment on lines +481 to +482
ideal_delays = [w.delay - ref.start_time for w in segment_words]
ca_delays = [w.elapsed - ref.start_time for w in segment_words]

this is where, for coherence with the mwersegmenter and other latency measures, we should get rid of the - ref.start_time. We can do this in the YAAL code
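
One way to read this suggestion, sketched with stand-in types: the resegmentation step returns delays as they are, and only the YAAL-specific code subtracts the segment start. `Word`, `absolute_delays`, and `relative_to_start` are hypothetical names, and the numbers are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """Stand-in for a resegmented word; only the two delay fields are modelled."""
    delay: float    # ideal delay
    elapsed: float  # computation-aware delay


def absolute_delays(words: List[Word]) -> Tuple[List[float], List[float]]:
    # Resegmentation side: no offset is applied here.
    return [w.delay for w in words], [w.elapsed for w in words]


def relative_to_start(delays: List[float], start_time: float) -> List[float]:
    # YAAL side: subtract the reference segment start only where it is needed.
    return [d - start_time for d in delays]


words = [Word(delay=12.0, elapsed=12.5), Word(delay=13.0, elapsed=13.75)]
ideal, ca = absolute_delays(words)
ideal_rel = relative_to_start(ideal, start_time=10.0)
ca_rel = relative_to_start(ca, start_time=10.0)
```

With this split, other latency measures receive the same kind of delays they would get from the mwersegmenter path, and the offset stays local to the metric that requires it.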

return True

@abstractmethod
def _do_score(self, samples: List[ResegmentedLatencyScoringSample]) -> LatencyScores:

let's also move this to the parent class

index += segment_len
assert len(delays) == index, \
f"Index {index} should have reached end of delays ({len(delays)})"
return segmented_delays

Suggested change
- return segmented_delays
+ return segmented_delays
+
+ def _resegment_samples(self, samples: List[LatencyScoringSample]) -> List[ResegmentedLatencyScoringSample]:
+     ...
+
+ def score(self, samples: List[LatencyScoringSample]) -> LatencyScores:
+     resegmented_samples = self._resegment_samples(samples)
+     return self._do_score(resegmented_samples)

and we can add a comment to the main class that subclasses should implement _resegment_samples, as is done for _do_score. In this way we can isolate the resegmentation part in the subclasses. Thanks.
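
The structure proposed here is a template method: the parent fixes the order of the two steps and subclasses fill them in. A self-contained sketch follows, assuming simplified stand-ins for the data classes (their fields are invented for the example) and hypothetical class names `ResegmentingLatencyScorer` and `ToyScorer`; it does not reproduce the simulstream classes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class LatencyScoringSample:
    delays: List[float]  # stand-in field, not the real class layout


@dataclass
class ResegmentedLatencyScoringSample:
    segments: List[List[float]]  # stand-in field


@dataclass
class LatencyScores:
    ideal_latency: float  # stand-in field


class ResegmentingLatencyScorer(ABC):
    """Hypothetical parent: subclasses implement the two hooks below."""

    @abstractmethod
    def _resegment_samples(
            self, samples: List[LatencyScoringSample]
    ) -> List[ResegmentedLatencyScoringSample]:
        ...

    @abstractmethod
    def _do_score(
            self, samples: List[ResegmentedLatencyScoringSample]) -> LatencyScores:
        ...

    def score(self, samples: List[LatencyScoringSample]) -> LatencyScores:
        # Fixed pipeline: resegment first, then score the resegmented samples.
        return self._do_score(self._resegment_samples(samples))


class ToyScorer(ResegmentingLatencyScorer):
    def _resegment_samples(self, samples):
        # Toy resegmentation: cut each delay list into chunks of two.
        return [
            ResegmentedLatencyScoringSample(
                [s.delays[i:i + 2] for i in range(0, len(s.delays), 2)])
            for s in samples
        ]

    def _do_score(self, samples):
        # Toy score: mean of all delays across all segments.
        flat = [d for s in samples for seg in s.segments for d in seg]
        return LatencyScores(sum(flat) / len(flat))


scores = ToyScorer().score([LatencyScoringSample([1.0, 2.0, 3.0])])
```

Keeping `score` in the parent means a new segmenter only has to override `_resegment_samples`, while a new metric only has to override `_do_score`.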

simulstream.metrics.scorers.quality.mwersegmenter
simulstream.metrics.scorers.latency
simulstream.metrics.scorers.latency.mwersegmenter
simulstream.metrics.scorers.latency.softsegmenter

Suggested change
- simulstream.metrics.scorers.latency.softsegmenter
+ simulstream.metrics.scorers.latency.softsegmenter
+ simulstream.metrics.scorers.latency.segmenter_based_scorer

@pe-trik (Author) commented Mar 1, 2026

I am closing this PR. I have filed a new PR #24 that implements a unified quality and latency evaluation via OmniSTEval as proposed in https://arxiv.org/abs/2509.17349.
