SoftSegmenter and LongYAAL by pe-trik · Pull Request #19 · hlt-mt/simulstream

pe-trik · 2026-02-16T18:46:59Z

This pull request introduces a new latency metric implementation, LongYAAL.

New metric implementation:

Added the LongYAAL class in simulstream/metrics/scorers/latency/long_yaal.py, implementing the Long-form Yet Another Average Lagging metric.
Registered the new scorer under the name "long_yaal" for use in the evaluation framework.
Implemented SoftSegmenter, which replaces mWERSegmenter in LongYAAL.

simulstream/metrics/scorers/latency/long_yaal.py

simulstream/metrics/scorers/latency/softsegmenter.py

sarapapi

Moses dependency should be added in the eval requirements:

simulstream/pyproject.toml

Line 62 in 82c75ee

eval = [

mgaido91 · 2026-02-25T14:07:02Z

The CI has been fixed in #22. Please pull from the main branch next time you push so that the CI gets fixed here as well, thanks.

mgaido91

I have another couple of questions:

regarding the character-level case, here we are just splitting every character on its own, while for the mwersegmenter a dedicated segmenter is used (see 93c51b4). I am not sure how the two things differ, but is there a reason for using a different segmentation method?
how do you envision the quality scoring part when this latency scorer is used? Do we score quality metrics always with the mwersegmenter, while using this segmenter for the latency? Or shall we introduce this segmenter for the quality scoring as well?

PS Can we also add a simple UT with the word and the char case as done in 93c51b4?

Thanks!
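
To make the first question concrete, here is a minimal sketch of the two unit-splitting strategies being compared. `split_units` is a hypothetical helper, not a function from simulstream. It assumes the word-level case splits on whitespace and the character-level case drops whitespace, which may differ from what either segmenter actually does.

```python
from typing import List


def split_units(text: str, char_level: bool) -> List[str]:
    """Split a hypothesis into the units that each receive a delay.

    Hypothetical helper for illustration only; not part of simulstream.
    """
    if char_level:
        # Assumption: every non-whitespace character is its own unit.
        return [ch for ch in text if not ch.isspace()]
    # Assumption: word-level units are whitespace-separated tokens.
    return text.split()


words = split_units("guten Tag", char_level=False)
chars = split_units("你好 世界", char_level=True)
```

A unit test covering the word and the char case could assert on both results in the same way.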

mgaido91 · 2026-02-27T10:51:46Z

simulstream/metrics/scorers/latency/__init__.py

    Args:
        args (argparse.Namespace): Parsed command-line arguments.
    """
+


please avoid unrelated changes

mgaido91 · 2026-02-27T10:55:10Z

simulstream/metrics/scorers/latency/long_yaal.py

+    before computing latency, making it more robust for long-form speech translation evaluation.
+
+    The key difference from StreamLAAL is the use of SoftSegmenter's more sophisticated
+    alignment algorithm that handles long-form audio better. Additionally, LongYAAL is considers


Suggested change

-    alignment algorithm that handles long-form audio better. Additionally, LongYAAL is considers
+    alignment algorithm that handles long-form audio better. Additionally, LongYAAL considers

mgaido91 · 2026-02-27T11:02:25Z

simulstream/metrics/scorers/latency/mwersegmenter.py

        """
        ...

    def _split_delays_by_segmented_text(


this should be removed and inherited from the parent class

mgaido91 · 2026-02-27T11:03:09Z

simulstream/metrics/scorers/latency/mwersegmenter.py

        ...         # Compute a custom latency score
        ...         return LatencyScores(...)
    """
    def __init__(self, args):


this also can be removed and inherited from parent

mgaido91 · 2026-02-27T11:03:46Z

simulstream/metrics/scorers/latency/__init__.py



+@dataclass
+class ResegmentedLatencyScoringSample:


since we have a segmenter_based_scorer file we can put this there

mgaido91 · 2026-02-27T16:24:50Z

simulstream/metrics/scorers/latency/softsegmenter.py

+                    delay=word.delay,
+                    seq_id=word.seq_id,
+                    elapsed=word.elapsed,
+                    main=main,


rather than main this would be the first?

mgaido91 · 2026-02-27T16:31:27Z

simulstream/metrics/scorers/latency/softsegmenter.py

+            for i in range(len(sample.reference)):
+                new_segmentation[i] = []


this is useless since we have the get at line 466
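
A minimal sketch of why the pre-initialization is redundant, assuming the read at line 466 uses `dict.get` with an empty-list default (that line is not shown here, so this is an assumption). The variable names are illustrative only.

```python
from typing import Dict, List

reference = ["first segment", "second segment", "third segment"]

# Only segment 1 received any words in this toy example.
new_segmentation: Dict[int, List[str]] = {1: ["hello", "world"]}

# Reading with a default makes pre-filling every key with [] unnecessary.
per_segment = [new_segmentation.get(i, []) for i in range(len(reference))]
```

Segments with no assigned words come back as empty lists either way, so the loop that fills every key up front can be dropped.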

mgaido91 · 2026-02-27T16:33:07Z

simulstream/metrics/scorers/latency/softsegmenter.py

+                    ideal_delays = [w.delay - ref.start_time for w in segment_words]
+                    ca_delays = [w.elapsed - ref.start_time for w in segment_words]


this is where, for coherence with the mwersegmenter and other latency measures, we should get rid of the - ref.start_time. We can do this in the YAAL code
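
One way to read this suggestion, as a sketch under stated assumptions: the segmenter returns delays as they are, and the scorer applies the reference start-time offset itself. `Word`, `segment_delays`, and `to_segment_relative` are hypothetical names, not simulstream APIs, and the field meanings in the comments are assumed from the variable names in the diff.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical stand-in for the segmenter's word record.
    text: str
    delay: float    # assumed: ideal emission time, from stream start
    elapsed: float  # assumed: computation-aware emission time


def segment_delays(words: List[Word]) -> List[float]:
    """Segmenter side: return stream-level delays untouched."""
    return [w.delay for w in words]


def to_segment_relative(delays: List[float], start_time: float) -> List[float]:
    """Scorer side: subtract the reference start time where it is needed."""
    return [d - start_time for d in delays]


words = [Word("hello", 12.0, 12.5), Word("world", 13.0, 13.75)]
absolute = segment_delays(words)
relative = to_segment_relative(absolute, start_time=10.0)
```

With this split, the segmenter output stays comparable with the other latency scorers, and only the YAAL computation sees segment-relative values.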

mgaido91 · 2026-02-27T16:37:30Z

simulstream/metrics/scorers/latency/mwersegmenter.py

        return True

    @abstractmethod
    def _do_score(self, samples: List[ResegmentedLatencyScoringSample]) -> LatencyScores:


let's move also this to the parent class

mgaido91 · 2026-02-27T16:39:00Z

simulstream/metrics/scorers/latency/segmenter_based_scorer.py

+            index += segment_len
+        assert len(delays) == index, \
+            f"Index {index} should have reached end of delays ({len(delays)})"
+        return segmented_delays


Suggested change

-        return segmented_delays
+        return segmented_delays
+
+    def _resegment_samples(self, samples: List[LatencyScoringSample]) -> List[ResegmentedLatencyScoringSample]:
+        ...
+
+    def score(self, samples: List[LatencyScoringSample]) -> LatencyScores:
+        resegmented_samples = self._resegment_samples(samples)
+        return self._do_score(resegmented_samples)

and we can add a comment to the main class that subclasses should implement _resegment_samples, like it is done for _do_score. In this way we can isolate the resegmentation part in the subclasses. Thanks.
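
The structure suggested above is a template method: the parent owns `score`, and subclasses supply the two hooks. Below is a self-contained sketch of that shape. The class names and the simplified `str`/`float` types are placeholders chosen for illustration, not the actual simulstream classes or signatures.

```python
from abc import ABC, abstractmethod
from typing import List


class SegmenterBasedScorerSketch(ABC):
    """Hypothetical base class; names mirror the review suggestion only."""

    def score(self, samples: List[str]) -> float:
        # Shared driver: resegment first, then delegate the metric itself.
        return self._do_score(self._resegment_samples(samples))

    @abstractmethod
    def _resegment_samples(self, samples: List[str]) -> List[List[str]]:
        """Subclasses implement the resegmentation strategy."""

    @abstractmethod
    def _do_score(self, samples: List[List[str]]) -> float:
        """Subclasses implement the latency computation."""


class ToyScorer(SegmenterBasedScorerSketch):
    def _resegment_samples(self, samples: List[str]) -> List[List[str]]:
        # Toy resegmentation: whitespace tokenisation.
        return [s.split() for s in samples]

    def _do_score(self, samples: List[List[str]]) -> float:
        # Toy "score": average number of tokens per sample.
        return sum(len(s) for s in samples) / len(samples)


result = ToyScorer().score(["a b c", "d"])
```

Because both hooks are abstract, a subclass that forgets either one cannot be instantiated, which makes the contract explicit in addition to the docstring comment.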

mgaido91 · 2026-02-27T16:51:37Z

docs/source/modules.rst

   simulstream.metrics.scorers.quality.mwersegmenter
   simulstream.metrics.scorers.latency
   simulstream.metrics.scorers.latency.mwersegmenter
+   simulstream.metrics.scorers.latency.softsegmenter


Suggested change

-   simulstream.metrics.scorers.latency.softsegmenter
+   simulstream.metrics.scorers.latency.softsegmenter
+   simulstream.metrics.scorers.latency.segmenter_based_scorer

pe-trik · 2026-03-01T23:30:30Z

I am closing this PR. I have filed a new PR #24 that implements a unified quality and latency evaluation via OmniSTEval as proposed in https://arxiv.org/abs/2509.17349.

SoftSegmenter and LongYAAL

fa457ea

sarapapi requested changes Feb 17, 2026

pe-trik force-pushed the longyaal branch from fa457ea to ed9c978 Compare February 17, 2026 17:10

pe-trik requested a review from sarapapi February 17, 2026 17:11

SoftSegmenter and LongYAAL

2247dbe

pe-trik force-pushed the longyaal branch from ed9c978 to 2247dbe Compare February 18, 2026 10:31

sarapapi requested changes Feb 19, 2026

add mosestokenizer dependency and reorder logger definition

3119fa5

pe-trik force-pushed the longyaal branch from 7892931 to 3119fa5 Compare February 20, 2026 09:24

pe-trik requested a review from sarapapi February 23, 2026 10:52

mgaido91 reviewed Feb 23, 2026

pe-trik added 2 commits February 26, 2026 16:06

Merge branch 'main', remote-tracking branch 'origin' into longyaal

d2fe10d

pr

096f776

pe-trik requested a review from mgaido91 February 26, 2026 16:24

formatting

23dbc8e

mgaido91 reviewed Feb 27, 2026

pe-trik closed this Mar 1, 2026

mgaido91 mentioned this pull request Mar 2, 2026

Implement OmniSTEval evaluation #24

Open
