
Add ACHAP task / evaluation metrics#37

Open
retkowski wants to merge 8 commits into hlt-mt:main from
retkowski:achap

Conversation

@retkowski

This PR adds the ACHAP task and its metrics to MCIF. Selected metrics include collar F1 for segmentation quality and BERTScore for title quality. Optionally, we can also get the WER of the transcript. It handles forced alignment automatically using torchaudio. Important: Markdown format is assumed for the structured transcript (i.e., "\n# Title\n" as separator between chapters).
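The chapter format and the collar-based segmentation metric described above can be sketched as follows. This is a minimal illustration under stated assumptions, not MCIF's or chunkseg's actual implementation: `parse_chapters` and `collar_f1` are hypothetical names, and the 5-second collar is an assumed default.

```python
import re

def parse_chapters(transcript):
    """Split a Markdown-structured transcript into (title, body) pairs,
    treating "# Title" lines (i.e. the "\n# Title\n" separator) as chapter boundaries."""
    parts = re.split(r"^# (.+)$", transcript, flags=re.MULTILINE)
    # re.split with one capturing group yields [preamble, title1, body1, title2, body2, ...]
    return [(parts[i].strip(), parts[i + 1].strip())
            for i in range(1, len(parts) - 1, 2)]

def collar_f1(ref_boundaries, hyp_boundaries, collar=5.0):
    """Boundary-level F1: a hypothesis boundary counts as a true positive when
    it lies within +/- `collar` seconds of a not-yet-matched reference boundary."""
    if not ref_boundaries or not hyp_boundaries:
        return 0.0
    unmatched = list(ref_boundaries)
    true_positives = 0
    for b in sorted(hyp_boundaries):
        hit = next((r for r in unmatched if abs(r - b) <= collar), None)
        if hit is not None:
            unmatched.remove(hit)  # each reference boundary may be matched once
            true_positives += 1
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(hyp_boundaries)
    recall = true_positives / len(ref_boundaries)
    return 2 * precision * recall / (precision + recall)
```

For example, with reference boundaries at 10 s and 50 s, a hypothesis at 12 s and 49 s scores an F1 of 1.0 under the 5-second collar, while a single hypothesis at 12 s has perfect precision but 0.5 recall.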

Collaborator

@sarapapi sarapapi left a comment


Looks good, just minor changes. Additionally, have you manually tested it?

@retkowski
Author

Updated the docstring and yes, I also tested the code using a small sample XML.

@retkowski
Author

@sarapapi Passes the checks now.

@sarapapi
Collaborator

> @sarapapi Passes the checks now.

Thanks, but there are still pending comments.

@sarapapi
Collaborator

@retkowski could you please address the remaining requested changes by @mgaido91?

@retkowski
Author

@sarapapi I already added a new commit that incorporates the requested changes. However, I replied to two of his comments because I think further confirmation/clarification would be good.

@mgaido91
Contributor

@retkowski I do not see your comments, could you please check?

@@ -1,3 +1,4 @@

Contributor


please remove this line

Comment on lines +148 to +155
audio_paths = {}
if hypo_path is not None:
    hypo_xml = ET.parse(hypo_path)
    for task in hypo_xml.getroot().iter("task"):
        if task.attrib['track'] == track and task.attrib['text_lang'] == language:
            for s in task.iter("sample"):
                audio_paths[s.attrib['id']] = s.find('audio_path').text
            break
Contributor


why do we need this? the audio_path is already in the reference. We do not need all this part.

Comment on lines +173 to +175
for sid in sample_ids:
    if sid in audio_paths:
        sample_metadata['audio_path'] = audio_paths[sid]
Contributor


No need for this. We can just edit line 152 to be:

sample_metadata = {'audio_path': next(sample.iter('reference')).text}

or you can put it here if you prefer, but it is just that; there is no need to look at the hypo.

Contributor

@mgaido91 mgaido91 left a comment


LGTM, thanks!

Contributor

@mgaido91 mgaido91 left a comment


@retkowski I tried this but I am having a hard time. I suggested a fix for the first problem, but then a second problem arises:

Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mcif/bin/mcif_eval", line 7, in <module>
    sys.exit(cli_script())
  File "/Users/mgaido/github/mcif/src/mcif/evaluation.py", line 446, in cli_script
    scores = main(
  File "/Users/mgaido/github/mcif/src/mcif/evaluation.py", line 410, in main
    scores.update(score_achap(base_ref_path, hypo, ref, lang))
  File "/Users/mgaido/github/mcif/src/mcif/evaluation.py", line 350, in score_achap
    results = evaluate_batch(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mcif/lib/python3.10/site-packages/chunkseg/evaluate.py", line 210, in evaluate_batch
    result = evaluate(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mcif/lib/python3.10/site-packages/chunkseg/evaluate.py", line 86, in evaluate
    hyp_timestamps = _timestamps_from_transcript(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mcif/lib/python3.10/site-packages/chunkseg/evaluate.py", line 311, in _timestamps_from_transcript
    result = parse_transcript(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mcif/lib/python3.10/site-packages/chunkseg/parsers.py", line 406, in parse_transcript
    titles, sections = parse_markdown(text)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mcif/lib/python3.10/site-packages/chunkseg/parsers.py", line 228, in parse_markdown
    sections.append(_tokenize_section(section_text))
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mcif/lib/python3.10/site-packages/chunkseg/parsers.py", line 121, in _tokenize_section
    return sent_tokenize(text)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mcif/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 119, in sent_tokenize
    tokenizer = _get_punkt_tokenizer(language)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mcif/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 105, in _get_punkt_tokenizer
    return PunktTokenizer(language)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mcif/lib/python3.10/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
    self.load_lang(lang)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mcif/lib/python3.10/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
    lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mcif/lib/python3.10/site-packages/nltk/data.py", line 696, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource 'punkt_tab' not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt_tab')



def score_achap(
    hypo_dict: Dict[str, str],
Contributor


Suggested change
- hypo_dict: Dict[str, str],
+ base_ref_path: Path,
+ hypo_dict: Dict[str, str],

ref_chapters = json.loads(ref_sample.reference) # [[title, start_sec], ...]
ref_titles = [(t, float(s)) for t, s in ref_chapters]
ref_boundaries = [float(s) for _, s in ref_chapters]
audio_path = ref_sample.metadata["audio_path"]
Contributor


Suggested change
- audio_path = ref_sample.metadata["audio_path"]
+ audio_path = base_ref_path / "LONG_AUDIOS" / ref_sample.metadata["audio_path"]

ref_titles = [(t, float(s)) for t, s in ref_chapters]
ref_boundaries = [float(s) for _, s in ref_chapters]
audio_path = ref_sample.metadata["audio_path"]
duration = _audio_duration(audio_path)
Contributor


Suggested change
- duration = _audio_duration(audio_path)
+ duration = _audio_duration(audio_path.absolute().as_posix())

Main function computing all the scores and returning a Dictionary with the scores
"""
hypo = read_hypo(hypo_path, track, lang)
ref = read_reference(ref_path, track, lang, modality=filter_modality)
Contributor


Suggested change
- ref = read_reference(ref_path, track, lang, modality=filter_modality)
+ base_ref_path = Path(ref_path).parent

assert "TRANS" in ref.keys()
scores["TRANS-COMET"] = score_st(hypo, ref, lang)
if "ACHAP" in ref.keys():
scores.update(score_achap(hypo, ref, lang))
Contributor


Suggested change
- scores.update(score_achap(hypo, ref, lang))
+ scores.update(score_achap(base_ref_path, hypo, ref, lang))

@retkowski
Author

> @retkowski I tried this but I am having a hard time. I suggested a fix for the first problem, but then a second problem arises:
>
> import nltk
> nltk.download('punkt_tab')

I fixed the issue within chunkseg by automatically downloading the punkt_tab sentence tokenizer if not available. Updated the dependency in the requirements.
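The guard described above can be approximated with a small helper. This is only a sketch of the approach; `ensure_punkt_tab` is a hypothetical name, not the actual chunkseg code.

```python
def ensure_punkt_tab():
    """Download NLTK's punkt_tab tokenizer data on first use, so that
    sent_tokenize() does not raise LookupError in a fresh environment."""
    import nltk  # imported lazily so the guard costs nothing when unused
    try:
        # Same resource path that the traceback above shows nltk looking up.
        nltk.data.find("tokenizers/punkt_tab/english/")
    except LookupError:
        nltk.download("punkt_tab", quiet=True)
```

Calling `ensure_punkt_tab()` before the first `sent_tokenize()` makes the evaluation self-contained instead of requiring a manual `nltk.download` step.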

Not entirely sure about the path issue. In my test, I provided an absolute path directly.

@mgaido91
Contributor

> I fixed the issue within chunkseg by automatically downloading the punkt_tab sentence tokenizer if not available. Updated the dependency in the requirements.

Cool, thanks!

> Not entirely sure about the path issue. In my test, I provided an absolute path directly.

I provided suggestions on how to fix it; if you just apply them, it will work. We cannot put absolute paths in the XML: the evaluation has to run everywhere, on any cluster or laptop.
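Taken together, the suggestions resolve audio paths relative to the reference XML rather than storing absolute paths. A rough sketch, assuming the `LONG_AUDIOS` layout from the suggestions (`resolve_audio_path` is a hypothetical helper, not code from the PR):

```python
from pathlib import Path

def resolve_audio_path(ref_path, rel_audio_path):
    """Resolve an audio path stored relative to the reference XML file,
    mirroring the reviewer's base_ref_path / LONG_AUDIOS suggestion."""
    base_ref_path = Path(ref_path).parent
    audio_path = base_ref_path / "LONG_AUDIOS" / rel_audio_path
    # as_posix() gives a plain string, as libraries expecting str paths need
    return audio_path.absolute().as_posix()
```

Because the path is derived from wherever the reference XML lives, the same XML works unchanged on any machine.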
