
feat: add new ranking tasks for melo#37

Merged
Mattdl merged 34 commits into techwolf-ai:main from
federetyk:feat/generalized-index-melo-ranking-tasks
Feb 26, 2026
Conversation

@federetyk
Contributor

Addresses #30

Description

This PR introduces the MELO Benchmark (Multilingual Entity Linking of Occupations) as a new ranking task for job title normalization into ESCO. MELO provides 42 evaluation datasets spanning 21 languages, built from crosswalks between national occupation taxonomies and ESCO published by official labor-related organizations across EU member states.

Additionally, we include MELS (Multilingual Entity Linking of Skills), a sibling benchmark following the same methodology but targeting skill normalization into ESCO Skills rather than occupations. MELS currently covers 5 languages with 8 datasets, providing complementary evaluation coverage for the skill normalization task group.

This PR is built on top of #34, which introduces a refactor with the generalized dataset indexing infrastructure required for this implementation. As such, this PR is contingent on #34 being merged. If the maintainers prefer a different approach for the refactor, I would be happy to adapt the implementation accordingly.

Changes:

  • Add MELORanking task class with 42 datasets across 21 languages for job normalization
  • Add MELSRanking task class with 8 datasets across 5 languages for skill normalization
  • Extend RankingDataset constructor to support allow_duplicate_targets parameter (required by MELO)
  • Add unit tests for dataset ID filtering logic with various language combinations
  • Add defensive check in e2e test to skip tasks with no datasets for the requested language set
  • Update README with new task entries
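As a rough illustration of the allow_duplicate_targets extension listed above, here is a minimal sketch. The real RankingDataset constructor in workrb takes more arguments; the validation behavior shown here is an assumption, only meant to convey why MELO needs the flag (several national taxonomy entries can map to the same ESCO target).

```python
from dataclasses import dataclass


@dataclass
class RankingDataset:
    """Hypothetical sketch of the extended constructor; not the real workrb class."""

    queries: list[str]
    targets: list[str]
    allow_duplicate_targets: bool = False  # new parameter, required by MELO

    def __post_init__(self) -> None:
        # Assumed behavior: reject duplicate targets unless explicitly allowed.
        if not self.allow_duplicate_targets:
            duplicates = {t for t in self.targets if self.targets.count(t) > 1}
            if duplicates:
                raise ValueError(f"Duplicate targets found: {sorted(duplicates)}")
```

With the flag set, a MELO-style dataset where two query titles normalize to the same ESCO occupation can be constructed without error.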

Checklist

  • Added new tests for new functionality
  • Tested locally with example tasks
  • Code follows project style guidelines
  • Documentation updated
  • No new warnings introduced

Introduce DatasetLanguages type and LanguageAggregationMode enum with
three modes: monolingual_only, crosslingual_group_input_languages,
and crosslingual_group_output_languages.

Rename get_dataset_language() to get_dataset_languages(), now returning
input and output language sets. Replace MetricsResult.language with
input_languages/output_languages. Datasets incompatible with the chosen
aggregation mode are skipped rather than causing a failure.

BREAKING CHANGE: MetricsResult.language replaced by input_languages/output_languages
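A minimal sketch of what the new type and enum could look like, based on the names in the commit message above. The field and property names beyond those mentioned (e.g. is_monolingual) are assumptions, not the actual workrb API.

```python
from dataclasses import dataclass
from enum import Enum


class LanguageAggregationMode(Enum):
    """The three aggregation modes named in the commit message."""

    MONOLINGUAL_ONLY = "monolingual_only"
    CROSSLINGUAL_GROUP_INPUT_LANGUAGES = "crosslingual_group_input_languages"
    CROSSLINGUAL_GROUP_OUTPUT_LANGUAGES = "crosslingual_group_output_languages"


@dataclass(frozen=True)
class DatasetLanguages:
    """Input and output language sets for one dataset (sketch)."""

    input_languages: frozenset[str]
    output_languages: frozenset[str]

    @property
    def is_monolingual(self) -> bool:
        # Assumed convenience check: one language, same on both sides.
        return (
            self.input_languages == self.output_languages
            and len(self.input_languages) == 1
        )
```

Under MONOLINGUAL_ONLY, a dataset with input {"fr"} and output {"en"} would be skipped rather than failing, matching the behavior described above.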
Skip datasets incompatible with the chosen LanguageAggregationMode
before evaluation to avoid unnecessary compute. Extract
get_language_grouping_key() as a shared function in types.py so
the same compatibility logic is reused by both the eval-time filter
(_filter_pending_work) and the aggregation-time skip in results.py.

- Add get_language_grouping_key() to types.py, refactor
  BenchmarkResults._get_language_grouping_key() to delegate to it
- Add ExecutionMode enum (LAZY / ALL)
- Add language_aggregation_mode and execution_mode params to evaluate()
- Add language_aggregation_mode param to get_summary_metrics()
- Export ExecutionMode and LanguageAggregationMode from workrb
- Add tests for shared function and MELO-like filtering scenarios
Conflicts:
- README.md
- src/workrb/tasks/ranking/__init__.py
The base class interface evolved from get_dataset_language() (singular,
returning Language | None) to get_dataset_languages() (plural, returning
DatasetLanguages with input/output language sets). Update MELORanking
and MELSRanking to implement the current interface, reusing the existing
_parse_dataset_id() logic. This enables lazy execution filtering and
per-language aggregation for these tasks.

- Rename LANGUAGE_TO_DATASETS to MELO_DATASET_IDS / MELS_DATASET_IDS
- Replace get_dataset_language() with get_dataset_languages()
- Add tests covering get_dataset_languages() for all MELO and MELS
  dataset IDs, including monolingual, cross-lingual, and multilingual
  corpus cases
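A self-contained sketch of the dataset-ID parsing these tasks rely on, based on IDs such as aut_q_de_c_de (country, query language, corpus language). The helper names echo _parse_dataset_id() and get_dataset_languages() from the PR, but the signatures here are assumptions.

```python
def parse_dataset_id(dataset_id: str) -> tuple[str, str, str]:
    """Split an ID like 'aut_q_de_c_de' into (country, query_lang, corpus_lang)."""
    country, rest = dataset_id.split("_q_", 1)
    query_lang, corpus_lang = rest.split("_c_", 1)
    return country, query_lang, corpus_lang


def get_dataset_languages(dataset_id: str) -> tuple[set[str], set[str]]:
    """Return (input_languages, output_languages) for a single dataset (sketch)."""
    _, query_lang, corpus_lang = parse_dataset_id(dataset_id)
    return {query_lang}, {corpus_lang}
```

For a cross-lingual dataset like bel_q_nl_c_en, this yields input {"nl"} and output {"en"}, which is exactly what the aggregation modes above need to decide whether to group or skip it.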
@federetyk
Contributor Author

This branch is now up-to-date with the latest changes in #34.

…evaluate()

Address PR techwolf-ai#34 review feedback from @Mattdl. The aggregation mode now flows
consistently through the entire evaluation and aggregation pipeline
instead of being an optional parameter.
Override get_dataset_languages() and languages_to_dataset_ids() in the
freelancer candidate ranking tasks, replacing the Language.CROSS sentinel
with a proper DATASET_LANGUAGES_MAP that describes each dataset's input
and output languages.

This lets the aggregation and filtering logic handle the multilingual
dataset ("cross_lingual" renamed to "multilingual") as a regular entry
rather than a special-cased language enum value.
Per-task aggregation now groups datasets by language before averaging,
giving equal weight to each language regardless of how many datasets
it contains. Previously, all compatible datasets were flat-averaged,
which over-represented languages with more datasets.

Add SKIP_LANGUAGE_AGGREGATION mode for users who want the old flat
average with no filtering and no per-language output.
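The language-weighted averaging described above can be sketched as follows. The function name and signature are assumptions (the real aggregation operates on MetricsResult objects), but the two-level mean captures the fix: each language contributes equally, however many datasets it has.

```python
from collections import defaultdict
from statistics import mean


def aggregate(
    scores: dict[str, float],
    dataset_language: dict[str, str],
    language_weighted: bool = True,
) -> float:
    """Average per-dataset scores; optionally weight each language equally."""
    if not language_weighted:
        # Old behavior: flat average over all datasets, which over-represents
        # languages with many datasets.
        return mean(scores.values())
    # New behavior: average within each language first, then across languages.
    by_lang: defaultdict[str, list[float]] = defaultdict(list)
    for ds_id, score in scores.items():
        by_lang[dataset_language[ds_id]].append(score)
    return mean(mean(lang_scores) for lang_scores in by_lang.values())
```

With two German datasets scoring 1.0 and 0.0 and one French dataset scoring 0.0, the flat average is 1/3, while the language-weighted average is 0.25 (German contributes 0.5, French 0.0).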
Add four example scripts illustrating the two aggregation modes
(language-weighted vs flat average) combined with task-level
language filtering (selected subset vs all available languages).
Add 6 new MELO dataset IDs for Austria (aut) and Belgium (bel):
- aut_q_de_c_de, aut_q_de_c_en
- bel_q_fr_c_fr, bel_q_fr_c_en
- bel_q_nl_c_nl, bel_q_nl_c_en

Update test expectations in TestMELORankingDatasetIds accordingly.
Collaborator

@Mattdl left a comment


Left some minor comments (might be outdated already, in which case you can ignore), and just needs to resolve conflicts from latest merge with #34

Then looks ready to merge! (: 🚀

… naming conventions

Addresses PR review feedback requesting richer documentation for users
without domain knowledge. Both docstrings now include scope and stats,
corpus construction details (surface forms from ESCO concepts), dataset
variant descriptions (monolingual vs cross-lingual), the dataset ID
naming convention, and concrete examples using real data from the
benchmarks.
@federetyk
Contributor Author

@Mattdl Thanks for the review! I have expanded the class docstrings for both tasks in the latest commit. The merge conflicts with #34 have already been resolved as well. Let me know if there is anything else needed.

@federetyk requested a review from Mattdl on February 26, 2026 at 14:52
Collaborator

@Mattdl left a comment


LGTM! Thanks again for the really impactful contributions 🚀

@Mattdl merged commit 5f27df6 into techwolf-ai:main on Feb 26, 2026
2 checks passed
@federetyk deleted the feat/generalized-index-melo-ranking-tasks branch on February 26, 2026 at 17:45