## Problem

PR #34 (refactor: generalize dataset indexing from language-based to dataset_id-based) introduces API-breaking changes to the task interface and results structure, but the documentation in `README.md` and `CONTRIBUTING.md` still references the old API. Once #34 is merged, new contributors following the docs will write code against a stale interface (`load_monolingual_data`, `lang_datasets`, `language_results`, etc.) that no longer exists.

Related: #33, #34
## Proposal

Area(s) of code: `README.md`, `CONTRIBUTING.md`, `examples/custom_task_example.py`

Update all documentation and examples to reflect the new `dataset_id`-based API introduced in #34. Specifically:
### README.md

- Checkpointing section (line ~115): Change "saves result checkpoints after each task completion in a specific language" to reflect that checkpointing is now per-dataset (`dataset_id`), not per-language.
- Metrics & Aggregation section (lines ~174–181):
  - Step 1 currently says "Macro-average languages per task"; update it to reflect the new dataset-based aggregation.
  - Document the new `aggregation_mode` parameter and the three supported modes:
    - `monolingual_only` (default)
    - `crosslingual_group_input_languages`
    - `crosslingual_group_output_languages`
  - Note that `mean_per_language` behavior now depends on the chosen aggregation mode.
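Since the three modes differ only in how per-dataset scores are grouped before averaging, the intended semantics can be illustrated with a minimal sketch. This is not the library's implementation; the function name, argument shapes, and the idea that each `dataset_id` carries an `(input_language, output_language)` pair are assumptions drawn from this issue.

```python
# Illustrative sketch of the three aggregation_mode values, assuming
# per-dataset scores and a per-dataset (input_language, output_language) pair.
from collections import defaultdict
from statistics import mean


def aggregate(datasetid_results, dataset_languages,
              aggregation_mode="monolingual_only"):
    """Group per-dataset scores into per-language means.

    datasetid_results:  {dataset_id: score}
    dataset_languages:  {dataset_id: (input_language, output_language)}
    """
    groups = defaultdict(list)
    for dataset_id, score in datasetid_results.items():
        inp, out = dataset_languages[dataset_id]
        if aggregation_mode == "monolingual_only":
            if inp == out:  # skip cross-lingual datasets entirely
                groups[inp].append(score)
        elif aggregation_mode == "crosslingual_group_input_languages":
            groups[inp].append(score)  # key results by input language
        elif aggregation_mode == "crosslingual_group_output_languages":
            groups[out].append(score)  # key results by output language
        else:
            raise ValueError(f"unknown aggregation_mode: {aggregation_mode}")
    # mean_per_language therefore depends on the chosen mode
    return {lang: mean(scores) for lang, scores in groups.items()}
```

A cross-lingual "en-de" dataset would be dropped under `monolingual_only`, counted toward "en" under input-language grouping, and toward "de" under output-language grouping.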
- Results structure (line ~164): The `checkpoint.json` description should mention `datasetid_results` instead of implying language-keyed results.
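A hedged sketch of what the updated `checkpoint.json` shape could look like; only the `datasetid_results` key comes from this issue, while the surrounding field names and the metric shown are placeholders:

```json
{
  "task_name": "example_task",
  "datasetid_results": {
    "en": {"score": 0.71},
    "en-de": {"score": 0.64}
  }
}
```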
### CONTRIBUTING.md

- "Adding a New Task", Step 2 code example (lines ~138–206):
  - Rename `load_monolingual_data(self, split, language)` → `load_dataset(self, dataset_id, split)`.
  - Update the `RankingDataset` construction accordingly.
  - Add guidance on the new optional override methods: `languages_to_dataset_ids()` and `get_dataset_language()` (with the `input_language`/`output_language` distinction).
  - Briefly explain when a task author would need to override these (multi-dataset-per-language, cross-lingual, or multilingual tasks).
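The renamed hook and the optional overrides might be sketched as follows. `RankingDataset` is stubbed, the loading logic is a placeholder, and the "en-de" naming convention for cross-lingual `dataset_id`s is an assumption; only the method names and signatures come from this issue.

```python
# Sketch of a task under the new dataset_id-based interface (not the
# library's actual base class or loading logic).
from dataclasses import dataclass


@dataclass
class RankingDataset:
    """Stand-in for the library's RankingDataset; real fields will differ."""
    rows: list


class MyCustomTask:
    def load_dataset(self, dataset_id: str, split: str) -> RankingDataset:
        # Replaces load_monolingual_data(self, split, language); a purely
        # monolingual task can use the language code as its dataset_id.
        rows = [{"dataset_id": dataset_id, "split": split}]  # placeholder rows
        return RankingDataset(rows=rows)

    # Optional overrides, only needed for multi-dataset-per-language,
    # cross-lingual, or multilingual tasks:
    def languages_to_dataset_ids(self) -> dict[str, list[str]]:
        # e.g. a cross-lingual "en-de" dataset alongside monolingual "en"
        return {"en": ["en", "en-de"], "de": ["de"]}

    def get_dataset_language(self, dataset_id: str) -> tuple[str, str]:
        # Returns (input_language, output_language); they differ for
        # cross-lingual dataset_ids such as "en-de".
        if "-" in dataset_id:
            input_language, output_language = dataset_id.split("-", 1)
            return input_language, output_language
        return dataset_id, dataset_id
```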
- "Adding a New Task", Step 4 test example (line ~234): Change `task.lang_datasets[Language.EN]` → `task.datasets["en"]` (or `task.datasets[Language.EN.value]` with a named variable for clarity, consistent with the review feedback on #34).
"Adding a New Task" — general: Add a note or subsection explaining the difference between monolingual, cross-lingual, and multilingual dataset scenarios and how the new dataset_id system handles them.
### examples/custom_task_example.py

- `load_monolingual_data` method (line 81): Rename to `load_dataset` with the new `(self, dataset_id, split)` signature. This file is referenced by both README and CONTRIBUTING as the canonical example.
## Additional Context

Summary of the API renames introduced by #34:

- `lang_datasets: dict[Language, Dataset]` → `datasets: dict[str, Dataset]`
- `load_monolingual_data(language, split)` → `load_dataset(dataset_id, split)`
- `language_results` → `datasetid_results`
- New `aggregation_mode` enum: `monolingual_only`, `crosslingual_group_input_languages`, `crosslingual_group_output_languages`
- New `get_dataset_language(dataset_id)`, which returns `(input_language, output_language)`

## Implementation
- [x] I plan to implement this in a PR
- [ ] I am proposing the idea and would like someone else to pick it up