## Problem

PR #34 (refactor: generalize dataset indexing from language-based to dataset_id-based) introduces API-breaking changes to the task interface and results structure, but the documentation in `README.md` and `CONTRIBUTING.md` still references the old API. Once #34 is merged, new contributors following the docs will write code against a stale interface (`load_monolingual_data`, `lang_datasets`, `language_results`, etc.) that no longer exists.

Related: #33, #34
## Proposal

Area(s) of code: `README.md`, `CONTRIBUTING.md`, `examples/custom_task_example.py`

Update all documentation and examples to reflect the new `dataset_id`-based API introduced in #34. Specifically:
### README.md

- Checkpointing section (line ~115): Change "saves result checkpoints after each task completion in a specific language" to reflect that checkpointing is now per-dataset (`dataset_id`), not per-language.
- Metrics & Aggregation section (lines ~174–181):
  - Step 1 currently says "Macro-average languages per task"; update it to reflect the new dataset-based aggregation.
  - Document the new `aggregation_mode` parameter and the three supported modes:
    - `monolingual_only` (default)
    - `crosslingual_group_input_languages`
    - `crosslingual_group_output_languages`
  - Note that `mean_per_language` behavior now depends on the chosen aggregation mode.
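Since the three modes differ only in how per-dataset scores are grouped before averaging, the intended semantics can be illustrated with a minimal sketch. This is not the library's implementation; the function name, argument shapes, and the idea that each `dataset_id` carries an `(input_language, output_language)` pair are assumptions drawn from this issue.

```python
# Illustrative sketch of the three aggregation_mode values, assuming
# per-dataset scores and a per-dataset (input_language, output_language) pair.
from collections import defaultdict
from statistics import mean


def aggregate(datasetid_results, dataset_languages,
              aggregation_mode="monolingual_only"):
    """Group per-dataset scores into per-language means.

    datasetid_results:  {dataset_id: score}
    dataset_languages:  {dataset_id: (input_language, output_language)}
    """
    groups = defaultdict(list)
    for dataset_id, score in datasetid_results.items():
        inp, out = dataset_languages[dataset_id]
        if aggregation_mode == "monolingual_only":
            if inp == out:  # skip cross-lingual datasets entirely
                groups[inp].append(score)
        elif aggregation_mode == "crosslingual_group_input_languages":
            groups[inp].append(score)  # key results by input language
        elif aggregation_mode == "crosslingual_group_output_languages":
            groups[out].append(score)  # key results by output language
        else:
            raise ValueError(f"unknown aggregation_mode: {aggregation_mode}")
    # mean_per_language therefore depends on the chosen mode
    return {lang: mean(scores) for lang, scores in groups.items()}
```

A cross-lingual "en-de" dataset would be dropped under `monolingual_only`, counted toward "en" under input-language grouping, and toward "de" under output-language grouping.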
- Results structure (line ~164): The `checkpoint.json` description should mention `datasetid_results` instead of implying language-keyed results.
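A hedged sketch of what the updated `checkpoint.json` shape could look like; only the `datasetid_results` key comes from this issue, while the surrounding field names and the metric shown are placeholders:

```json
{
  "task_name": "example_task",
  "datasetid_results": {
    "en": {"score": 0.71},
    "en-de": {"score": 0.64}
  }
}
```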
### CONTRIBUTING.md

- "Adding a New Task", Step 2 code example (lines ~138–206):
  - Rename `load_monolingual_data(self, split, language)` → `load_dataset(self, dataset_id, split)`.
  - Update the `RankingDataset` construction accordingly.
  - Add guidance on the new optional override methods: `languages_to_dataset_ids()` and `get_dataset_language()` (with the `input_language`/`output_language` distinction).
  - Briefly explain when a task author would need to override these (multi-dataset-per-language, cross-lingual, or multilingual tasks).
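The renamed hook and the optional overrides might be sketched as follows. `RankingDataset` is stubbed, the loading logic is a placeholder, and the "en-de" naming convention for cross-lingual `dataset_id`s is an assumption; only the method names and signatures come from this issue.

```python
# Sketch of a task under the new dataset_id-based interface (not the
# library's actual base class or loading logic).
from dataclasses import dataclass


@dataclass
class RankingDataset:
    """Stand-in for the library's RankingDataset; real fields will differ."""
    rows: list


class MyCustomTask:
    def load_dataset(self, dataset_id: str, split: str) -> RankingDataset:
        # Replaces load_monolingual_data(self, split, language); a purely
        # monolingual task can use the language code as its dataset_id.
        rows = [{"dataset_id": dataset_id, "split": split}]  # placeholder rows
        return RankingDataset(rows=rows)

    # Optional overrides, only needed for multi-dataset-per-language,
    # cross-lingual, or multilingual tasks:
    def languages_to_dataset_ids(self) -> dict[str, list[str]]:
        # e.g. a cross-lingual "en-de" dataset alongside monolingual "en"
        return {"en": ["en", "en-de"], "de": ["de"]}

    def get_dataset_language(self, dataset_id: str) -> tuple[str, str]:
        # Returns (input_language, output_language); they differ for
        # cross-lingual dataset_ids such as "en-de".
        if "-" in dataset_id:
            input_language, output_language = dataset_id.split("-", 1)
            return input_language, output_language
        return dataset_id, dataset_id
```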
- "Adding a New Task", Step 4 test example (line ~234): Change `task.lang_datasets[Language.EN]` → `task.datasets["en"]` (or `task.datasets[Language.EN.value]` with a named variable for clarity, consistent with the review feedback on #34).
"Adding a New Task" — general: Add a note or subsection explaining the difference between monolingual, cross-lingual, and multilingual dataset scenarios and how the new dataset_id system handles them.
### examples/custom_task_example.py

- `load_monolingual_data` method (line 81): Rename to `load_dataset` with the new `(self, dataset_id, split)` signature. This file is referenced by both README and CONTRIBUTING as the canonical example.
## Additional Context

Summary of the API renames introduced by #34:

- `lang_datasets: dict[Language, Dataset]` → `datasets: dict[str, Dataset]`
- `load_monolingual_data(language, split)` → `load_dataset(dataset_id, split)`
- `language_results` → `datasetid_results`
- New `aggregation_mode` enum: `monolingual_only`, `crosslingual_group_input_languages`, `crosslingual_group_output_languages`
- New `get_dataset_language(dataset_id)`, which returns `(input_language, output_language)`

## Implementation
- [x] I plan to implement this in a PR
- [ ] I am proposing the idea and would like someone else to pick it up