UniDive/corpus-annotation-tools

Survey on manual corpus annotation tools

This is a working repository for elaborating on the results of the survey on manual corpus annotation tools (WG1 Task 1.4).

🛠️ How to Contribute a Tool

We welcome contributions of new tools or updates to existing ones! All tool metadata lives in YAML files under the tools/ directory. The JSON (data/latest_export.json) is generated automatically from those YAMLs through the script src/build_json.py.

1. Add a new tool

  1. Copy the YAML template
    cp tools/_template.yml tools/<tool-slug>.yml
    
    • Use a short, lowercase, hyphenated slug for the filename (e.g., tools/arboratorgrew.yml).
  2. Fill in the fields:
    • Text fields → fill in with a string or leave as "" if unknown.
    • Lists → use YAML lists (- item1, - item2) or [] if none.
    • Yes/No/partial → write exactly "Yes", "No", or "partial".
    • Links → full https://…. For extra resources, use Label: "URL".
  3. Save and commit only this new YAML file.

Tips:

  • Keep descriptions short and clear (one line in Short-description).
  • If unsure, leave the field blank ("") — don’t delete the key.
  • You can check an example file for inspiration; any file in the tools/ folder works.
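
The conventions above can be checked mechanically before committing. The following is a minimal sketch, not part of this repository: the function names are made up for illustration, and apart from Short-description the field names (Open-source, Homepage) are hypothetical placeholders that should be replaced with the keys actually present in tools/_template.yml. It assumes the YAML file has already been loaded into a Python dict, and that slugs may contain digits.

```python
import re

# Allowed values for Yes/No/partial fields, as stated in the guidelines.
TERNARY_VALUES = {"Yes", "No", "partial"}

# Short, lowercase, hyphenated slug followed by .yml (e.g. "arboratorgrew.yml").
# Allowing digits is an assumption of this sketch.
SLUG_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*\.yml$")


def check_filename(filename):
    """Return True if the bare filename looks like a valid tool slug."""
    return bool(SLUG_PATTERN.match(filename))


def check_entry(entry, template_keys, ternary_keys=(), link_keys=()):
    """Return a list of human-readable problems found in one tool entry.

    entry: dict loaded from the tool's YAML file.
    template_keys: keys that must be present (unknown values stay as "").
    ternary_keys / link_keys: keys holding Yes/No/partial values or URLs
    (hypothetical here; adapt to the real template).
    """
    problems = []
    for key in template_keys:
        if key not in entry:
            problems.append(f"missing key: {key}")
    for key in ternary_keys:
        value = entry.get(key, "")
        # An unquoted Yes/No may be loaded as a boolean by some YAML parsers,
        # in which case it is reported here as well.
        if value != "" and value not in TERNARY_VALUES:
            problems.append(f"{key}: expected Yes/No/partial, got {value!r}")
    for key in link_keys:
        value = entry.get(key, "")
        is_https = isinstance(value, str) and value.startswith("https://")
        if value != "" and not is_https:
            problems.append(f"{key}: expected a full https:// link, got {value!r}")
    return problems


# Hypothetical entry; only Short-description is a key named in this README.
example = {
    "Short-description": "A web-based treebank editor",
    "Open-source": "Yes",
    "Homepage": "https://example.org",
}
print(check_entry(
    example,
    template_keys=["Short-description", "Open-source", "Homepage"],
    ternary_keys=["Open-source"],
    link_keys=["Homepage"],
))  # prints []
```

An empty list means no problems were found. A deleted key, an unquoted Yes/No that was parsed as a boolean, or an http:// link would each add one message to the list.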

2. Update an existing tool

  1. Open its YAML file in tools/
  2. Edit the fields you want to update.
  3. Do not change the filename unless renaming the tool.

3. Build TSV & JSON locally (optional but recommended)

If you want to check your file before opening a PR:

pip install -r src/requirements.txt
python src/build_json.py

This will regenerate:

  • data/latest_export.json
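
After the build, a quick way to confirm that the regenerated file is well-formed JSON is to load it with Python's standard json module. This is only a sketch: the helper name summarize_export is made up for illustration, and nothing is assumed about the internal structure of the export beyond it being valid JSON.

```python
import json
from pathlib import Path


def summarize_export(path="data/latest_export.json"):
    """Load the export and return (top-level type name, number of entries).

    Raises json.JSONDecodeError if the file is not valid JSON.
    """
    text = Path(path).read_text(encoding="utf-8")
    data = json.loads(text)
    # Count entries for lists and objects; treat any other value as one item.
    size = len(data) if isinstance(data, (list, dict)) else 1
    return type(data).__name__, size
```

Calling summarize_export() from the repository root after running the build returns the top-level type and entry count, which can be compared against the number of YAML files in tools/ if the export turns out to hold one entry per tool.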

4. Open a Pull Request

  • Add one YAML file (for a new tool) or edit one (to update).
  • Our CI will automatically run the build and validate your contribution.
  • Reviewers will check the generated TSV/JSON.

Reference

If you use this resource, please cite:

Pannitto, L., Dobrovoljc, K., & Guillaume, B. (2026).
Survey of Tools for Manual Linguistic Annotation: Supporting Diversity through Interactive Exploration.
Proceedings of LREC 2026.

BibTeX:

@inproceedings{pannitto2026survey,
  title = "Survey of Tools for Manual Linguistic Annotation: Supporting Diversity through Interactive Exploration",
  author = "Pannitto, Ludovica  and
    Dobrovoljc, Kaja  and
    Guillaume, Bruno",
  booktitle = "Proceedings of the Fifteenth Language Resources and Evaluation Conference",
  year = "2026",
  publisher = "European Language Resources Association",
}

Video

Project video: Watch on YouTube
