COLE: Comprehensive Benchmark for Quebec French Language Understanding Evaluation

title	COLE !
emoji	🐳
colorFrom	purple
colorTo	gray
sdk	docker
app_port	7860

COLE is a comprehensive benchmark for evaluating Quebec French Natural Language Understanding (NLU). It includes 23 diverse tasks covering sentiment analysis, paraphrase detection, natural language inference, question answering, grammatical judgment, word sense disambiguation, and more, with a particular focus on linguistic phenomena relevant to the French language.

We benchmark 94 large language models (LLMs), providing an extensive analysis of the current state of Quebec French NLU. Our results highlight a significant performance gap between closed- and open-weight models and identify key challenging frontiers such as zero-shot extractive question answering, fine-grained word sense disambiguation, and understanding of regional language variations.

Links

Leaderboard: colebenchmark.org
Paper: COLE: a Comprehensive Benchmark for Quebec French Language Understanding Evaluation (arXiv:2510.05046)
Dataset: graalul/COLE-public on Hugging Face
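
As a minimal sketch of pulling one task from the public dataset, the snippet below assumes the Hugging Face `datasets` library is installed and that each task is exposed as a configuration of `graalul/COLE-public`. The helper name `load_cole_task`, the configuration name `allocine`, and the `test` split are assumptions made for illustration; the dataset card lists the actual names.

```python
DATASET_REPO = "graalul/COLE-public"  # repository id from the Links section


def load_cole_task(task_config: str, split: str = "test"):
    """Load one COLE task from the Hugging Face Hub.

    `task_config` and `split` are hypothetical values here; the real
    configuration and split names are listed on the dataset card.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_REPO, task_config, split=split)


# Example call (needs network access; the config name is an assumption):
# allocine_test = load_cole_task("allocine")
```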

Tasks

COLE consists of 23 tasks grouped by NLU capability:

Sentiment Analysis

Task	Description	Test size
Allocine	Sentiment classification of French movie reviews (positive/negative)	20,000
MMS-fr	Sentiment analysis with 3 classes (positive, neutral, negative)	63,190

Natural Language Inference (NLI)

Task	Description	Test size
FraCaS	NLI involving quantifiers, plurality, anaphora, and ellipsis	346
GQNLI-fr	NLI with quantifier logic (e.g., most, at least, more than half)	30
LingNLI	NLI corpus constructed with a linguist in the loop	4,893
MNLI-nineeleven-Fr-MT	French machine-translated MNLI using 9/11 context	2,000
RTE3-Fr	French version of RTE3 for textual entailment	3,121
SICK-fr	Sentence pair relatedness and entailment	4,906
XNLI-fr	Cross-lingual NLI in French	5,010

Question Answering

Task	Description	Test size
FQuAD	Extractive QA on high-quality French Wikipedia articles	400
Fr-BoolQ	Boolean question answering in French	178
PIAF	French extractive QA pairs	384
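
This README does not state which metrics are used for the extractive QA tasks. As an illustration only, the sketch below implements the SQuAD-style exact match and token-level F1 that are commonly reported for extractive QA. The normalization is deliberately simple (lowercasing, ASCII punctuation removal, whitespace collapsing) and is an assumption, not the benchmark's own procedure.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def exact_match(prediction: str, reference: str) -> bool:
    """True when both answers are identical after normalization."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Exact match rewards only answers that reproduce the reference span, while token F1 gives partial credit for overlapping spans, which is why the two are usually reported together.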

Paraphrase Detection

Task	Description	Test size
PAWS-X	Paraphrase identification from sentence pairs	2,000
QFrBLiMP	Semantic equivalence detection between sentence pairs	2,290

Grammatical Judgment

Task	Description	Test size
DACCORD	Semantic plausibility of French sentences (binary)	1,034
MultiBLiMP-Fr	Grammatical correctness from minimal pairs	77
QFrCoLA	Sentence acceptability in French (grammar, syntax)	7,546
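
In a minimal-pair task, each item pairs a grammatical sentence with a minimally different ungrammatical one. One common way to score a model on such data is to count how often it prefers the grammatical sentence. The sketch below shows that idea; `toy_score` is a hypothetical stand-in for a model score (for example a sentence log-probability), and this README does not specify how COLE itself prompts or scores models on these tasks.

```python
from typing import Callable, Iterable, Tuple

MinimalPair = Tuple[str, str]  # (grammatical sentence, ungrammatical sentence)


def minimal_pair_accuracy(
    pairs: Iterable[MinimalPair],
    score: Callable[[str], float],
) -> float:
    """Fraction of pairs where the grammatical sentence gets the strictly higher score."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs to evaluate")
    correct = sum(score(good) > score(bad) for good, bad in pairs)
    return correct / len(pairs)


def toy_score(sentence: str) -> float:
    """Hypothetical scorer that penalizes one known agreement error; replace with a model."""
    return -1.0 if "les chat " in sentence.lower() else 0.0


pairs = [
    ("Les chats dorment.", "Les chat dorment."),
    ("Elle mange une pomme.", "Elle mangent une pomme."),
]
accuracy = minimal_pair_accuracy(pairs, toy_score)
```

Ties count as errors here, so a scorer that cannot tell the two sentences apart gets no credit for that pair.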

Semantic Similarity

Task	Description	Test size
STS22	Document-level similarity of multilingual news articles	72

Word Sense Disambiguation

Task	Description	Test size
WSD-Fr	Disambiguating verb meanings in context	3,121

Quebec French

Task	Description	Test size
QFrCoRE	Matching Quebec French expressions to standard definitions	4,633
QFrCoRT	Matching Quebec French terms to standard definitions	201

Coreference / Pronoun Resolution

Task	Description	Test size
Wino-X-LM	Pronoun resolution with ambiguous referents	2,793
Wino-X-MT	Translation-based pronoun resolution with gendered pronouns	2,988
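
The task tables above can be mirrored as a small registry, which is convenient for iterating over tasks or sanity-checking counts. This is a sketch: the names `COLE_TASKS`, `task_count`, and `total_test_examples` are illustrative and not part of the COLE code base, while the task names and test sizes are copied from the tables in this README.

```python
# Capability group -> {task name: test size}, copied from the tables above.
COLE_TASKS = {
    "Sentiment Analysis": {"Allocine": 20_000, "MMS-fr": 63_190},
    "Natural Language Inference (NLI)": {
        "FraCaS": 346,
        "GQNLI-fr": 30,
        "LingNLI": 4_893,
        "MNLI-nineeleven-Fr-MT": 2_000,
        "RTE3-Fr": 3_121,
        "SICK-fr": 4_906,
        "XNLI-fr": 5_010,
    },
    "Question Answering": {"FQuAD": 400, "Fr-BoolQ": 178, "PIAF": 384},
    "Paraphrase Detection": {"PAWS-X": 2_000, "QFrBLiMP": 2_290},
    "Grammatical Judgment": {"DACCORD": 1_034, "MultiBLiMP-Fr": 77, "QFrCoLA": 7_546},
    "Semantic Similarity": {"STS22": 72},
    "Word Sense Disambiguation": {"WSD-Fr": 3_121},
    "Quebec French": {"QFrCoRE": 4_633, "QFrCoRT": 201},
    "Coreference / Pronoun Resolution": {"Wino-X-LM": 2_793, "Wino-X-MT": 2_988},
}

# Flatten the groups into a single task -> size mapping.
all_tasks = {name: size for group in COLE_TASKS.values() for name, size in group.items()}

task_count = len(all_tasks)
total_test_examples = sum(all_tasks.values())
smallest_task = min(all_tasks, key=all_tasks.get)

print(f"{task_count} tasks across {len(COLE_TASKS)} capability groups")
print(f"smallest test set: {smallest_task} ({all_tasks[smallest_task]} examples)")
```

Test sizes vary by several orders of magnitude, so any aggregate score should make clear whether it averages per task or per example.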

Language

All data in COLE is in French.

Citation

If you use COLE in your research, please cite our paper:

@article{beauchemin2025cole,
  title={COLE: a Comprehensive Benchmark for Quebec French Language Understanding Evaluation},
  author={Beauchemin, David and Tremblay, Yan and Youssef, Mohamed Amine and Khoury, Richard},
  journal={arXiv preprint arXiv:2510.05046},
  year={2025},
  url={https://arxiv.org/abs/2510.05046}
}

