# BasqueBench

### Paper

BasqueBench is a benchmark for evaluating language models on Basque tasks; that is, it evaluates a model's ability to understand and generate Basque text. BasqueBench combines pre-existing open datasets with datasets developed exclusively for this benchmark. All the details of BasqueBench will be published in a paper soon.
The new evaluation datasets included in BasqueBench are:
| Task | Category | Homepage |
|---|---|---|
| MGSM_eu | Math | https://huggingface.co/datasets/HiTZ/MGSM-eu |
| PIQA_eu | Question Answering | https://huggingface.co/datasets/HiTZ/PIQA-eu |
| WNLI_eu | Natural Language Inference | https://huggingface.co/datasets/HiTZ/wnli-eu |
| XCOPA_eu | Commonsense Reasoning | https://huggingface.co/datasets/HiTZ/XCOPA-eu |
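These datasets are hosted on the Hugging Face Hub and can be inspected directly with the `datasets` library. Below is a minimal sketch; the split and column names are not documented in this README, and if a dataset defines multiple configurations a config name may also be needed:

```python
# A minimal sketch of inspecting one of the new datasets from the Hugging Face
# Hub. Split and column names are not asserted here, so we only print the
# dataset structure rather than assuming them.
from datasets import load_dataset

ds = load_dataset("HiTZ/MGSM-eu")  # any homepage from the table above works
print(ds)                          # shows the available splits and their features
```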
The remaining datasets included in BasqueBench have been made public in previous publications; they appear in the task list below.
### Citation
Paper for BasqueBench coming soon.
### Groups and Tasks

#### Groups

- `basque_bench`: All tasks included in BasqueBench.
- `flores_eu`: All FLORES translation tasks from or to Basque.
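Assuming the tasks are installed alongside LM Evaluation Harness, the whole group can be run through the harness's Python API. A minimal sketch; the model identifier below is a placeholder, not part of BasqueBench itself:

```python
# A minimal sketch using lm-evaluation-harness's Python API.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                   # Hugging Face transformers backend
    model_args="pretrained=your-org/your-model",  # placeholder model identifier
    tasks=["basque_bench"],                       # the full BasqueBench group
    batch_size=8,
)
print(results["results"])                         # per-task metric dictionary
```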
#### Tasks

The following tasks evaluate models on the BasqueBench datasets using various scoring methods:

- `belebele_eus_Latn`
- `eus_exams_eu`
- `eus_proficiency`
- `eus_reading`
- `eus_trivia`
- `flores_eu`
- `flores_eu-ca`
- `flores_eu-de`
- `flores_eu-en`
- `flores_eu-es`
- `flores_eu-fr`
- `flores_eu-gl`
- `flores_eu-it`
- `flores_eu-pt`
- `flores_ca-eu`
- `flores_de-eu`
- `flores_en-eu`
- `flores_es-eu`
- `flores_fr-eu`
- `flores_gl-eu`
- `flores_it-eu`
- `flores_pt-eu`
- `mgsm_direct_eu`
- `mgsm_native_cot_eu`
- `piqa_eu`
- `qnlieu`
- `wnli_eu`
- `xcopa_eu`
- `xnli_eu`
- `xnli_eu_native`
- `xstorycloze_eu`
Some of these tasks are taken from benchmarks already available in LM Evaluation Harness. These are:
- `belebele_eus_Latn`: Belebele Basque
- `qnlieu`: From BasqueGLUE
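To check which of these tasks are registered in a given installation of the harness, something like the following can be used. This is a sketch that assumes a recent lm-eval release exposing `TaskManager` and its `all_tasks` listing:

```python
# A sketch for listing the BasqueBench-related tasks registered in the local
# lm-eval installation; assumes a recent release that provides TaskManager.
from lm_eval.tasks import TaskManager

tm = TaskManager()
# Coarse filter: keep task names that mention Basque ("eu" also matches "eus").
basque_tasks = sorted(t for t in tm.all_tasks if "eu" in t)
print(basque_tasks)
```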
### Checklist

- Is the task an existing benchmark in the literature?
  - Have you referenced the original paper that introduced the task?
  - If yes, does the original paper provide a reference implementation?
    - Yes, original implementation contributed by author of the benchmark

If other tasks on this dataset are already supported:

- Is the "Main" variant of this task clearly denoted?
- Have you provided a short sentence in a README on what each new variant adds / evaluates?
- Have you noted which, if any, published evaluation setups are matched by this variant?