MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation • Paper • 2503.10497 • Published Mar 13, 2025
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants • Paper • 2308.16884 • Published Aug 31, 2023
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects • Paper • 2309.07445 • Published Sep 14, 2023