MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation Paper • 2503.10497 • Published Mar 13, 2025 • 3
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants Paper • 2308.16884 • Published Aug 31, 2023 • 11
meta-llama/Llama-3.2-11B-Vision-Instruct Image-Text-to-Text • 11B • Updated Dec 4, 2024 • 183k • 1.59k
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects Paper • 2309.07445 • Published Sep 14, 2023 • 1