The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset Paper • 2303.03915 • Published Mar 7, 2023 • 8
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code Paper • 2206.11249 • Published Jun 22, 2022
BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing Paper • 2206.15076 • Published Jun 30, 2022 • 5
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 39
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages Paper • 2411.16508 • Published Nov 25, 2024 • 10
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge Paper • 2411.19799 • Published Nov 29, 2024 • 17
Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations Paper • 2511.05613 • Published Nov 6, 2025
Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting Paper • 2606.09809 • Published 17 days ago • 4
Introducing Evaluation Cards: A Live Interpretive Layer for Understanding the AI Evaluations Ecosystem Article • evaleval • 13 days ago • 1