BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data Paper • 2510.10159 • Published Oct 11, 2025 • 3
Measuring what Matters: Construct Validity in Large Language Model Benchmarks Paper • 2511.04703 • Published Nov 3, 2025 • 8
latam-gpt/Dolci-Instruct-SFT-No-Tools-No-Hardcoded-Sample-200k Viewer • Updated Dec 12, 2025 • 212k • 23
latam-gpt/translated-tulu-3-sft-olmo-2-mixture-0225-no-hardcoded Viewer • Updated Dec 11, 2025 • 706k • 28
Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings Paper • 2509.14405 • Published Sep 17, 2025 • 2
Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans Paper • 2506.22439 • Published May 29, 2025 • 3
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments Paper • 2509.14233 • Published Sep 17, 2025 • 16
La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America Paper • 2507.00999 • Published Jul 1, 2025 • 1