benchmark-evaluation
allenai/ai2_arc Viewer • Updated Dec 21, 2023 • 7.79k • 430k • 335
Rowan/hellaswag Viewer • Updated Jul 10, 2025 • 60k • 308k • 172
ybisk/piqa Updated Jan 18, 2024 • 56.9k • 104
EleutherAI/lambada_openai Viewer • Updated Jul 10, 2025 • 30.9k • 92k • 49