Model Bench Leaderboard
Evaluating Models

LMArena Leaderboard 🏆 • 4.71k • Configuration error
View the LMArena model performance leaderboard

MTEB Leaderboard 🥇 • 7.03k • Running on CPU Upgrade
Embedding Leaderboard

Reward Bench Leaderboard 📐 • 420 • Running
Explore and compare LLM reward benchmark scores
Reasoning Datasets
Datasets with reasoning traces across various domains released by the community.

bespokelabs/Bespoke-Stratos-35k • Updated Jan 22, 2025 • 35k • 31 • 5
open-thoughts/OpenThoughts-114k • Updated Aug 31, 2025 • 228k • 74k • 799
open-r1/OpenThoughts-114k-math • Updated Jan 30, 2025 • 89.1k • 550 • 91
PrimeIntellect/NuminaMath-QwQ-CoT-5M • Updated Jan 22, 2025 • 5.14M • 652 • 56