Article • Community Evals: Because we're done trusting black-box leaderboards over the community • 2 days ago • 16
Featured • Open ASR Leaderboard 🏆 • 1.21k • Compare and evaluate speech recognition model performance across multiple benchmarks
Article • Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks • Nov 21, 2025 • 25
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints Paper • 2601.18137 • Published 11 days ago • 25