Rabbits Leaderboard 💊 (Agents, 20): Visualize and analyze language model robustness to drug name synonyms. Status: runtime error.
MMLU-Pro Leaderboard 🥇 (Agents, 246): More advanced and challenging multi-task evaluation. Status: running on CPU Upgrade.