AlexCuadron/SWE-Bench-Verified-O1-native-tool-calling-reasoning-high-results Viewer • Updated Jan 14 • 500 • 1.85k • 3
Open LLM Leaderboard • Running on CPU Upgrade • 13.8k • Track, rank and evaluate open LLMs and chatbots