| Model | Iterations | AutoBench | LMArena | AAI Index | MMLU-Pro | Cost (USD) | Avg Answer Duration (sec) | P99 Answer Duration (sec) | Fail Rate |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Claude-haiku-4.5 | 330 | 4.2731 | 1402 | 55 | 76% | 0.0188 | 41.83 | 165.87 | 0.30% |
| Claude-opus-4-1 | 324 | 4.2662 | 1449 | 59 | 88% | 0.1544 | 113.96 | 361.89 | 2.11% |
| Claude-sonnet-4.5 | 326 | 4.3139 | 1449 | 63 | 88% | 0.0406 | 58.05 | 186.47 | 1.51% |
| DeepSeek-R1-0528 | 325 | 4.2340 | 1395 | 52 | 85% | 0.0058 | 107.45 | 347.58 | 1.81% |
| Deepseek-v3.2-exp | 328 | 4.1646 | 1421 | 57 | 85% | 0.0012 | 99.89 | 383.58 | 0.91% |
| Gemini-2.5-flash | 329 | 4.3004 | 1405 | 54 | 84% | 0.0142 | 40.03 | 154.24 | 0.60% |
| Gemini-2.5-flash-lite | 330 | 4.2201 | 1380 | 48 | 81% | 0.0014 | 14.38 | 68.94 | 0.30% |
| Gemini-2.5-pro | 329 | 4.3696 | 1451 | 60 | 86% | 0.0484 | 54.51 | 169.70 | 0.60% |
| Gemini-3-pro-preview | 324 | 4.3859 | 1495 | 73 | 90% | 0.0462 | 45.45 | 118.64 | 2.11% |
| Gemma-3-27b-it | 326 | 3.6967 | 1364 | 22 | 67% | 0.0003 | 36.21 | 252.24 | 1.51% |
| GLM-4.6 | 327 | 4.2525 | 1426 | 56 | 83% | 0.0083 | 100.48 | 361.84 | 1.21% |
| Gpt-5 | 328 | 4.4526 | 1437 | 68 | 87% | 0.0771 | 151.90 | 434.09 | 0.91% |
| Gpt-5.1 | 328 | 4.4855 | 1454 | 70 | 87% | 0.0753 | 129.78 | 385.94 | 0.91% |
| Gpt-5-nano | 329 | 4.3161 | 1338 | 49 | 77% | 0.0026 | 68.67 | 207.20 | 0.60% |
| Gpt-oss-120b | 329 | 4.3651 | 1352 | 61 | 81% | 0.0006 | 17.14 | 55.04 | 0.60% |
| Grok-3-mini | 331 | 4.0764 | 1410 | 57 | 83% | 0.0010 | 19.90 | 52.79 | 0.00% |
| Grok-4.1-fast | 327 | 4.1660 | 1462 | n/a | n/a | 0.0007 | 23.30 | 65.97 | 1.21% |
| Grok-4.1-fast-thinking | 329 | 4.3416 | 1481 | 64 | 85% | 0.0019 | 63.81 | 223.48 | 0.60% |
| Kimi-k2-0905 | 328 | 4.2095 | 1416 | 50 | 82% | 0.0017 | 45.14 | 220.47 | 0.91% |
| Kimi-k2-thinking | 328 | 4.3424 | 1429 | 67 | 85% | 0.0098 | 85.92 | 428.50 | 0.91% |
| Llama-3.3-70b-instruct | 331 | 3.5529 | 1319 | 28 | 71% | 0.0004 | 18.39 | 79.45 | 0.00% |
| Llama-3.3-nemotron-super-49b-v1.5 | 331 | 4.1026 | 1340 | 45 | 81% | 0.0013 | 44.39 | 128.06 | 0.00% |
| Llama-4-maverick | 330 | 3.6088 | 1327 | 36 | 81% | 0.0005 | 13.00 | 48.30 | 0.30% |
| Magistral-medium-2506 | 331 | 3.9490 | 1305 | 33 | 82% | 0.0089 | 17.49 | 57.38 | 0.00% |
| Mistral-small-3.2-24b-instruct | 332 | 3.8118 | 1354 | 29 | 68% | 0.0002 | 14.65 | 48.67 | 0.00% |
| Nemotron-nano-9b-v2 | 331 | 3.6025 | n/a | 37 | 74% | 0.0004 | 24.98 | 105.27 | 0.00% |
| Nova-premier-v1 | 330 | 3.7007 | n/a | 25 | 73% | 0.0091 | 13.16 | 33.34 | 0.30% |
| Nova-pro-v1 | 330 | 3.3464 | 1288 | 32 | 69% | 0.0017 | 6.62 | 21.14 | 0.30% |
| Phi-4 | 331 | 3.4590 | 1255 | 23 | 71% | 0.0001 | 17.88 | 47.27 | 0.00% |
| Qwen3-235b-a22b-2507 | 330 | 4.2354 | 1374 | 45 | 83% | 0.0029 | 43.38 | 222.55 | 0.30% |
| Qwen3-235B-A22B-Thinking-2507 | 319 | 4.2774 | 1397 | 57 | 84% | 0.0018 | 135.80 | 420.72 | 3.63% |
| Qwen3-30b-a3b-instruct-2507 | 331 | 4.2108 | 1382 | 37 | 78% | 0.0005 | 38.14 | 131.29 | 0.00% |
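The table does not state how the Fail Rate column is computed, but every row is consistent with `(331 - Iterations) / 331`, which suggests each model was scheduled for 331 runs. This is an inference from the numbers, not a documented definition. Under that reading, Mistral-small-3.2-24b-instruct's 332 completed iterations would produce a negative rate, so the sketch below assumes a floor at zero. `EXPECTED_ITERATIONS` and `fail_rate_pct` are hypothetical names introduced only for illustration.

```python
# Assumption: 331 scheduled runs per model, inferred from the table, not documented.
EXPECTED_ITERATIONS = 331


def fail_rate_pct(iterations: int, expected: int = EXPECTED_ITERATIONS) -> float:
    """Percentage of scheduled runs without an answer, floored at 0 and rounded to 2 decimals."""
    missing = max(expected - iterations, 0)  # extra completed runs count as zero failures
    return round(100 * missing / expected, 2)


# A few rows from the table: model -> completed iterations.
rows = {
    "Claude-haiku-4.5": 330,
    "Claude-opus-4-1": 324,
    "Qwen3-235B-A22B-Thinking-2507": 319,
    "Mistral-small-3.2-24b-instruct": 332,
}

for model, iters in rows.items():
    print(f"{model}: {fail_rate_pct(iters):.2f}%")
```

Flooring with `max(..., 0)` treats an over-delivered run count as zero failures instead of reporting a negative failure rate.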