| Ranking | Model Name | Overall Weighted Score | Common-sense Reasoning | Discipline-Based Reasoning | Decision-Making Under Uncertainty | Moral & Ethical Reasoning |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Gemini 2.5 Flash | 92 | 98 | 93 | 89 | 87 |
| 2 | Doubao 1.5 Pro (Thinking) | 91 | 97 | 92 | 88 | 87 |
| 2 | Gemini 2.5 Pro | 91 | 93 | 94 | 90 | 87 |
| 4 | Grok 3 (Thinking) | 90 | 96 | 88 | 89 | 86 |
| 5 | GPT-5 (Auto) | 89 | 88 | 98 | 88 | 83 |
| 5 | Hunyuan-T1 | 89 | 97 | 95 | 84 | 81 |
| 5 | Qwen 3 (Thinking) | 89 | 96 | 89 | 86 | 85 |
| 5 | Ernie X1-Turbo | 89 | 98 | 85 | 86 | 86 |
| 9 | DeepSeek-R1 | 87 | 94 | 93 | 78 | 82 |
| 9 | Qwen 3 | 87 | 97 | 79 | 87 | 86 |
| 9 | Ernie 4.5-Turbo | 87 | 96 | 76 | 87 | 87 |
| 12 | Hunyuan-TurboS | 86 | 96 | 79 | 83 | 84 |
| 13 | Doubao 1.5 Pro | 85 | 97 | 81 | 86 | 74 |
| 13 | GPT-4.1 | 85 | 97 | 70 | 87 | 86 |
| 13 | GPT-o3 | 85 | 90 | 95 | 73 | 80 |
| 13 | Grok 3 | 85 | 97 | 69 | 87 | 86 |
| 13 | Grok 4 | 85 | 82 | 87 | 82 | 87 |
| 17 | DeepSeek-V3 | 84 | 95 | 81 | 84 | 77 |
| 19 | GPT-4o | 82 | 98 | 65 | 87 | 78 |
| 19 | GPT-o4 mini | 82 | 91 | 87 | 72 | 76 |
| 21 | Claude 4 Opus thinking | 81 | 96 | 84 | 72 | 71 |
| 21 | MiniMax-01 | 81 | 96 | 69 | 83 | 75 |
| 21 | 360 Zhinao 2-o1 | 81 | 93 | 76 | 81 | 72 |
| 24 | Claude 4 Opus | 80 | 95 | 85 | 70 | 70 |
| 24 | GLM-4-plus | 80 | 93 | 71 | 83 | 73 |
| 24 | Step 2 | 80 | 97 | 63 | 82 | 78 |
| 27 | Yi-Lightning | 79 | 97 | 59 | 82 | 79 |
| 27 | Kimi | 79 | 94 | 61 | 79 | 81 |
| 29 | Spark 4.0 Ultra | 78 | 91 | 71 | 75 | 76 |
| 30 | SenseChat V6 Pro | 77 | 86 | 58 | 84 | 78 |
| 31 | GLM-Z1-Air | 76 | 90 | 76 | 73 | 64 |
| 32 | Llama 3.3 70B | 75 | 82 | 52 | 83 | 81 |
| 33 | SenseChat V6 (Thinking) | 74 | 96 | 63 | 68 | 70 |
| 34 | Baichuan4-Turbo | 71 | 91 | 48 | 77 | 69 |
| 35 | Step R1-V-Mini | 66 | 96 | 80 | 37 | 51 |
| 36 | Kimi-k1.5 | 66 | 84 | 79 | 42 | 58 |