Upload contextual.csv
contextual.csv ADDED +37 -0
@@ -0,0 +1,37 @@
+Rank,Model,Composite Weighted Score,Commonsense Reasoning,Subject-Matter Reasoning,Decision Reasoning Under Uncertainty,Moral and Ethical Reasoning
+1,Gemini 2.5 Flash,92,98,93,89,87
+2,豆包1.5 Pro (thinking mode),91,97,92,88,87
+2,Gemini 2.5 Pro,91,93,94,90,87
+4,Grok 3 (thinking mode),90,96,88,89,86
+5,GPT-5,89,88,98,88,83
+5,混元-T1,89,97,95,84,81
+5,通义千问3 (thinking mode),89,96,89,86,85
+5,文心一言 X1-Turbo,89,98,85,86,86
+9,DeepSeek-R1,87,94,93,78,82
+9,通义千问3,87,97,79,87,86
+9,文心一言4.5-Turbo,87,96,76,87,87
+12,混元-TurboS,86,96,79,83,84
+13,豆包1.5 Pro,85,97,81,86,74
+13,GPT-4.1,85,97,70,87,86
+13,GPT-o3,85,90,95,73,80
+13,Grok 3,85,97,69,87,86
+13,Grok 4,85,82,87,82,87
+17,DeepSeek-V3,84,95,81,84,77
+19,GPT-4o,82,98,65,87,78
+19,GPT-o4 mini,82,91,87,72,76
+21,Claude 4 Opus (thinking mode),81,96,84,72,71
+21,MiniMax-01,81,96,69,83,75
+21,360智脑2-o1,81,93,76,81,72
+24,Claude 4 Opus,80,95,85,70,70
+24,GLM-4-plus,80,93,71,83,73
+24,Step 2,80,97,63,82,78
+27,Yi-Lightning,79,97,59,82,79
+27,Kimi,79,94,61,79,81
+29,Spark 4.0 Ultra,78,91,71,75,76
+30,日日新 V6 Pro,77,86,58,84,78
+31,GLM-Z1-Air,76,90,76,73,64
+32,Llama 3.3 70B,75,82,52,83,81
+33,日日新 V6推理,74,96,63,68,70
+34,Baichuan4-Turbo,71,91,48,77,69
+35,Step R1-V-Mini,66,96,80,37,51
+36,Kimi-k1.5,66,84,79,42,58
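The file does not say how the composite score is weighted or how tied models are ranked. As a rough sanity check, the Python sketch below tries two assumptions that are not documented anywhere in the commit: that the composite is the equal-weight mean of the four sub-scores rounded half up, and that ranks follow standard competition ranking (tied scores share a rank and the next rank skips ahead). The function names are hypothetical, and the data is the first nine rows of the table in file order, with model names omitted.

```python
# First nine data rows from the table, in file order:
# (listed rank, listed composite score, the four sub-scores).
ROWS = [
    (1, 92, (98, 93, 89, 87)),
    (2, 91, (97, 92, 88, 87)),
    (2, 91, (93, 94, 90, 87)),
    (4, 90, (96, 88, 89, 86)),
    (5, 89, (88, 98, 88, 83)),
    (5, 89, (97, 95, 84, 81)),
    (5, 89, (96, 89, 86, 85)),
    (5, 89, (98, 85, 86, 86)),
    (9, 87, (94, 93, 78, 82)),
]


def equal_weight_composite(scores):
    """Mean of the sub-scores, rounded half up (ASSUMPTION: equal weights).

    Integer arithmetic avoids Python's round(), which rounds halves to even.
    """
    n = len(scores)
    return (2 * sum(scores) + n) // (2 * n)


def competition_ranks(composites):
    """Standard competition ranks for scores already sorted descending (ASSUMPTION)."""
    ranks = []
    for position, score in enumerate(composites):
        if position > 0 and score == composites[position - 1]:
            ranks.append(ranks[-1])      # tie: share the previous rank
        else:
            ranks.append(position + 1)   # otherwise rank equals 1-based position
    return ranks


listed_ranks = [rank for rank, _, _ in ROWS]
listed_composites = [composite for _, composite, _ in ROWS]
computed_composites = [equal_weight_composite(subs) for _, _, subs in ROWS]
computed_ranks = competition_ranks(listed_composites)

print(computed_composites == listed_composites)
print(computed_ranks == listed_ranks)
```

On these nine rows both assumptions reproduce the listed values. Running the same comparison over every row is a quick way to flag any entry whose listed rank or composite departs from these conventions.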