Commit e1e81dc
Parent(s): 2b696a4
new data
benchmark_data.csv CHANGED +19 -17
@@ -1,19 +1,21 @@
-Model,Logical Power Ranking,Logical Power Score,
-
-
-
-o1
-
-
-
-
-
-
-
-
-
-
-
-
+Model,Logical Power Ranking,Logical Power Score,Accuracy,Syntax Score,Logic Basic Accuracy,Logic Easy Accuracy,Logic Medium Accuracy,Logic Hard Accuracy
+o3,1,15.4,0.77,0.8,0.99,0.93,0.74,0.43
+o4-mini-high,2,12.8,0.64,0.88,0.98,0.96,0.4,0.21
+o4-mini,3,12.3,0.61,0.86,0.93,0.88,0.52,0.13
+o1,4,11.9,0.59,0.68,0.92,0.89,0.41,0.15
+o3-mini,5,11.6,0.58,0.75,0.97,0.9,0.37,0.07
+o1-mini,6,10.1,0.5,0.95,0.97,0.82,0.2,0.03
+gemini-2.0-flash-thinking-exp-01-21,7,8.6,0.43,0.83,0.93,0.65,0.13,0.01
+DeepSeek-R1-Distill-Llama-70B,8,8.1,0.4,0.57,0.89,0.61,0.09,0.03
+gpt-4.5-preview,9,7.2,0.36,1.0,0.95,0.41,0.06,0.02
+gpt-4o,10,6.7,0.34,0.96,0.9,0.37,0.06,0.01
+Llama-3.3-70B-Instruct,11,5.1,0.26,0.99,0.84,0.17,0.01,0.01
+Llama-3.1-8B-Instruct,12,5.0,0.25,0.87,0.82,0.17,0.01,0.0
+QwQ-32B-Preview,13,4.6,0.23,0.84,0.77,0.15,0.0,0.01
+Internlm2-20b,14,3.9,0.19,0.82,0.71,0.07,0.0,0.0
+Qwen2-57B-A14B-Instruct,15,3.9,0.2,0.81,0.71,0.07,0.0,0.0
+CodeLlama-34b-Instruct-hf,16,3.5,0.17,0.78,0.68,0.01,0.0,0.0
+Mixtral-8x7B-Instruct-v0.1,17,3.1,0.15,0.93,0.61,0.01,0.0,0.0
+Llama-3.2-3B-Instruct,18,1.6,0.08,0.61,0.31,0.01,0.0,0.0
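A minimal sketch of how the new benchmark_data.csv could be loaded and sanity-checked, using only Python's standard `csv` module. The sample embeds the header and first three data rows of the new file. The helper names `load_rows` and `ranking_matches_score` are hypothetical, and the idea that "Logical Power Ranking" follows "Logical Power Score" in non-increasing order is an assumption drawn from the data, not something the commit states.

```python
import csv
import io

# Header and first three data rows of the new benchmark_data.csv.
SAMPLE = """\
Model,Logical Power Ranking,Logical Power Score,Accuracy,Syntax Score,Logic Basic Accuracy,Logic Easy Accuracy,Logic Medium Accuracy,Logic Hard Accuracy
o3,1,15.4,0.77,0.8,0.99,0.93,0.74,0.43
o4-mini-high,2,12.8,0.64,0.88,0.98,0.96,0.4,0.21
o4-mini,3,12.3,0.61,0.86,0.93,0.88,0.52,0.13
"""


def load_rows(text):
    """Parse CSV text into dicts; every column except Model becomes a number."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {"Model": raw["Model"]}
        for key, value in raw.items():
            if key == "Model":
                continue
            # The ranking is an integer position; all other columns are floats.
            row[key] = int(value) if key == "Logical Power Ranking" else float(value)
        rows.append(row)
    return rows


def ranking_matches_score(rows):
    """Return True if scores are non-increasing when rows are ordered by ranking.

    Assumption (hypothetical check): a better ranking never has a lower score.
    Ties in score are allowed.
    """
    ordered = sorted(rows, key=lambda r: r["Logical Power Ranking"])
    scores = [r["Logical Power Score"] for r in ordered]
    return all(a >= b for a, b in zip(scores, scores[1:]))


rows = load_rows(SAMPLE)
print(ranking_matches_score(rows))
```

To run the same check on the full file, pass its contents to `load_rows` in place of `SAMPLE`; `csv.DictReader` skips fully blank lines, so trailing empty lines in the file should not need special handling.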