Commit 2b696a4 · 1 Parent(s): 642b51c
update table
benchmark_data.csv CHANGED (+18 -17)
@@ -1,18 +1,19 @@
-Model,[rest of old line 1 not captured]
-[old lines 2-10: content not captured]
-Llama-3.1-8B-Instruct[rest of old line 11 not captured]
-[old lines 12-17: content not captured]
+Model,Logical Power Ranking,Logical Power Score,Solved Problems,Total Problems,Logic Basic Solved,Logic Easy Solved,Logic Medium Solved,Logic Hard Solved
+o4-mini,1,12.3,369,600,140,132,78,19
+o1,2,11.9,356,600,138,133,62,23
+o3-mini,3,11.6,347,600,146,135,55,11
+o1-mini,4,10.1,302,600,145,123,30,4
+gemini-2.0-flash-thinking-exp-01-21,5,8.6,258,600,139,97,20,2
+DeepSeek-R1-Distill-Llama-70B,6,8.1,242,600,133,92,13,4
+gpt-4.5-preview,7,7.2,215,600,142,61,9,3
+gpt-4o,8,6.7,202,600,135,56,9,2
+Llama-3.3-70B-Instruct,9,5.1,154,600,126,25,2,1
+Llama-3.1-8B-Instruct,10,5.0,150,600,123,25,2,0
+QwQ-32B-Preview,11,4.6,139,600,115,23,0,1
+Internlm2-20b,12,3.9,116,600,106,10,0,0
+Qwen2-57B-A14B-Instruct,13,3.9,118,600,107,11,0,0
+CodeLlama-34b-Instruct-hf,14,3.5,104,600,102,2,0,0
+Mixtral-8x7B-Instruct-v0.1,15,3.1,93,600,91,2,0,0
+Llama-3.2-3B-Instruct,16,1.6,48,600,47,1,0,0
+
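A reviewer may want to sanity-check the added rows before merging. The sketch below is not part of the commit. It embeds the new CSV rows verbatim and runs two checks that are assumptions about what the columns mean: that the four per-difficulty counts add up to `Solved Problems`, and that `Logical Power Ranking` follows `Solved Problems` in descending order. The helper names (`load_rows`, `tier_sum_mismatches`, `rank_inversions`) are hypothetical.

```python
import csv
import io

# Rows copied from the added side of the diff; nothing here is new data.
CSV_TEXT = """\
Model,Logical Power Ranking,Logical Power Score,Solved Problems,Total Problems,Logic Basic Solved,Logic Easy Solved,Logic Medium Solved,Logic Hard Solved
o4-mini,1,12.3,369,600,140,132,78,19
o1,2,11.9,356,600,138,133,62,23
o3-mini,3,11.6,347,600,146,135,55,11
o1-mini,4,10.1,302,600,145,123,30,4
gemini-2.0-flash-thinking-exp-01-21,5,8.6,258,600,139,97,20,2
DeepSeek-R1-Distill-Llama-70B,6,8.1,242,600,133,92,13,4
gpt-4.5-preview,7,7.2,215,600,142,61,9,3
gpt-4o,8,6.7,202,600,135,56,9,2
Llama-3.3-70B-Instruct,9,5.1,154,600,126,25,2,1
Llama-3.1-8B-Instruct,10,5.0,150,600,123,25,2,0
QwQ-32B-Preview,11,4.6,139,600,115,23,0,1
Internlm2-20b,12,3.9,116,600,106,10,0,0
Qwen2-57B-A14B-Instruct,13,3.9,118,600,107,11,0,0
CodeLlama-34b-Instruct-hf,14,3.5,104,600,102,2,0,0
Mixtral-8x7B-Instruct-v0.1,15,3.1,93,600,91,2,0,0
Llama-3.2-3B-Instruct,16,1.6,48,600,47,1,0,0
"""

TIER_COLUMNS = [
    "Logic Basic Solved",
    "Logic Easy Solved",
    "Logic Medium Solved",
    "Logic Hard Solved",
]


def load_rows(text):
    """Parse the CSV text into a list of dicts keyed by header name."""
    return list(csv.DictReader(io.StringIO(text)))


def tier_sum_mismatches(rows):
    """Return models whose per-tier counts do not add up to Solved Problems."""
    return [
        row["Model"]
        for row in rows
        if sum(int(row[col]) for col in TIER_COLUMNS) != int(row["Solved Problems"])
    ]


def rank_inversions(rows):
    """Return adjacent (better-ranked, worse-ranked) pairs where the
    worse-ranked model actually solved more problems."""
    ordered = sorted(rows, key=lambda r: int(r["Logical Power Ranking"]))
    return [
        (a["Model"], b["Model"])
        for a, b in zip(ordered, ordered[1:])
        if int(a["Solved Problems"]) < int(b["Solved Problems"])
    ]


rows = load_rows(CSV_TEXT)
print(tier_sum_mismatches(rows))
print(rank_inversions(rows))
```

On these rows the first check finds no mismatches. The second flags one pair: Internlm2-20b is ranked 12 with 116 solved, while Qwen2-57B-A14B-Instruct is ranked 13 with 118 solved. Both show a score of 3.9, so this may be a deliberate tie-break or an ordering slip; the commit does not say which.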