ahmad21omar committed on
Commit e1e81dc · 1 Parent(s): 2b696a4
Files changed (1)
  1. benchmark_data.csv +19 -17
benchmark_data.csv CHANGED
@@ -1,19 +1,21 @@
- Model,Logical Power Ranking,Logical Power Score,Solved Problems,Total Problems,Logic Basic Solved,Logic Easy Solved,Logic Medium Solved,Logic Hard Solved
- o4-mini,1,12.3,369,600,140,132,78,19
- o1,2,11.9,356,600,138,133,62,23
- o3-mini,3,11.6,347,600,146,135,55,11
- o1-mini,4,10.1,302,600,145,123,30,4
- gemini-2.0-flash-thinking-exp-01-21,5,8.6,258,600,139,97,20,2
- DeepSeek-R1-Distill-Llama-70B,6,8.1,242,600,133,92,13,4
- gpt-4.5-preview,7,7.2,215,600,142,61,9,3
- gpt-4o,8,6.7,202,600,135,56,9,2
- Llama-3.3-70B-Instruct,9,5.1,154,600,126,25,2,1
- Llama-3.1-8B-Instruct,10,5.0,150,600,123,25,2,0
- QwQ-32B-Preview,11,4.6,139,600,115,23,0,1
- Internlm2-20b,12,3.9,116,600,106,10,0,0
- Qwen2-57B-A14B-Instruct,13,3.9,118,600,107,11,0,0
- CodeLlama-34b-Instruct-hf,14,3.5,104,600,102,2,0,0
- Mixtral-8x7B-Instruct-v0.1,15,3.1,93,600,91,2,0,0
- Llama-3.2-3B-Instruct,16,1.6,48,600,47,1,0,0
+ Model,Logical Power Ranking,Logical Power Score,Accuracy,Syntax Score,Logic Basic Accuracy,Logic Easy Accuracy,Logic Medium Accuracy,Logic Hard Accuracy
+ o3,1,15.4,0.77,0.8,0.99,0.93,0.74,0.43
+ o4-mini-high,2,12.8,0.64,0.88,0.98,0.96,0.4,0.21
+ o4-mini,3,12.3,0.61,0.86,0.93,0.88,0.52,0.13
+ o1,4,11.9,0.59,0.68,0.92,0.89,0.41,0.15
+ o3-mini,5,11.6,0.58,0.75,0.97,0.9,0.37,0.07
+ o1-mini,6,10.1,0.5,0.95,0.97,0.82,0.2,0.03
+ gemini-2.0-flash-thinking-exp-01-21,7,8.6,0.43,0.83,0.93,0.65,0.13,0.01
+ DeepSeek-R1-Distill-Llama-70B,8,8.1,0.4,0.57,0.89,0.61,0.09,0.03
+ gpt-4.5-preview,9,7.2,0.36,1.0,0.95,0.41,0.06,0.02
+ gpt-4o,10,6.7,0.34,0.96,0.9,0.37,0.06,0.01
+ Llama-3.3-70B-Instruct,11,5.1,0.26,0.99,0.84,0.17,0.01,0.01
+ Llama-3.1-8B-Instruct,12,5.0,0.25,0.87,0.82,0.17,0.01,0.0
+ QwQ-32B-Preview,13,4.6,0.23,0.84,0.77,0.15,0.0,0.01
+ Internlm2-20b,14,3.9,0.19,0.82,0.71,0.07,0.0,0.0
+ Qwen2-57B-A14B-Instruct,15,3.9,0.2,0.81,0.71,0.07,0.0,0.0
+ CodeLlama-34b-Instruct-hf,16,3.5,0.17,0.78,0.68,0.01,0.0,0.0
+ Mixtral-8x7B-Instruct-v0.1,17,3.1,0.15,0.93,0.61,0.01,0.0,0.0
+ Llama-3.2-3B-Instruct,18,1.6,0.08,0.61,0.31,0.01,0.0,0.0
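The commit replaces raw solved counts with per-tier accuracies. A minimal sketch of the apparent conversion, assuming the 600 problems split evenly into 150 per difficulty tier (inferred from the rows themselves, e.g. o4-mini's 78 Logic Medium solved matching a 0.52 accuracy; the split is not stated in the commit):

```python
# Assumption: each of the four difficulty tiers holds 150 of the 600 problems.
TIER_SIZE = 150


def to_accuracy(solved: int, total: int) -> float:
    """Convert a solved count to the two-decimal accuracy used in the new schema."""
    return round(solved / total, 2)


# o4-mini in the old schema: 369/600 overall, 140/132/78/19 per tier.
overall = to_accuracy(369, 600)   # new row shows Accuracy = 0.61
medium = to_accuracy(78, TIER_SIZE)  # new row shows Logic Medium Accuracy = 0.52
print(overall, medium)
```

The same conversion reproduces the other carried-over rows (e.g. o1: 356/600 → 0.59), which suggests the accuracy columns were derived directly from the old counts rather than from a re-run.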