ahmad21omar committed on
Commit 2b696a4 · 1 Parent(s): 642b51c

update table

Files changed (1)
  1. benchmark_data.csv +18 -17
benchmark_data.csv CHANGED
@@ -1,18 +1,19 @@
-Model,Solved_Problems,Total_Problems,Logic Basic Solved,Logic Easy Solved,Logic Medium Solved,Logic Hard Solved
-Llama-3.1-8B-Instruct,150,600,123,25,2,0
-CodeLlama-34b-Instruct-hf,104,600,102,2,0,0
-QwQ-32B-Preview,139,600,115,23,0,1
-Internlm2-20b,116,600,106,10,0,0
-Llama-3.2-3B-Instruct,48,600,47,1,0,0
-Mixtral-8x7B-Instruct-v0.1,93,600,91,2,0,0
-Qwen2-57B-A14B-Instruct,118,600,107,11,0,0
-Llama-3.3-70B-Instruct,154,600,126,25,2,1
-DeepSeek-R1-Distill-Llama-70B,242,600,133,92,13,4
-Llama-3.1-8B-Instruct-induction-mixed,251,600,143,85,22,1
-o1,356,600,138,133,62,23
-o1-mini,302,600,145,123,30,4
-o3-mini,347,600,146,135,55,11
-gpt-4o,202,600,135,56,9,2
-gpt-4.5-preview,215,600,142,61,9,3
-o4-mini,369,600,140,132,78,19
+Model,Logical Power Ranking,Logical Power Score,Solved Problems,Total Problems,Logic Basic Solved,Logic Easy Solved,Logic Medium Solved,Logic Hard Solved
+o4-mini,1,12.3,369,600,140,132,78,19
+o1,2,11.9,356,600,138,133,62,23
+o3-mini,3,11.6,347,600,146,135,55,11
+o1-mini,4,10.1,302,600,145,123,30,4
+gemini-2.0-flash-thinking-exp-01-21,5,8.6,258,600,139,97,20,2
+DeepSeek-R1-Distill-Llama-70B,6,8.1,242,600,133,92,13,4
+gpt-4.5-preview,7,7.2,215,600,142,61,9,3
+gpt-4o,8,6.7,202,600,135,56,9,2
+Llama-3.3-70B-Instruct,9,5.1,154,600,126,25,2,1
+Llama-3.1-8B-Instruct,10,5.0,150,600,123,25,2,0
+QwQ-32B-Preview,11,4.6,139,600,115,23,0,1
+Internlm2-20b,12,3.9,116,600,106,10,0,0
+Qwen2-57B-A14B-Instruct,13,3.9,118,600,107,11,0,0
+CodeLlama-34b-Instruct-hf,14,3.5,104,600,102,2,0,0
+Mixtral-8x7B-Instruct-v0.1,15,3.1,93,600,91,2,0,0
+Llama-3.2-3B-Instruct,16,1.6,48,600,47,1,0,0
+
 
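The new "Logical Power Score" column appears to equal Solved Problems / Total Problems × 20, rounded to one decimal (for example, 369 / 600 × 20 = 12.3 for o4-mini, and 48 / 600 × 20 = 1.6 for Llama-3.2-3B-Instruct). This relationship is inferred from the numbers in the table, not stated in the commit. The Python sketch below is a hypothetical consistency check under that assumption: it parses the new benchmark_data.csv and verifies that the four "Logic … Solved" columns sum to "Solved Problems", that the score matches the assumed formula, and that the ranking runs 1..N with non-increasing scores. The names `CSV_TEXT`, `load_rows`, `check_row`, and `check_ranking` are illustrative only.

```python
import csv
import io

# New benchmark_data.csv contents from this commit, copied verbatim.
CSV_TEXT = """Model,Logical Power Ranking,Logical Power Score,Solved Problems,Total Problems,Logic Basic Solved,Logic Easy Solved,Logic Medium Solved,Logic Hard Solved
o4-mini,1,12.3,369,600,140,132,78,19
o1,2,11.9,356,600,138,133,62,23
o3-mini,3,11.6,347,600,146,135,55,11
o1-mini,4,10.1,302,600,145,123,30,4
gemini-2.0-flash-thinking-exp-01-21,5,8.6,258,600,139,97,20,2
DeepSeek-R1-Distill-Llama-70B,6,8.1,242,600,133,92,13,4
gpt-4.5-preview,7,7.2,215,600,142,61,9,3
gpt-4o,8,6.7,202,600,135,56,9,2
Llama-3.3-70B-Instruct,9,5.1,154,600,126,25,2,1
Llama-3.1-8B-Instruct,10,5.0,150,600,123,25,2,0
QwQ-32B-Preview,11,4.6,139,600,115,23,0,1
Internlm2-20b,12,3.9,116,600,106,10,0,0
Qwen2-57B-A14B-Instruct,13,3.9,118,600,107,11,0,0
CodeLlama-34b-Instruct-hf,14,3.5,104,600,102,2,0,0
Mixtral-8x7B-Instruct-v0.1,15,3.1,93,600,91,2,0,0
Llama-3.2-3B-Instruct,16,1.6,48,600,47,1,0,0
"""

LEVELS = [
    "Logic Basic Solved",
    "Logic Easy Solved",
    "Logic Medium Solved",
    "Logic Hard Solved",
]


def load_rows(text):
    """Parse the CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def check_row(row):
    """Return a list of inconsistencies found in one row (empty if none)."""
    issues = []
    solved = int(row["Solved Problems"])
    total = int(row["Total Problems"])
    if sum(int(row[col]) for col in LEVELS) != solved:
        issues.append("level counts do not sum to Solved Problems")
    # ASSUMPTION (inferred from the data, not documented):
    # score = Solved / Total * 20, formatted to one decimal place.
    expected = f"{solved / total * 20:.1f}"
    if expected != row["Logical Power Score"]:
        issues.append(f"score {row['Logical Power Score']} != expected {expected}")
    return issues


def check_ranking(rows):
    """True if ranks run 1..N and scores never increase as rank grows."""
    ranked = sorted(rows, key=lambda r: int(r["Logical Power Ranking"]))
    ranks = [int(r["Logical Power Ranking"]) for r in ranked]
    scores = [float(r["Logical Power Score"]) for r in ranked]
    in_sequence = ranks == list(range(1, len(rows) + 1))
    non_increasing = all(a >= b for a, b in zip(scores, scores[1:]))
    return in_sequence and non_increasing


if __name__ == "__main__":
    rows = load_rows(CSV_TEXT)
    for row in rows:
        for issue in check_row(row):
            print(row["Model"], issue)
    print("ranking consistent:", check_ranking(rows))
```

The ranking check passes only because Internlm2-20b (116 solved) and Qwen2-57B-A14B-Instruct (118 solved) both round to 3.9; the table does not say how ties are broken, and ordering by Solved Problems alone would swap ranks 12 and 13.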