ahmad21omar committed on
Commit e1e81dc · 1 Parent(s): 2b696a4
Files changed (1)
  1. benchmark_data.csv +19 -17
benchmark_data.csv CHANGED
@@ -1,19 +1,21 @@
- Model,Logical Power Ranking,Logical Power Score,Solved Problems,Total Problems,Logic Basic Solved,Logic Easy Solved,Logic Medium Solved,Logic Hard Solved
- o4-mini,1,12.3,369,600,140,132,78,19
- o1,2,11.9,356,600,138,133,62,23
- o3-mini,3,11.6,347,600,146,135,55,11
- o1-mini,4,10.1,302,600,145,123,30,4
- gemini-2.0-flash-thinking-exp-01-21,5,8.6,258,600,139,97,20,2
- DeepSeek-R1-Distill-Llama-70B,6,8.1,242,600,133,92,13,4
- gpt-4.5-preview,7,7.2,215,600,142,61,9,3
- gpt-4o,8,6.7,202,600,135,56,9,2
- Llama-3.3-70B-Instruct,9,5.1,154,600,126,25,2,1
- Llama-3.1-8B-Instruct,10,5.0,150,600,123,25,2,0
- QwQ-32B-Preview,11,4.6,139,600,115,23,0,1
- Internlm2-20b,12,3.9,116,600,106,10,0,0
- Qwen2-57B-A14B-Instruct,13,3.9,118,600,107,11,0,0
- CodeLlama-34b-Instruct-hf,14,3.5,104,600,102,2,0,0
- Mixtral-8x7B-Instruct-v0.1,15,3.1,93,600,91,2,0,0
- Llama-3.2-3B-Instruct,16,1.6,48,600,47,1,0,0
+ Model,Logical Power Ranking,Logical Power Score,Accuracy,Syntax Score,Logic Basic Accuracy,Logic Easy Accuracy,Logic Medium Accuracy,Logic Hard Accuracy
+ o3,1,15.4,0.77,0.8,0.99,0.93,0.74,0.43
+ o4-mini-high,2,12.8,0.64,0.88,0.98,0.96,0.4,0.21
+ o4-mini,3,12.3,0.61,0.86,0.93,0.88,0.52,0.13
+ o1,4,11.9,0.59,0.68,0.92,0.89,0.41,0.15
+ o3-mini,5,11.6,0.58,0.75,0.97,0.9,0.37,0.07
+ o1-mini,6,10.1,0.5,0.95,0.97,0.82,0.2,0.03
+ gemini-2.0-flash-thinking-exp-01-21,7,8.6,0.43,0.83,0.93,0.65,0.13,0.01
+ DeepSeek-R1-Distill-Llama-70B,8,8.1,0.4,0.57,0.89,0.61,0.09,0.03
+ gpt-4.5-preview,9,7.2,0.36,1.0,0.95,0.41,0.06,0.02
+ gpt-4o,10,6.7,0.34,0.96,0.9,0.37,0.06,0.01
+ Llama-3.3-70B-Instruct,11,5.1,0.26,0.99,0.84,0.17,0.01,0.01
+ Llama-3.1-8B-Instruct,12,5.0,0.25,0.87,0.82,0.17,0.01,0.0
+ QwQ-32B-Preview,13,4.6,0.23,0.84,0.77,0.15,0.0,0.01
+ Internlm2-20b,14,3.9,0.19,0.82,0.71,0.07,0.0,0.0
+ Qwen2-57B-A14B-Instruct,15,3.9,0.2,0.81,0.71,0.07,0.0,0.0
+ CodeLlama-34b-Instruct-hf,16,3.5,0.17,0.78,0.68,0.01,0.0,0.0
+ Mixtral-8x7B-Instruct-v0.1,17,3.1,0.15,0.93,0.61,0.01,0.0,0.0
+ Llama-3.2-3B-Instruct,18,1.6,0.08,0.61,0.31,0.01,0.0,0.0
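The commit replaces raw solved counts with per-tier accuracies. A minimal sketch of the apparent conversion, assuming the 600 problems split evenly into 150 per difficulty tier (inferred from the rows themselves, e.g. o4-mini's 78 Logic Medium solved matching a 0.52 accuracy; the split is not stated in the commit):

```python
# Assumption: each of the four difficulty tiers holds 150 of the 600 problems.
TIER_SIZE = 150


def to_accuracy(solved: int, total: int) -> float:
    """Convert a solved count to the two-decimal accuracy used in the new schema."""
    return round(solved / total, 2)


# o4-mini in the old schema: 369/600 overall, 140/132/78/19 per tier.
overall = to_accuracy(369, 600)   # new row shows Accuracy = 0.61
medium = to_accuracy(78, TIER_SIZE)  # new row shows Logic Medium Accuracy = 0.52
print(overall, medium)
```

The same conversion reproduces the other carried-over rows (e.g. o1: 356/600 → 0.59), which suggests the accuracy columns were derived directly from the old counts rather than from a re-run.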