sh2orc committed 827a1a9 (verified; parent 3026d47): Create README.md

# Qwen3-Coder-30B-A3B-Instruct-FP8-Dynamic Evaluation Results

## Open LLM Leaderboard Benchmark Performance

| Benchmark | Score | Std Error |
|---|---:|---:|
| ARC-Challenge (acc) | 65.36% | ±1.39% |
| ARC-Challenge (acc_norm) | 67.83% | ±1.37% |
| GSM8K (flexible-extract) | 89.84% | ±0.83% |
| GSM8K (strict-match) | 88.93% | ±0.86% |
| HellaSwag (acc) | 50.25% | ±0.50% |
| HellaSwag (acc_norm) | 58.75% | ±0.49% |
| MMLU (overall) | 78.07% | ±0.33% |
| TruthfulQA MC1 | 38.31% | ±1.70% |
| TruthfulQA MC2 | 59.00% | ±1.64% |
| WinoGrande | 62.75% | ±1.36% |
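
This README does not name its evaluation harness. The Std Error column is, however, consistent with the standard error that EleutherAI's lm-evaluation-harness reports for 0/1 accuracy metrics, `sqrt(p * (1 - p) / (n - 1))`. The sketch below recomputes several rows under that assumption and adds a normal-approximation 95% interval. The question counts `n` (ARC-Challenge 1,172, GSM8K 1,319, HellaSwag 10,042, TruthfulQA 817, WinoGrande 1,267) are assumed public split sizes; this README does not state them. TruthfulQA MC2 is omitted because it averages per-question probabilities rather than 0/1 outcomes, so this formula does not apply to it.

```python
import math

# Rows from the table above: (name, score %, reported std error %, n).
# ASSUMPTION: n is the public test/validation split size; this README does not state it.
ROWS = [
    ("ARC-Challenge (acc)", 65.36, 1.39, 1172),
    ("GSM8K (strict-match)", 88.93, 0.86, 1319),
    ("HellaSwag (acc)", 50.25, 0.50, 10042),
    ("TruthfulQA MC1", 38.31, 1.70, 817),
    ("WinoGrande", 62.75, 1.36, 1267),
]


def accuracy_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n per-example 0/1 scores whose mean is p.

    Using the sample (n - 1) variance, this reduces to sqrt(p * (1 - p) / (n - 1)).
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


def ci95(p: float, se: float) -> tuple[float, float]:
    """Normal-approximation 95% confidence interval around p."""
    return p - 1.96 * se, p + 1.96 * se


results = {}
for name, score, reported_se, n in ROWS:
    p = score / 100.0
    se = accuracy_stderr(p, n)
    lo, hi = ci95(p, se)
    results[name] = se * 100.0  # recomputed std error, in percentage points
    print(f"{name:22s} SE {se * 100:.2f}% (reported {reported_se:.2f}%), "
          f"95% CI {lo * 100:.1f}%-{hi * 100:.1f}%")
```

Under these assumed sizes each recomputed value matches the table to two decimals. For WinoGrande the 95% interval is roughly 60.1% to 65.4%, so score gaps of a point or two on these benchmarks are within sampling noise.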

## MMLU Category Performance

| Category | Score | Std Error |
|---|---:|---:|
| Humanities | 69.93% | ±0.64% |
| Other | 79.95% | ±0.69% |
| Social Sciences | 86.22% | ±0.61% |
| STEM | 80.40% | ±0.69% |
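
The MMLU (overall) score of 78.07% is not the plain average of these four category scores. It does match an average weighted by the number of questions in each category. The sketch below checks this, assuming the standard MMLU test-split category sizes (Humanities 4,705, Other 3,107, Social Sciences 3,077, STEM 3,153; 14,042 in total). These counts are assumptions and are not given in this README. The group-level standard errors are not reproduced here; only the scores are checked.

```python
# Category scores from the table above (in %), paired with ASSUMED question
# counts for the standard MMLU test split (not stated in this README).
CATEGORIES = {
    "Humanities": (69.93, 4705),
    "Other": (79.95, 3107),
    "Social Sciences": (86.22, 3077),
    "STEM": (80.40, 3153),
}


def size_weighted_mean(groups: dict[str, tuple[float, int]]) -> float:
    """Average of group scores, each weighted by its number of questions."""
    total = sum(n for _, n in groups.values())
    return sum(score * n for score, n in groups.values()) / total


def unweighted_mean(groups: dict[str, tuple[float, int]]) -> float:
    """Plain average of group scores, ignoring group sizes."""
    return sum(score for score, _ in groups.values()) / len(groups)


weighted = size_weighted_mean(CATEGORIES)
plain = unweighted_mean(CATEGORIES)
print(f"size-weighted: {weighted:.2f}%  unweighted: {plain:.2f}%")
```

The size-weighted result reproduces 78.07%, while the plain average of the four categories is about 79.1%. The overall score therefore leans toward the largest and lowest-scoring category, Humanities.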

## Humanities Subcategory Performance

| Subject | Score | Std Error |
|---|---:|---:|
| Formal Logic | 61.90% | ±4.34% |
| High School European History | 86.67% | ±2.65% |
| High School US History | 86.76% | ±2.38% |
| High School World History | 89.03% | ±2.03% |
| International Law | 82.64% | ±3.46% |
| Jurisprudence | 82.41% | ±3.68% |
| Logical Fallacies | 82.21% | ±3.00% |
| Moral Disputes | 74.86% | ±2.34% |
| Moral Scenarios | 67.15% | ±1.57% |
| Philosophy | 77.17% | ±2.38% |
| Prehistory | 83.02% | ±2.09% |
| Professional Law | 54.95% | ±1.27% |
| World Religions | 85.38% | ±2.71% |
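
Humanities (69.93%) is the lowest MMLU category even though most of its subjects score above 80%. If the category score is a question-weighted average, the explanation is Professional Law: under the assumed counts below it contributes 1,534 of the category's 4,705 questions at 54.95%. The per-subject question counts are assumptions (standard MMLU test split), not figures from this README.

```python
# Humanities subject scores from the table above (in %), with ASSUMED MMLU
# test-split question counts (not stated in this README).
HUMANITIES = {
    "Formal Logic": (61.90, 126),
    "High School European History": (86.67, 165),
    "High School US History": (86.76, 204),
    "High School World History": (89.03, 237),
    "International Law": (82.64, 121),
    "Jurisprudence": (82.41, 108),
    "Logical Fallacies": (82.21, 163),
    "Moral Disputes": (74.86, 346),
    "Moral Scenarios": (67.15, 895),
    "Philosophy": (77.17, 311),
    "Prehistory": (83.02, 324),
    "Professional Law": (54.95, 1534),
    "World Religions": (85.38, 171),
}

total = sum(n for _, n in HUMANITIES.values())
weighted = sum(score * n for score, n in HUMANITIES.values()) / total
unweighted = sum(score for score, _ in HUMANITIES.values()) / len(HUMANITIES)

# Each subject's share of the category's questions, largest first.
shares = sorted(
    ((name, n / total) for name, (_, n) in HUMANITIES.items()),
    key=lambda item: item[1],
    reverse=True,
)

print(f"questions: {total}")
print(f"size-weighted mean: {weighted:.2f}%")
print(f"unweighted mean:    {unweighted:.2f}%")
for name, share in shares[:2]:
    print(f"  {name}: {share:.1%} of questions")
```

With these counts the weighted mean lands within 0.01 points of the reported 69.93% (the gap comes from rounding in the per-subject percentages), while the simple average of the 13 subjects is about 78.0%. Professional Law alone supplies roughly a third of the questions.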

## Social Sciences Subcategory Performance

| Subject | Score | Std Error |
|---|---:|---:|
| Econometrics | 72.81% | ±4.19% |
| High School Geography | 90.40% | ±2.10% |
| High School Government and Politics | 95.34% | ±1.52% |
| High School Macroeconomics | 85.64% | ±1.78% |
| High School Microeconomics | 91.60% | ±1.80% |
| High School Psychology | 93.94% | ±1.02% |
| Human Sexuality | 86.26% | ±3.02% |
| Professional Psychology | 81.70% | ±1.56% |
| Public Relations | 74.55% | ±4.17% |
| Security Studies | 75.51% | ±2.75% |
| Sociology | 85.57% | ±2.48% |
| US Foreign Policy | 91.00% | ±2.88% |

## STEM Subcategory Performance

| Subject | Score | Std Error |
|---|---:|---:|
| Abstract Algebra | 72.00% | ±4.51% |
| Anatomy | 73.33% | ±3.82% |
| Astronomy | 90.13% | ±2.43% |
| College Biology | 89.58% | ±2.55% |
| College Chemistry | 62.00% | ±4.88% |
| College Computer Science | 79.00% | ±4.09% |
| College Mathematics | 64.00% | ±4.82% |
| College Physics | 72.55% | ±4.44% |
| Computer Security | 86.00% | ±3.49% |
| Conceptual Physics | 90.21% | ±1.94% |
| Electrical Engineering | 80.69% | ±3.29% |
| Elementary Mathematics | 85.45% | ±1.82% |
| High School Biology | 92.90% | ±1.46% |
| High School Chemistry | 77.83% | ±2.92% |
| High School Computer Science | 86.00% | ±3.49% |
| High School Mathematics | 66.67% | ±2.87% |
| High School Physics | 75.50% | ±3.51% |
| High School Statistics | 78.70% | ±2.79% |
| Machine Learning | 75.89% | ±4.06% |
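
Several STEM subjects have wide standard errors (over 4 points for Abstract Algebra, College Chemistry, and College Mathematics, for example), so some visible gaps between them may not be meaningful. One quick check is a two-sample z-test that uses only the scores and standard errors reported above. It treats the two subjects as independent samples, a reasonable approximation here because the subjects use disjoint question sets. This is a sketch of a standard normal-approximation test, not a procedure taken from the evaluation itself.

```python
import math

# (score %, reported std error %) taken from the STEM table above.
STEM = {
    "Abstract Algebra": (72.00, 4.51),
    "College Mathematics": (64.00, 4.82),
    "Elementary Mathematics": (85.45, 1.82),
    "High School Mathematics": (66.67, 2.87),
}


def z_score(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Two-sample z statistic for the difference between two independent scores."""
    (score_a, se_a), (score_b, se_b) = a, b
    return (score_a - score_b) / math.sqrt(se_a**2 + se_b**2)


def two_sided_p(z: float) -> float:
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


pairs = [
    ("Abstract Algebra", "College Mathematics"),
    ("Elementary Mathematics", "High School Mathematics"),
]
for left, right in pairs:
    z = z_score(STEM[left], STEM[right])
    print(f"{left} vs {right}: z = {z:.2f}, p = {two_sided_p(z):.3g}")
```

Under this approximation the 8-point gap between Abstract Algebra and College Mathematics (z ≈ 1.2) is within noise. The gap between Elementary and High School Mathematics (z ≈ 5.5) is not.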

## Other Subcategory Performance

| Subject | Score | Std Error |
|---|---:|---:|
| Business Ethics | 80.00% | ±4.02% |
| Clinical Knowledge | 83.40% | ±2.29% |
| College Medicine | 78.03% | ±3.16% |
| Global Facts | 51.00% | ±5.02% |
| Human Aging | 79.37% | ±2.72% |
| Management | 85.44% | ±3.49% |
| Marketing | 89.74% | ±1.99% |
| Medical Genetics | 88.00% | ±3.27% |
| Miscellaneous | 88.25% | ±1.15% |
| Nutrition | 80.72% | ±2.26% |
| Professional Accounting | 64.89% | ±2.85% |
| Professional Medicine | 84.19% | ±2.22% |
| Virology | 50.60% | ±3.89% |