New Benchmark table!

#1
Files changed (1)
  1. README.md +8 -42
README.md CHANGED
@@ -103,48 +103,14 @@ Scores sourced from official technical reports (Qwen3 Technical Report, May 2025
 
  > **Note:** *Benchmarks are Underway for GRaPE 2.1 Flash, they will be empty and set as "TBD" for the time being*
 
- ### General Knowledge — MMLU (5-shot)
-
- | Model | Params | MMLU |
- | :--- | :--- | :--- |
- | **GRaPE 2.1 Flash** | **9B** | **TBD** |
- | Qwen3-4B-Instruct | 4B | 83.7\* |
- | Qwen3-8B-Instruct | 8B | ~85.0 |
- | Qwen2.5-7B-Instruct | 7B | 74.2 |
- | Gemma-3-12B | 12B | 73.9 |
- | Qwen2.5-14B | 14B | 79.7 |
-
- ### Mathematics — MATH (4-shot)
-
- | Model | Params | MATH |
- | :--- | :--- | :--- |
- | **GRaPE 2.1 Flash** | **9B** | **TBD** |
- | Qwen3-4B (Thinking) | 4B | 54.1 |
- | Qwen3-8B (Thinking) | 8B | ~65.0 |
- | Qwen2.5-7B-Instruct | 7B | 75.5 |
- | Qwen2.5-14B | 14B | 55.6 |
- | Gemma-3-12B | 12B | 44.4 |
-
- ### Coding — EvalPlus (avg. HumanEval + MBPP)
-
- | Model | Params | EvalPlus |
- | :--- | :--- | :--- |
- | **GRaPE 2.1 Flash** | **9B** | **TBD** |
- | Qwen3-4B-Instruct | 4B | 72.1 |
- | Qwen3-8B-Instruct | 8B | ~76.0 |
- | Qwen2.5-7B-Instruct | 7B | ~65.0 |
- | Gemma-3-12B | 12B | 52.7 |
- | Qwen2.5-14B | 14B | 60.7 |
-
- ### Math Word Problems — GSM8K (4-shot)
-
- | Model | Params | GSM8K |
- | :--- | :--- | :--- |
- | **GRaPE 2.1 Flash** | **9B** | **TBD** |
- | Qwen3-4B (Thinking) | 4B | 87.8 |
- | Qwen2.5-7B-Instruct | 7B | 91.1 |
- | Qwen2.5-14B | 14B | 90.2 |
- | Gemma-3-12B | 12B | 78.0 |
+ ### Benchmarks
+
+ | Models | Params | GPQA Diamond | MMLU-Pro | LiveCodeBench v6 | HMMT Nov 25 | TAU2-Bench | MultiChallenge |
+ |----------------------|-------------------|--------------|----------|------------------|-------------|------------|----------------|
+ | GRaPE 2.1 Flash | 9B | TBD | TBD | TBD | TBD | TBD | TBD |
+ | GRM-2.5-Plus | 9B | 82.7 | 84.2 | 67.2 | 83.2 | 80.5 | 56.5 |
+ | Qwen3.5-9B | 9B | 81.7 | 82.5 | 65.6 | 82.9 | 79.1 | 54.5 |
+ | google/gemma-4-E4B-it| E4B (4.5B eff.) | 58.6 | 69.4 | 52.0 | -- | 42.2 | -- |
 
 
  ***