krishnateja95 committed ef90010 (verified) · Parent(s): 3022aba

Update README.md

Files changed (1): README.md (+151 −0)
The model was evaluated on RULER and the LongBench long-context benchmark using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
[vLLM](https://docs.vllm.ai/en/stable/) was used as the inference backend for all evaluations.
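As a rough illustration of this setup, an evaluation like the ones above can be launched through the lm-evaluation-harness CLI with its vLLM backend. The task choice, parallelism, and memory settings below are assumptions for the sketch, not the exact configuration the authors used.

```shell
# Hypothetical invocation sketch: run a 5-shot GSM8K evaluation of the
# quantized checkpoint with lm-evaluation-harness on top of vLLM.
# tensor_parallel_size and gpu_memory_utilization are illustrative values.
lm_eval --model vllm \
  --model_args pretrained=nm-testing/Qwen3-8B-FP8-block,tensor_parallel_size=1,gpu_memory_utilization=0.9 \
  --tasks gsm8k \
  --num_fewshot 5 \
  --batch_size auto
```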
### Accuracy

<table>
<thead>
<tr>
<th>Category</th>
<th>Metric</th>
<th>Qwen/Qwen3-8B</th>
<th>nm-testing/Qwen3-8B-FP8-block</th>
<th>Recovery (%)</th>
</tr>
</thead>
<tbody>
<!-- OpenLLM Leaderboard V1 -->
<tr>
<td rowspan="7"><b>OpenLLM V1</b></td>
<td>ARC-Challenge (Acc-Norm, 25-shot)</td>
<td>67.66</td>
<td>67.92</td>
<td>100.38</td>
</tr>
<tr>
<td>GSM8K (Strict-Match, 5-shot)</td>
<td>87.95</td>
<td>87.79</td>
<td>99.83</td>
</tr>
<tr>
<td>HellaSwag (Acc-Norm, 10-shot)</td>
<td>76.78</td>
<td>76.60</td>
<td>99.77</td>
</tr>
<tr>
<td>MMLU (Acc, 5-shot)</td>
<td>74.88</td>
<td>74.70</td>
<td>99.75</td>
</tr>
<tr>
<td>TruthfulQA (MC2, 0-shot)</td>
<td>54.36</td>
<td>54.27</td>
<td>99.85</td>
</tr>
<tr>
<td>Winogrande (Acc, 5-shot)</td>
<td>71.11</td>
<td>71.43</td>
<td>100.44</td>
</tr>
<tr>
<td><b>Average Score</b></td>
<td><b>72.12</b></td>
<td><b>72.12</b></td>
<td><b>100.00</b></td>
</tr>
<!-- OpenLLM Leaderboard V2 -->
<tr>
<td rowspan="7"><b>OpenLLM V2</b></td>
<td>IFEval (Inst Level Strict Acc, 0-shot)</td>
<td>48.56</td>
<td>48.80</td>
<td>100.49</td>
</tr>
<tr>
<td>BBH (Acc-Norm, 3-shot)</td>
<td>29.23</td>
<td>29.32</td>
<td>100.30</td>
</tr>
<tr>
<td>Math-Hard (Exact-Match, 4-shot)</td>
<td>17.82</td>
<td>18.05</td>
<td>101.27</td>
</tr>
<tr>
<td>GPQA (Acc-Norm, 0-shot)</td>
<td>25.76</td>
<td>26.09</td>
<td>101.30</td>
</tr>
<tr>
<td>MUSR (Acc-Norm, 0-shot)</td>
<td>41.01</td>
<td>41.14</td>
<td>100.32</td>
</tr>
<tr>
<td>MMLU-Pro (Acc, 5-shot)</td>
<td>11.32</td>
<td>11.33</td>
<td>100.07</td>
</tr>
<tr>
<td><b>Average Score</b></td>
<td><b>28.95</b></td>
<td><b>29.12</b></td>
<td><b>100.59</b></td>
</tr>
<!-- Coding -->
<tr>
<td rowspan="4"><b>Coding</b></td>
<td>HumanEval pass@1</td>
<td>84.80</td>
<td>85.40</td>
<td>100.71</td>
</tr>
<tr>
<td>HumanEval+ pass@1</td>
<td>78.70</td>
<td>79.90</td>
<td>101.52</td>
</tr>
<tr>
<td>MBPP pass@1</td>
<td>72.80</td>
<td>73.50</td>
<td>100.96</td>
</tr>
<tr>
<td>MBPP+ pass@1</td>
<td>62.70</td>
<td>64.80</td>
<td>103.35</td>
</tr>
</tbody>
</table>
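The Recovery column above is the quantized model's score expressed as a percentage of the baseline score. As a sanity check, the short sketch below recomputes it for the OpenLLM V1 rows. Note it starts from the rounded scores printed in the table, so individual recomputed values can drift from the listed ones by a hundredth of a point (the published column was presumably derived from unrounded scores).

```python
# Recompute Recovery (%) and category averages from the rounded
# OpenLLM V1 scores in the table above.

openllm_v1 = {
    # metric: (Qwen/Qwen3-8B baseline, nm-testing/Qwen3-8B-FP8-block)
    "ARC-Challenge": (67.66, 67.92),
    "GSM8K": (87.95, 87.79),
    "HellaSwag": (76.78, 76.60),
    "MMLU": (74.88, 74.70),
    "TruthfulQA": (54.36, 54.27),
    "Winogrande": (71.11, 71.43),
}

def recovery(base: float, quant: float) -> float:
    """Quantized score as a percentage of the baseline score."""
    return round(quant / base * 100, 2)

for name, (base, quant) in openllm_v1.items():
    print(f"{name}: {recovery(base, quant):.2f}%")

# Category averages, matching the table's bold "Average Score" row.
base_avg = round(sum(b for b, _ in openllm_v1.values()) / len(openllm_v1), 2)
quant_avg = round(sum(q for _, q in openllm_v1.values()) / len(openllm_v1), 2)
print(f"Average: {base_avg} (baseline) vs {quant_avg} (FP8)")
```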