alexmarques committed on
Commit fd5b264 · verified
1 Parent(s): 13d0b1a

Update README.md

Files changed (1)
  1. README.md +21 -1
README.md CHANGED
@@ -182,4 +182,24 @@ The model was evaluated on the OpenLLM leaderboard task, using [lm-evaluation-ha
 
   ```
 
- </details>
+ </details>
+
+
+ ## Accuracy
+
+ | Category | Metric | Qwen3-Next-80B-A3B-Instruct | Qwen3-Next-80B-A3B-Instruct-FP8-dynamic | Recovery (%) |
+ |----------|--------|-------------|-------------------|--------------|
+ | OpenLLM V1 | ARC-Challenge (Acc-Norm, 25-shot) | 75.85 | 75.43 | 99.44 |
+ | | GSM8K (Strict-Match, 5-shot) | 31.01 | 33.21 | 107.09 |
+ | | HellaSwag (Acc-Norm, 10-shot) | 83.25 | 83.22 | 99.96 |
+ | | MMLU (Acc, 5-shot) | 85.56 | 85.40 | 99.81 |
+ | | TruthfulQA (MC2, 0-shot) | 60.70 | 61.01 | 100.51 |
+ | | Winogrande (Acc, 5-shot) | 78.30 | 79.08 | 101.01 |
+ | | **Average Score** | **69.11** | **69.56** | **100.65** |
+ | OpenLLM V2 | IFEval (Inst Level Strict Acc, 0-shot) | 90.41 | 90.29 | 99.87 |
+ | | BBH (Acc-Norm, 3-shot) | 67.78 | 68.04 | 100.38 |
+ | | Math-Hard (Exact-Match, 4-shot) | 56.04 | 56.12 | 100.13 |
+ | | GPQA (Acc-Norm, 0-shot) | 28.61 | 29.61 | 103.52 |
+ | | MUSR (Acc-Norm, 0-shot) | 39.68 | 39.55 | 99.67 |
+ | | MMLU-Pro (Acc, 5-shot) | 59.73 | 60.01 | 100.46 |
+ | | **Average Score** | **57.04** | **57.27** | **100.40** |
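
The README does not define the Recovery (%) column. The values are consistent with the quantized score divided by the baseline score, times 100. The sketch below works under that assumption; the function name `recovery` and the `v1_scores` dictionary are illustrative and not part of the repository. Some rows in the table differ from this calculation in the last digit, presumably because the published column was derived from unrounded scores.

```python
# Minimal sketch, assuming Recovery (%) = 100 * quantized / baseline.
# `recovery` and `v1_scores` are hypothetical names used for illustration only.

def recovery(baseline: float, quantized: float) -> float:
    """Return the quantized score as a percentage of the baseline score."""
    return round(100.0 * quantized / baseline, 2)

# (baseline, FP8-dynamic) pairs copied from the OpenLLM V1 rows of the table.
v1_scores = {
    "GSM8K": (31.01, 33.21),
    "HellaSwag": (83.25, 83.22),
    "Winogrande": (78.30, 79.08),
}

if __name__ == "__main__":
    for task, (base, quant) in v1_scores.items():
        print(f"{task}: {recovery(base, quant):.2f}")
```

A value above 100 means the quantized model scored higher than the baseline on that task, as with GSM8K.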