Update README.md
README.md CHANGED

````diff
@@ -182,4 +182,24 @@ The model was evaluated on the OpenLLM leaderboard task, using [lm-evaluation-ha
 
 ```
 
-</details>
+</details>
+
+
+## Accuracy
+
+| Category | Metric | Qwen3-Next-80B-A3B-Instruct | Qwen3-Next-80B-A3B-Instruct-FP8-dynamic | Recovery (%) |
+|----------|--------|-------------|-------------------|--------------|
+| OpenLLM V1 | ARC-Challenge (Acc-Norm, 25-shot) | 75.85 | 75.43 | 99.44 |
+| | GSM8K (Strict-Match, 5-shot) | 31.01 | 33.21 | 107.09 |
+| | HellaSwag (Acc-Norm, 10-shot) | 83.25 | 83.22 | 99.96 |
+| | MMLU (Acc, 5-shot) | 85.56 | 85.40 | 99.81 |
+| | TruthfulQA (MC2, 0-shot) | 60.70 | 61.01 | 100.51 |
+| | Winogrande (Acc, 5-shot) | 78.30 | 79.08 | 101.01 |
+| | **Average Score** | **69.11** | **69.56** | **100.65** |
+| OpenLLM V2 | IFEval (Inst Level Strict Acc, 0-shot) | 90.41 | 90.29 | 99.87 |
+| | BBH (Acc-Norm, 3-shot) | 67.78 | 68.04 | 100.38 |
+| | Math-Hard (Exact-Match, 4-shot) | 56.04 | 56.12 | 100.13 |
+| | GPQA (Acc-Norm, 0-shot) | 28.61 | 29.61 | 103.52 |
+| | MUSR (Acc-Norm, 0-shot) | 39.68 | 39.55 | 99.67 |
+| | MMLU-Pro (Acc, 5-shot) | 59.73 | 60.01 | 100.46 |
+| | **Average Score** | **57.04** | **57.27** | **100.40** |
````