Update README.md
README.md CHANGED
@@ -23,9 +23,13 @@ Model outputs were generated with the vLLM engine.
 
 For reasoning tasks we estimate pass@1 based on 10 runs with different seeds and `temperature=0.6`, `top_p=0.95` and `max_new_tokens=65536`.
 
-#### Reasoning tasks (AIME-24, GPQA-Diamond, MATH-500)
 
-
+| | Recovery (%) | deepseek/DeepSeek-R1-0528 | ISTA-DASLab/DeepSeek-R1-0528-GPTQ-4b-128g-experts<br>(this model) |
+| --------------------------- | :----------: | :------------------: | :--------------------------------------------------: |
+| AIME 2024<br>pass@1 | 98.50 | 88.66 | 87.33 |
+| MATH-500<br>pass@1 | 99.88 | 97.52 | 97.40 |
+| GPQA Diamond<br>pass@1 | 101.21 | 79.65 | 80.61 |
+| **Reasoning<br>Average Score** | **99.82** | **88.61** | **88.45** |
 
 ## Contributors
 Denis Kuznedelev (Yandex), Eldar Kurtić (Red Hat AI & ISTA), and Dan Alistarh (Red Hat AI & ISTA).
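
The README does not spell out how pass@1 or the Recovery (%) column is computed, so the following is only a sketch under stated assumptions. It assumes that pass@1 is the per-run accuracy averaged over the seeded runs, and that recovery is the quantized score divided by the baseline score, times 100. It also assumes that the recovery in the average row is the ratio of the two column averages, not the mean of the per-task recoveries. The function names (`pass_at_1`, `recovery`, `summarize`) are hypothetical and do not come from the model card or from any evaluation harness. Under these assumptions the arithmetic reproduces the values in the added table.

```python
from statistics import mean


def pass_at_1(correct_by_run):
    """Estimate pass@1 (in %) from repeated single-sample runs.

    correct_by_run[r][p] is True when run r (one seed) solved problem p.
    Assumption: pass@1 is the per-run accuracy averaged over all runs.
    """
    per_run_accuracy = [
        mean(1.0 if ok else 0.0 for ok in run) for run in correct_by_run
    ]
    return 100.0 * mean(per_run_accuracy)


def recovery(quantized_score, baseline_score):
    """Assumed definition: quantized score as a percentage of the baseline."""
    return 100.0 * quantized_score / baseline_score


# Scores copied from the table above, as (baseline, quantized) pairs.
SCORES = {
    "AIME 2024": (88.66, 87.33),
    "MATH-500": (97.52, 97.40),
    "GPQA Diamond": (79.65, 80.61),
}


def summarize(scores):
    """Return per-task recovery plus the averaged row, rounded to 2 decimals."""
    rows = {task: round(recovery(q, b), 2) for task, (b, q) in scores.items()}
    baseline_avg = mean(b for b, _ in scores.values())
    quantized_avg = mean(q for _, q in scores.values())
    # Assumption: the average-row recovery is a ratio of the two averages.
    rows["Reasoning Average Score"] = round(
        recovery(quantized_avg, baseline_avg), 2
    )
    return rows, round(baseline_avg, 2), round(quantized_avg, 2)
```

Calling `summarize(SCORES)` gives a quick consistency check of the table. A recovery above 100, as on GPQA Diamond, simply means the quantized model scored slightly higher than the baseline on that task.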