Update README.md
README.md
CHANGED
@@ -56,20 +56,17 @@ print(generated_text)
 vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
 
 
-## Evaluation
-
-
-
-
-
-
-| | |strict-match | 5|exact_match|↑ |0.9568|± |0.0056|
-```
+## Evaluation
+
+The model was evaluated on popular reasoning tasks (AIME 2024, MATH-500, GPQA-Diamond) via [LightEval](https://github.com/huggingface/open-r1).
+For reasoning evaluations, we estimate pass@1 based on 10 runs with different seeds.
+
+
+### Accuracy
 
-
-
-
-
-
-
-```
+| | Recovery (%) | deepseek/DeepSeek-R1-0528 | RedHatAI/DeepSeek-R1-0528-quantized.w4a16<br>(this model) |
+| --------------------------- | :----------: | :------------------: | :--------------------------------------------------: |
+| AIME 2024<br>pass@1 | 98.50 | 88.66 | 87.33 |
+| MATH-500<br>pass@1 | 99.88 | 97.52 | 97.40 |
+| GPQA Diamond<br>pass@1 | 101.21 | 79.65 | 80.61 |
+| **Reasoning<br>Average Score** | **99.82** | **88.61** | **88.45** |
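The "Recovery (%)" column in the added table is not defined in the README text. A minimal sketch of how it can be reproduced, assuming recovery means the quantized score divided by the baseline score, times 100 (an assumption that matches the reported figures), and assuming the average-row recovery is the ratio of the two average scores rather than the mean of the per-task recoveries. The names `scores` and `recovery` are hypothetical helpers introduced only for illustration:

```python
# Scores copied from the Accuracy table: (baseline, quantized) pass@1 per task.
scores = {
    "AIME 2024": (88.66, 87.33),
    "MATH-500": (97.52, 97.40),
    "GPQA Diamond": (79.65, 80.61),
}


def recovery(baseline: float, quantized: float) -> float:
    """Assumed definition: quantized score as a percentage of the baseline."""
    return round(quantized / baseline * 100, 2)


# Per-task recovery, computed directly from the two score columns.
per_task = {task: recovery(b, q) for task, (b, q) in scores.items()}

# Average scores for each model across the three tasks.
avg_baseline = round(sum(b for b, _ in scores.values()) / len(scores), 2)
avg_quantized = round(sum(q for _, q in scores.values()) / len(scores), 2)

# Assumed: the average-row recovery is the ratio of the two averages.
avg_recovery = recovery(avg_baseline, avg_quantized)

for task, value in per_task.items():
    print(f"{task}: {value:.2f}%")
print(f"Average: {avg_baseline:.2f} -> {avg_quantized:.2f} ({avg_recovery:.2f}%)")
```

A recovery above 100, as in the GPQA Diamond row, simply means the quantized model scored slightly higher than the baseline on that task across the sampled runs.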