Upload folder using huggingface_hub
README.md
CHANGED
@@ -54,15 +54,21 @@ vllm serve RedHatAI/Devstral-Small-2507-FP8-Dynamic --tensor-parallel-size 1 --t
 ## Evaluation

 The model was evaluated on popular coding tasks (HumanEval, HumanEval+, MBPP, MBPP+) via [EvalPlus](https://github.com/evalplus/evalplus) and vllm backend (v0.10.1.1).
-For evaluations, we run greedy sampling and report pass@1
+For evaluations, we run greedy sampling and report pass@1. The command to reproduce evals:
+```bash
+evalplus.evaluate --model "RedHatAI/Devstral-Small-2507-FP8-Dynamic" \
+--dataset [humaneval|mbpp] \
+--base-url http://localhost:8000/v1 \
+--backend openai --greedy
+```


 ### Accuracy

 | | Recovery (%) | mistralai/Devstral-Small-2507 | RedHatAI/Devstral-Small-2507-FP8-Dynamic<br>(this model) |
 | --------------------------- | :----------: | :------------------: | :--------------------------------------------------: |
-| HumanEval |
-| HumanEval+ |
-| MBPP |
-| MBPP+ |
+| HumanEval | 100.67 | 89.0 | 89.6 |
+| HumanEval+ | 102.22 | 81.1 | 82.9 |
+| MBPP | 97.29 | 77.5 | 75.4 |
+| MBPP+ | 98.03 | 66.1 | 64.8 |
 | **Average Score** | **99.68** | **78.43** | **78.18** |
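
The `--dataset [humaneval|mbpp]` notation in the reproduction command reads as "choose one of the two", not as a literal argument: pasted unchanged into a shell, the unquoted `|` would be parsed as a pipe. A minimal sketch of expanding it into one invocation per dataset is shown below. It uses only the flags that appear in the command above; the helper name `build_command` is hypothetical, and the script prints the commands rather than running them, on the assumption that a vLLM server is already listening at the base URL.

```python
import shlex
from typing import List

# Values copied from the reproduction command in the README diff.
MODEL = "RedHatAI/Devstral-Small-2507-FP8-Dynamic"
BASE_URL = "http://localhost:8000/v1"
DATASETS = ("humaneval", "mbpp")


def build_command(dataset: str) -> List[str]:
    """Return the evalplus.evaluate argument list for one dataset.

    Hypothetical helper: it only substitutes the dataset name into the
    command shown in the README and adds no flags of its own.
    """
    if dataset not in DATASETS:
        raise ValueError(f"expected one of {DATASETS}, got {dataset!r}")
    return [
        "evalplus.evaluate",
        "--model", MODEL,
        "--dataset", dataset,
        "--base-url", BASE_URL,
        "--backend", "openai",
        "--greedy",
    ]


if __name__ == "__main__":
    # Print one shell-safe command line per dataset.
    for name in DATASETS:
        print(shlex.join(build_command(name)))
```

Each printed line can be pasted into a shell or passed to `subprocess.run` as a list. Two runs are listed for four table rows on the assumption that EvalPlus reports the base and "+" scores of a dataset from the same run.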
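
The README does not define the Recovery (%) column, but the values in the Accuracy table are consistent with the quantized model's pass@1 divided by the baseline's, times 100. The sketch below recomputes the column under that assumption. The function names are hypothetical, and the average recovery is taken as the ratio of the summed scores (equivalent to the ratio of the two column averages), which is the reading that matches the reported 99.68.

```python
# pass@1 scores copied from the Accuracy table: (baseline, FP8-Dynamic).
SCORES = {
    "HumanEval": (89.0, 89.6),
    "HumanEval+": (81.1, 82.9),
    "MBPP": (77.5, 75.4),
    "MBPP+": (66.1, 64.8),
}


def recovery(baseline: float, quantized: float) -> float:
    """Quantized score as a percentage of the baseline score (assumed definition)."""
    return 100.0 * quantized / baseline


def average_recovery(scores: dict) -> float:
    """Recovery of the summed scores, i.e. the ratio of the column averages."""
    baseline_total = sum(b for b, _ in scores.values())
    quantized_total = sum(q for _, q in scores.values())
    return recovery(baseline_total, quantized_total)


if __name__ == "__main__":
    for task, (base, quant) in SCORES.items():
        print(f"{task}: {recovery(base, quant):.2f}")
    print(f"Average: {average_recovery(SCORES):.2f}")
```

A recovery above 100 for HumanEval and HumanEval+ means the quantized model scored slightly higher than the baseline on those tasks in this run.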