add v1 evals

README.md CHANGED

```diff
@@ -141,14 +141,14 @@ lm_eval \
 
 | Metric | deepseek-ai/DeepSeek-R1-Distill-Llama-8B | neuralmagic-ent/DeepSeek-R1-Distill-Llama-8B-FP8-Dynamic |
 |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
-| ARC-Challenge (Acc-Norm, 25-shot) |
-| GSM8K (Strict-Match, 5-shot) |
-| HellaSwag (Acc-Norm, 10-shot) |
-| MMLU (Acc, 5-shot) |
-| TruthfulQA (MC2, 0-shot) | 50.
-| Winogrande (Acc, 5-shot) | 68.
-| **Average Score** |
-| **Recovery (%)** | **100.00** |
+| ARC-Challenge (Acc-Norm, 25-shot) | 45.05 | 44.88 |
+| GSM8K (Strict-Match, 5-shot) | 62.77 | 61.49 |
+| HellaSwag (Acc-Norm, 10-shot) | 76.78 | 76.68 |
+| MMLU (Acc, 5-shot) | 55.65 | 55.82 |
+| TruthfulQA (MC2, 0-shot) | 50.55 | 49.92 |
+| Winogrande (Acc, 5-shot) | 68.51 | 67.72 |
+| **Average Score** | **59.88** | **59.42** |
+| **Recovery (%)** | **100.00** | **99.22** |
```
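The **Average Score** and **Recovery (%)** rows follow from the six per-task scores above. A minimal sketch checking that arithmetic, assuming a plain mean over the six tasks and recovery defined as quantized average / baseline average × 100 (the task keys are illustrative labels, not lm_eval identifiers taken from this commit):

```python
# Per-task scores copied from the added table.
baseline = {  # deepseek-ai/DeepSeek-R1-Distill-Llama-8B
    "arc_challenge": 45.05, "gsm8k": 62.77, "hellaswag": 76.78,
    "mmlu": 55.65, "truthfulqa_mc2": 50.55, "winogrande": 68.51,
}
fp8 = {  # neuralmagic-ent/DeepSeek-R1-Distill-Llama-8B-FP8-Dynamic
    "arc_challenge": 44.88, "gsm8k": 61.49, "hellaswag": 76.68,
    "mmlu": 55.82, "truthfulqa_mc2": 49.92, "winogrande": 67.72,
}

# Unweighted mean over the six benchmarks (assumed definition of "Average Score").
avg_baseline = sum(baseline.values()) / len(baseline)
avg_fp8 = sum(fp8.values()) / len(fp8)

# Recovery: how much of the baseline's average the quantized model retains.
recovery = avg_fp8 / avg_baseline * 100

print(f"baseline avg: {avg_baseline:.2f}")
print(f"fp8 avg:      {avg_fp8:.2f}")
print(f"recovery:     {recovery:.2f}%")
```

These reproduce the summary rows to within rounding (≈59.88, 59.42, and 99.22%).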
#### OpenLLM Leaderboard V2 evaluation scores