nm-research committed on
Commit c6145eb · verified · 1 Parent(s): 14375d3

Add Leaderboard-v2 evals

Files changed (1)
  1. README.md +8 -7
README.md CHANGED
@@ -152,15 +152,16 @@ lm_eval \
 
 #### OpenLLM Leaderboard V2 evaluation scores
 
+
 | Metric | deepseek-ai/DeepSeek-R1-Distill-Llama-8B | neuralmagic-ent/DeepSeek-R1-Distill-Llama-8B-FP8-Dynamic |
 |---------------------------------------------------------|:---------------------------------:|:-------------------------------------------:|
-| IFEval (Inst-and-Prompt Level Strict Acc, 0-shot) | | |
-| BBH (Acc-Norm, 3-shot) | | |
-| GPQA (Acc-Norm, 0-shot) | | |
-| MUSR (Acc-Norm, 0-shot) | | |
-| MMLU-Pro (Acc, 5-shot) | | |
-| **Average Score** | **** | **** |
-| **Recovery (%)** | **100.00** | **** |
+| IFEval (Inst-and-Prompt Level Strict Acc, 0-shot) | 38.34 | 38.22 |
+| BBH (Acc-Norm, 3-shot) | 38.19 | 38.32 |
+| GPQA (Acc-Norm, 0-shot) | 28.87 | 27.56 |
+| MUSR (Acc-Norm, 0-shot) | 33.31 | 33.71 |
+| MMLU-Pro (Acc, 5-shot) | 20.10 | 21.39 |
+| **Average Score** | **26.47** | **26.53** |
+| **Recovery (%)** | **100.00** | **100.24** |
 
 #### Coding evaluation scores
 
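The diff does not define the Recovery (%) row. One plausible reading is the quantized model's score expressed as a percentage of the baseline's. The Python sketch below assumes that definition and uses the Average Score values from the table; the function name `recovery_pct` is hypothetical and not part of the README. The averages are taken as reported, since they are not the plain mean of the five listed rows.

```python
def recovery_pct(baseline: float, quantized: float) -> float:
    """Quantized score as a percentage of the baseline score.

    Assumed definition; the README diff does not state the formula.
    """
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return 100.0 * quantized / baseline


# Average scores as reported in the table (already rounded to two decimals).
baseline_avg = 26.47   # deepseek-ai/DeepSeek-R1-Distill-Llama-8B
quantized_avg = 26.53  # FP8-Dynamic variant

print(f"{recovery_pct(baseline_avg, quantized_avg):.2f}")
```

From the rounded averages this gives roughly 100.23, slightly below the 100.24 in the table. The gap would be consistent with the table being computed from unrounded scores, though the diff does not confirm that.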