nm-research committed on
Commit 14375d3 (verified) · 1 parent: 105bf8d

add v1 evals

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -141,14 +141,14 @@ lm_eval \
 
  | Metric | deepseek-ai/DeepSeek-R1-Distill-Llama-8B | neuralmagic-ent/DeepSeek-R1-Distill-Llama-8B-FP8-Dynamic |
  |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
- | ARC-Challenge (Acc-Norm, 25-shot) | | |
- | GSM8K (Strict-Match, 5-shot) | 83.62 | |
- | HellaSwag (Acc-Norm, 10-shot) | 77.09 | 77.13 |
- | MMLU (Acc, 5-shot) | | |
- | TruthfulQA (MC2, 0-shot) | 50.84 | 50.61 |
- | Winogrande (Acc, 5-shot) | 68.03 | 66.93 |
- | **Average Score** | **** | **** |
- | **Recovery (%)** | **100.00** | **** |
+ | ARC-Challenge (Acc-Norm, 25-shot) | 45.05 | 44.88 |
+ | GSM8K (Strict-Match, 5-shot) | 62.77 | 61.49 |
+ | HellaSwag (Acc-Norm, 10-shot) | 76.78 | 76.68 |
+ | MMLU (Acc, 5-shot) | 55.65 | 55.82 |
+ | TruthfulQA (MC2, 0-shot) | 50.55 | 49.92 |
+ | Winogrande (Acc, 5-shot) | 68.51 | 67.72 |
+ | **Average Score** | **59.88** | **59.42** |
+ | **Recovery (%)** | **100.00** | **99.22** |
 
  #### OpenLLM Leaderboard V2 evaluation scores
 
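As a quick consistency check, the Average Score and Recovery rows of the updated table can be reproduced from the six per-task scores. This is a minimal sketch, assuming Average Score is the unweighted mean of the six tasks and Recovery is the FP8 average expressed as a percentage of the baseline average; the commit itself does not state these definitions, they are inferred from the numbers. The helper names `average` and `recovery` are hypothetical.

```python
from statistics import mean

# Per-task scores copied from the updated README table (commit 14375d3).
BASELINE = {  # deepseek-ai/DeepSeek-R1-Distill-Llama-8B
    "ARC-Challenge": 45.05,
    "GSM8K": 62.77,
    "HellaSwag": 76.78,
    "MMLU": 55.65,
    "TruthfulQA": 50.55,
    "Winogrande": 68.51,
}
FP8 = {  # neuralmagic-ent/DeepSeek-R1-Distill-Llama-8B-FP8-Dynamic
    "ARC-Challenge": 44.88,
    "GSM8K": 61.49,
    "HellaSwag": 76.68,
    "MMLU": 55.82,
    "TruthfulQA": 49.92,
    "Winogrande": 67.72,
}


def average(scores: dict[str, float]) -> float:
    """Hypothetical helper: unweighted mean over all tasks (assumed Average Score)."""
    return mean(scores.values())


def recovery(quantized: dict[str, float], baseline: dict[str, float]) -> float:
    """Hypothetical helper: quantized average as a percentage of the baseline average."""
    if quantized.keys() != baseline.keys():
        raise ValueError("both models must be scored on the same tasks")
    return 100.0 * average(quantized) / average(baseline)


base_avg = average(BASELINE)
fp8_avg = average(FP8)
rec = recovery(FP8, BASELINE)
print(f"baseline avg={base_avg:.3f}  fp8 avg={fp8_avg:.3f}  recovery={rec:.2f}%")
```

Dividing the unrounded averages gives about 99.22, which matches the table; averaging the six per-task ratios instead would give roughly 99.23, so the two definitions are not interchangeable at this precision.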