nm-research committed on
Commit 700d80c · verified · 1 Parent(s): b9ebd8e

Update README.md

Files changed (1):
  1. README.md +12 -9
README.md CHANGED
@@ -150,18 +150,21 @@ evalplus.evaluate \
 
 #### OpenLLM Leaderboard V1 evaluation scores
 
-| Metric | ibm-granite/granite-3.1-8b-instruct | neuralmagic-ent/granite-3.1-8b-instruct-FP8-dynamic |
+| Metric | ibm-granite/granite-3.1-8b-instruct | neuralmagic-ent/granite-3.1-8b-instruct-quantized.w4a16 |
 |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
-| ARC-Challenge (Acc-Norm, 25-shot) | | |
-| GSM8K (Strict-Match, 5-shot) | | |
-| HellaSwag (Acc-Norm, 10-shot) | | |
-| MMLU (Acc, 5-shot) | | |
-| TruthfulQA (MC2, 0-shot) | | |
-| Winogrande (Acc, 5-shot) | | |
-| **Average Score** | **** | **** |
-| **Recovery** | **100.00** | **** |
+| ARC-Challenge (Acc-Norm, 25-shot) | 66.81 | 66.81 |
+| GSM8K (Strict-Match, 5-shot) | 64.52 | 66.64 |
+| HellaSwag (Acc-Norm, 10-shot) | 84.18 | 84.16 |
+| MMLU (Acc, 5-shot) | 65.52 | 65.36 |
+| TruthfulQA (MC2, 0-shot) | 60.57 | 60.52 |
+| Winogrande (Acc, 5-shot) | 80.19 | 79.95 |
+| **Average Score** | **70.30** | **70.57** |
+| **Recovery** | **100.00** | **100.39** |
 
 #### HumanEval pass@1 scores
+| Metric | ibm-granite/granite-3.1-8b-instruct | neuralmagic-ent/granite-3.1-8b-instruct-quantized.w4a16 |
+|-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
+| HumanEval Pass@1 | 71.00 | 69.90 |
 
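The **Average Score** and **Recovery** rows added in this diff appear to be derived from the six benchmark scores above them: Recovery as the quantized model's average expressed as a percentage of the baseline's average. A minimal sketch of that arithmetic, under that assumption (the score lists are copied from the new table; variable names are illustrative, not from the README):

```python
# Benchmark scores from the updated table (OpenLLM Leaderboard V1 rows).
baseline = [66.81, 64.52, 84.18, 65.52, 60.57, 80.19]   # ibm-granite/granite-3.1-8b-instruct
quantized = [66.81, 66.64, 84.16, 65.36, 60.52, 79.95]  # granite-3.1-8b-instruct-quantized.w4a16

# Average score per model, then recovery as quantized/baseline in percent.
avg_baseline = sum(baseline) / len(baseline)
avg_quantized = sum(quantized) / len(quantized)
recovery = 100 * avg_quantized / avg_baseline

print(f"{avg_baseline:.2f} {avg_quantized:.2f} {recovery:.2f}")
# → 70.30 70.57 100.39
```

Note the rounding order matters slightly: 100 × 70.57 / 70.30 on the already-rounded averages gives 100.38, so the table's 100.39 is consistent with computing recovery from the unrounded averages.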