nm-research committed on
Commit b758978 · verified · 1 parent: 5ad1702

Update README.md

Files changed (1)
  1. README.md +13 -2
README.md CHANGED
@@ -150,7 +150,7 @@ evalplus.evaluate \

  #### OpenLLM Leaderboard V1 evaluation scores

- | Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-quantized.w4a16 |
+ | Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-FP8-dynamic |
  |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
  | ARC-Challenge (Acc-Norm, 25-shot) | 55.63 | 55.03 |
  | GSM8K (Strict-Match, 5-shot) | 60.96 | 61.49 |
@@ -161,8 +161,19 @@ evalplus.evaluate \
  | **Average Score** | **61.98** | **61.84** |
  | **Recovery** | **100.00** | **99.78** |

+ | Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-FP8-dynamic |
+ |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
+ | IFEval (Inst Level Strict Acc, 0-shot)| 67.99 | 66.79 |
+ | BBH (Acc-Norm, 3-shot) | 44.11 | 44.24 |
+ | Math-Hard (Exact-Match, 4-shot) | 8.66 | 7.89 |
+ | GPQA (Acc-Norm, 0-shot) | 28.30 | 26.90 |
+ | MUSR (Acc-Norm, 0-shot) | 35.12 | 35.12 |
+ | MMLU-Pro (Acc, 5-shot) | 26.87 | 28.33 |
+ | **Average Score** | **35.17** | **34.88** |
+ | **Recovery** | **100.00** | **** |
+
  #### HumanEval pass@1 scores
- | Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-quantized.w4a16 |
+ | Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-FP8-dynamic |
  |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
  | HumanEval Pass@1 | 53.40 | 54.90 |
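The diff does not define the **Average Score** or **Recovery** rows. A common reading, assumed here, is that the average is the unweighted mean of the per-task scores and that recovery is the quantized average as a percentage of the baseline average. The sketch below recomputes the averages for the added OpenLLM V2 table under that assumption; the names `V2_SCORES`, `average`, and `recovery` are hypothetical helpers, not part of the README or any evaluation tool.

```python
# Hypothetical sanity check of the "Average Score" and "Recovery" rows.
# Assumption: Average = unweighted mean of per-task scores,
#             Recovery = 100 * (quantized average / baseline average).

# Per-task scores copied from the added OpenLLM V2 table:
# task -> (ibm-granite baseline, FP8-dynamic)
V2_SCORES = {
    "IFEval": (67.99, 66.79),
    "BBH": (44.11, 44.24),
    "Math-Hard": (8.66, 7.89),
    "GPQA": (28.30, 26.90),
    "MUSR": (35.12, 35.12),
    "MMLU-Pro": (26.87, 28.33),
}


def average(scores):
    """Unweighted mean of a list of per-task scores."""
    return sum(scores) / len(scores)


def recovery(baseline_avg, quantized_avg):
    """Quantized average expressed as a percentage of the baseline average."""
    return 100.0 * quantized_avg / baseline_avg


baseline_avg = average([b for b, _ in V2_SCORES.values()])
quantized_avg = average([q for _, q in V2_SCORES.values()])

print(f"baseline average:  {baseline_avg:.3f}")   # table reports 35.17
print(f"quantized average: {quantized_avg:.3f}")  # table reports 34.88
print(f"recovery (ratio of unrounded means): {recovery(baseline_avg, quantized_avg):.2f}")

# Same assumed formula on the V1 table's rounded averages: 61.84 / 61.98
# gives about 99.77, while the table reports 99.78, so the published figure
# probably comes from unrounded scores or a different aggregation.
print(f"V1 recovery from rounded averages: {recovery(61.98, 61.84):.2f}")
```

The V2 **Recovery** cell for the FP8-dynamic model is empty (`****`) in this commit. Because the V1 row does not exactly match the ratio of its rounded averages, treat the sketch's output as a consistency check on the table, not as the missing value.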