nm-research committed on
Commit 0d328c8 · verified · 1 Parent(s): ba744f2

Update README.md

Files changed (1)
  1. README.md +14 -2
README.md CHANGED
@@ -195,7 +195,7 @@ evalplus.evaluate \
195
 
196
  #### OpenLLM Leaderboard V1 evaluation scores
197
 
198
- | Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-quantized.w4a16 |
199
  |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
200
  | ARC-Challenge (Acc-Norm, 25-shot) | 55.63 | 55.12 |
201
  | GSM8K (Strict-Match, 5-shot) | 60.96 | 60.58 |
@@ -206,8 +206,20 @@ evalplus.evaluate \
206
  | **Average Score** | **61.98** | **61.68** |
207
  | **Recovery** | **100.00** | **99.51** |
208
 
209
  #### HumanEval pass@1 scores
210
- | Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-quantized.w4a16 |
211
  |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
212
  | HumanEval Pass@1 | 53.40 | 0.549 |
213
 
 
195
 
196
  #### OpenLLM Leaderboard V1 evaluation scores
197
 
198
+ | Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-quantized.w8a8 |
199
  |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
200
  | ARC-Challenge (Acc-Norm, 25-shot) | 55.63 | 55.12 |
201
  | GSM8K (Strict-Match, 5-shot) | 60.96 | 60.58 |
 
206
  | **Average Score** | **61.98** | **61.68** |
207
  | **Recovery** | **100.00** | **99.51** |
208
 
209
+ #### OpenLLM Leaderboard V2 evaluation scores
210
+ | Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-quantized.w8a8 |
211
+ |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
212
+ | IFEval (Inst Level Strict Acc, 0-shot)| 67.99 | 67.03 |
213
+ | BBH (Acc-Norm, 3-shot) | 44.11 | 43.53 |
214
+ | Math-Hard (Exact-Match, 4-shot) | 8.66 | 8.04 |
215
+ | GPQA (Acc-Norm, 0-shot) | 28.30 | 27.60 |
216
+ | MUSR (Acc-Norm, 0-shot) | 35.12 | 34.58 |
217
+ | MMLU-Pro (Acc, 5-shot) | 26.87 | |
218
+ | **Average Score** | **35.17** | **** |
219
+ | **Recovery** | **100.00** | **** |
220
+
221
  #### HumanEval pass@1 scores
222
+ | Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-quantized.w8a8 |
223
  |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
224
  | HumanEval Pass@1 | 53.40 | 0.549 |
225