nm-research committed on
Commit 429104a · verified · 1 Parent(s): 700d80c

Update README.md

Files changed (1)
  1. README.md +13 -2
README.md CHANGED
@@ -150,7 +150,7 @@ evalplus.evaluate \
 
 #### OpenLLM Leaderboard V1 evaluation scores
 
-| Metric | ibm-granite/granite-3.1-8b-instruct | neuralmagic-ent/granite-3.1-8b-instruct-quantized.w4a16 |
+| Metric | ibm-granite/granite-3.1-8b-instruct | neuralmagic-ent/granite-3.1-8b-instruct-FP8-dynamic |
 |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
 | ARC-Challenge (Acc-Norm, 25-shot) | 66.81 | 66.81 |
 | GSM8K (Strict-Match, 5-shot) | 64.52 | 66.64 |
@@ -161,8 +161,19 @@ evalplus.evaluate \
 | **Average Score** | **70.30** | **70.57** |
 | **Recovery** | **100.00** | **100.39** |
 
+| Metric | ibm-granite/granite-3.1-8b-instruct | neuralmagic-ent/granite-3.1-8b-instruct-FP8-dynamic |
+|-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
+| IFEval (Inst Level Strict Acc, 0-shot)| 74.10 | 73.62 |
+| BBH (Acc-Norm, 3-shot) | 53.19 | 53.26 |
+| Math-Hard (Exact-Match, 4-shot) | 14.77 | 16.79 |
+| GPQA (Acc-Norm, 0-shot) | 31.76 | 32.58 |
+| MUSR (Acc-Norm, 0-shot) | 46.01 | 47.34 |
+| MMLU-Pro (Acc, 5-shot) | 35.81 | 35.72 |
+| **Average Score** | **42.61** | **43.22** |
+| **Recovery** | **100.00** | **101.43** |
+
 #### HumanEval pass@1 scores
-| Metric | ibm-granite/granite-3.1-8b-instruct | neuralmagic-ent/granite-3.1-8b-instruct-quantized.w4a16 |
+| Metric | ibm-granite/granite-3.1-8b-instruct | neuralmagic-ent/granite-3.1-8b-instruct-FP8-dynamic |
 |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
 | HumanEval Pass@1 | 71.00 | 69.90 |
 
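The **Average Score** and **Recovery** rows in the added table can be sanity-checked with a short script. This is only a sketch: the README does not define Recovery, so the code assumes it is the quantized model's average expressed as a percentage of the baseline average, with both averages rounded to two decimals as printed. The name `average_and_recovery` is hypothetical and not part of any evaluation tool.

```python
from statistics import mean

# Per-task scores copied from the added table: (baseline, FP8-dynamic).
scores = {
    "IFEval": (74.10, 73.62),
    "BBH": (53.19, 53.26),
    "Math-Hard": (14.77, 16.79),
    "GPQA": (31.76, 32.58),
    "MUSR": (46.01, 47.34),
    "MMLU-Pro": (35.81, 35.72),
}


def average_and_recovery(table):
    """Return (baseline average, quantized average, recovery in percent).

    Assumption: recovery = 100 * quantized average / baseline average,
    computed from the averages after rounding them to two decimals.
    """
    base_avg = round(mean(base for base, _ in table.values()), 2)
    quant_avg = round(mean(quant for _, quant in table.values()), 2)
    recovery = round(100.0 * quant_avg / base_avg, 2)
    return base_avg, quant_avg, recovery


base_avg, quant_avg, recovery = average_and_recovery(scores)
print(f"baseline={base_avg:.2f} quantized={quant_avg:.2f} recovery={recovery:.2f}")
```

Under these assumptions the script reproduces the three figures in the added table. The published numbers may have been derived from unrounded per-task scores, so a last-digit difference on other tables would not necessarily indicate an error.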