nm-research committed
Commit 8203e2f · verified · 1 Parent(s): 96f22b7

Update README.md

Files changed (1)
  1. README.md +0 -12
README.md CHANGED
@@ -206,18 +206,6 @@ evalplus.evaluate \
  | **Average Score** | **57.65** | **57.22** |
  | **Recovery** | **100.00** | **99.26** |
 
- #### OpenLLM Leaderboard V2 evaluation scores
- | Metric | ibm-granite/granite-3.1-2b-base | neuralmagic-ent/granite-3.1-2b-base-quantized.w8a8 |
- |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
- | IFEval (Inst Level Strict Acc, 0-shot)| 41.01 | 41.37 |
- | BBH (Acc-Norm, 3-shot) | 40.19 | 39.87 |
- | Math-Hard (Exact-Match, 4-shot) | 4.86 | 3.82 |
- | GPQA (Acc-Norm, 0-shot) | 27.11 | 27.33 |
- | MUSR (Acc-Norm, 0-shot) | 34.85 | 33.67 |
- | MMLU-Pro (Acc, 5-shot) | 22.49 | 22.31 |
- | **Average Score** | **28.42** | **28.06** |
- | **Recovery** | **100.00** | **98.75** |
-
  #### HumanEval pass@1 scores
  | Metric | ibm-granite/granite-3.1-2b-base | neuralmagic-ent/granite-3.1-2b-base-quantized.w8a8 |
  |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
 