Update README.md
README.md
CHANGED
@@ -206,18 +206,6 @@ evalplus.evaluate \
 | **Average Score** | **57.65** | **57.22** |
 | **Recovery** | **100.00** | **99.26** |
 
-#### OpenLLM Leaderboard V2 evaluation scores
-| Metric | ibm-granite/granite-3.1-2b-base | neuralmagic-ent/granite-3.1-2b-base-quantized.w8a8 |
-|-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
-| IFEval (Inst Level Strict Acc, 0-shot)| 41.01 | 41.37 |
-| BBH (Acc-Norm, 3-shot) | 40.19 | 39.87 |
-| Math-Hard (Exact-Match, 4-shot) | 4.86 | 3.82 |
-| GPQA (Acc-Norm, 0-shot) | 27.11 | 27.33 |
-| MUSR (Acc-Norm, 0-shot) | 34.85 | 33.67 |
-| MMLU-Pro (Acc, 5-shot) | 22.49 | 22.31 |
-| **Average Score** | **28.42** | **28.06** |
-| **Recovery** | **100.00** | **98.75** |
-
 #### HumanEval pass@1 scores
 | Metric | ibm-granite/granite-3.1-2b-base | neuralmagic-ent/granite-3.1-2b-base-quantized.w8a8 |
 |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|