Update README.md
README.md
CHANGED
|
@@ -195,7 +195,7 @@ evalplus.evaluate \
|
|
| 195 |
|
| 196 |
#### OpenLLM Leaderboard V1 evaluation scores
|
| 197 |
|
| 198 |
-
| Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-quantized.
|
| 199 |
|-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
|
| 200 |
| ARC-Challenge (Acc-Norm, 25-shot) | 55.63 | 55.12 |
|
| 201 |
| GSM8K (Strict-Match, 5-shot) | 60.96 | 60.58 |
|
|
@@ -206,8 +206,20 @@ evalplus.evaluate \
|
|
| 206 |
| **Average Score** | **61.98** | **61.68** |
|
| 207 |
| **Recovery** | **100.00** | **99.51** |
|
| 208 |
|
| 209 |
#### HumanEval pass@1 scores
|
| 210 |
-
| Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-quantized.
|
| 211 |
|-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
|
| 212 |
| HumanEval Pass@1 | 53.40 | 0.549 |
|
| 213 |
|
|
|
|
| 195 |
|
| 196 |
#### OpenLLM Leaderboard V1 evaluation scores
|
| 197 |
|
| 198 |
+
| Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-quantized.w8a8 |
|
| 199 |
|-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
|
| 200 |
| ARC-Challenge (Acc-Norm, 25-shot) | 55.63 | 55.12 |
|
| 201 |
| GSM8K (Strict-Match, 5-shot) | 60.96 | 60.58 |
|
|
|
|
| 206 |
| **Average Score** | **61.98** | **61.68** |
|
| 207 |
| **Recovery** | **100.00** | **99.51** |
|
| 208 |
|
| 209 |
+
#### OpenLLM Leaderboard V2 evaluation scores
|
| 210 |
+
| Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-quantized.w8a8 |
|
| 211 |
+
|-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
|
| 212 |
+
| IFEval (Inst Level Strict Acc, 0-shot)| 67.99 | 67.03 |
|
| 213 |
+
| BBH (Acc-Norm, 3-shot) | 44.11 | 43.53 |
|
| 214 |
+
| Math-Hard (Exact-Match, 4-shot) | 8.66 | 8.04 |
|
| 215 |
+
| GPQA (Acc-Norm, 0-shot) | 28.30 | 27.60 |
|
| 216 |
+
| MUSR (Acc-Norm, 0-shot) | 35.12 | 34.58 |
|
| 217 |
+
| MMLU-Pro (Acc, 5-shot) | 26.87 | |
|
| 218 |
+
| **Average Score** | **35.17** | **** |
|
| 219 |
+
| **Recovery** | **100.00** | **** |
|
| 220 |
+
|
| 221 |
#### HumanEval pass@1 scores
|
| 222 |
+
| Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-quantized.w8a8 |
|
| 223 |
|-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
|
| 224 |
| HumanEval Pass@1 | 53.40 | 0.549 |
|
| 225 |
|
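
The **Average Score** and **Recovery** rows in the tables above are not defined in this diff. The following is a minimal sketch of how such rows could be derived, assuming that the average is the plain arithmetic mean of the per-benchmark scores and that recovery is the quantized model's average expressed as a percentage of the baseline's average. The function name `recovery` and the variable names are hypothetical and do not come from the README.

```python
from statistics import mean


def recovery(quantized_avg: float, baseline_avg: float) -> float:
    """Return the quantized average as a percentage of the baseline average.

    Assumption: recovery = 100 * quantized / baseline (not stated in the README).
    """
    if baseline_avg == 0:
        raise ValueError("baseline average must be non-zero")
    return 100.0 * quantized_avg / baseline_avg


# OpenLLM Leaderboard V1: average scores as printed in the table.
v1_recovery = recovery(quantized_avg=61.68, baseline_avg=61.98)

# OpenLLM Leaderboard V2: baseline per-benchmark scores as printed in the table
# (IFEval, BBH, Math-Hard, GPQA, MUSR, MMLU-Pro).
v2_baseline = [67.99, 44.11, 8.66, 28.30, 35.12, 26.87]
v2_baseline_avg = mean(v2_baseline)

print(f"V1 recovery: {v1_recovery:.2f}")
print(f"V2 baseline average: {v2_baseline_avg:.3f}")
```

Under these assumptions the V1 recovery comes out at roughly 99.5, close to the reported 99.51. A small difference in the last digit would be expected if the published figure was computed from unrounded averages. The V2 average and recovery for the quantized model cannot be computed this way, because its MMLU-Pro score is blank in the table.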