Update README.md
README.md
CHANGED
|
@@ -150,7 +150,7 @@ evalplus.evaluate \
|
|
| 150 |
|
| 151 |
#### OpenLLM Leaderboard V1 evaluation scores
|
| 152 |
|
| 153 |
-
| Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-
|
| 154 |
|-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
|
| 155 |
| ARC-Challenge (Acc-Norm, 25-shot) | 55.63 | 55.03 |
|
| 156 |
| GSM8K (Strict-Match, 5-shot) | 60.96 | 61.49 |
|
|
@@ -161,8 +161,19 @@ evalplus.evaluate \
|
|
| 161 |
| **Average Score** | **61.98** | **61.84** |
|
| 162 |
| **Recovery** | **100.00** | **99.78** |
|
| 163 |
|
| 164 |
#### HumanEval pass@1 scores
|
| 165 |
-
| Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-
|
| 166 |
|-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
|
| 167 |
| HumanEval Pass@1 | 53.40 | 54.90 |
|
| 168 |
|
|
|
|
| 150 |
|
| 151 |
#### OpenLLM Leaderboard V1 evaluation scores
|
| 152 |
|
| 153 |
+
| Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-FP8-dynamic |
|
| 154 |
|-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
|
| 155 |
| ARC-Challenge (Acc-Norm, 25-shot) | 55.63 | 55.03 |
|
| 156 |
| GSM8K (Strict-Match, 5-shot) | 60.96 | 61.49 |
|
|
|
|
| 161 |
| **Average Score** | **61.98** | **61.84** |
|
| 162 |
| **Recovery** | **100.00** | **99.78** |
|
| 163 |
|
| 164 |
+
| Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-FP8-dynamic |
|
| 165 |
+
|-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
|
| 166 |
+
| IFEval (Inst Level Strict Acc, 0-shot)| 67.99 | 66.79 |
|
| 167 |
+
| BBH (Acc-Norm, 3-shot) | 44.11 | 44.24 |
|
| 168 |
+
| Math-Hard (Exact-Match, 4-shot) | 8.66 | 7.89 |
|
| 169 |
+
| GPQA (Acc-Norm, 0-shot) | 28.30 | 26.90 |
|
| 170 |
+
| MUSR (Acc-Norm, 0-shot) | 35.12 | 35.12 |
|
| 171 |
+
| MMLU-Pro (Acc, 5-shot) | 26.87 | 28.33 |
|
| 172 |
+
| **Average Score** | **35.17** | **34.88** |
|
| 173 |
+
| **Recovery** | **100.00** | **** |
|
| 174 |
+
|
| 175 |
#### HumanEval pass@1 scores
|
| 176 |
+
| Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-FP8-dynamic |
|
| 177 |
|-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
|
| 178 |
| HumanEval Pass@1 | 53.40 | 54.90 |
|
| 179 |
|
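
The **Average Score** and **Recovery** rows in the tables above appear to be derived values. The following is a minimal sketch of how they could be recomputed, assuming the average is an unweighted mean of the listed benchmarks and recovery is the quantized average divided by the baseline average, times 100. That formula is inferred from the V1 table and is not stated in the README. The function names are hypothetical. Because the published averages are rounded to two decimals, recomputed values may differ from the reported ones in the last digit.

```python
# Hypothetical helpers; the formulas are assumptions inferred from the tables.

def average_score(scores):
    """Unweighted mean of per-benchmark scores."""
    return sum(scores) / len(scores)


def recovery(baseline_avg, quantized_avg):
    """Quantized average as a percentage of the baseline average."""
    return 100.0 * quantized_avg / baseline_avg


# Per-benchmark scores copied from the six-row table added in this diff
# (IFEval, BBH, Math-Hard, GPQA, MUSR, MMLU-Pro).
baseline_scores = [67.99, 44.11, 8.66, 28.30, 35.12, 26.87]
quantized_scores = [66.79, 44.24, 7.89, 26.90, 35.12, 28.33]

baseline_avg = average_score(baseline_scores)
quantized_avg = average_score(quantized_scores)
print(f"baseline average:  {baseline_avg:.3f}")
print(f"quantized average: {quantized_avg:.3f}")

# Sanity check of the assumed recovery formula against the V1 table,
# using its reported (already rounded) averages 61.98 and 61.84.
v1_recovery = recovery(61.98, 61.84)
print(f"V1 recovery from rounded averages: {v1_recovery:.2f}")
```

The same `recovery` call could be applied to the averages of the six-row table, whose quantized recovery cell is empty in this diff. The result would only be an estimate under the assumptions above.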