Update README.md
README.md
CHANGED
@@ -150,7 +150,7 @@ evalplus.evaluate \
 
 #### OpenLLM Leaderboard V1 evaluation scores
 
-| Metric | ibm-granite/granite-3.1-8b-instruct | neuralmagic-ent/granite-3.1-8b-instruct-
+| Metric | ibm-granite/granite-3.1-8b-instruct | neuralmagic-ent/granite-3.1-8b-instruct-FP8-dynamic |
 |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
 | ARC-Challenge (Acc-Norm, 25-shot) | 66.81 | 66.81 |
 | GSM8K (Strict-Match, 5-shot) | 64.52 | 66.64 |
@@ -161,8 +161,19 @@ evalplus.evaluate \
 | **Average Score** | **70.30** | **70.57** |
 | **Recovery** | **100.00** | **100.39** |
 
+| Metric | ibm-granite/granite-3.1-8b-instruct | neuralmagic-ent/granite-3.1-8b-instruct-FP8-dynamic |
+|-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
+| IFEval (Inst Level Strict Acc, 0-shot)| 74.10 | 73.62 |
+| BBH (Acc-Norm, 3-shot) | 53.19 | 53.26 |
+| Math-Hard (Exact-Match, 4-shot) | 14.77 | 16.79 |
+| GPQA (Acc-Norm, 0-shot) | 31.76 | 32.58 |
+| MUSR (Acc-Norm, 0-shot) | 46.01 | 47.34 |
+| MMLU-Pro (Acc, 5-shot) | 35.81 | 35.72 |
+| **Average Score** | **42.61** | **43.22** |
+| **Recovery** | **100.00** | **101.43** |
+
 #### HumanEval pass@1 scores
-| Metric | ibm-granite/granite-3.1-8b-instruct | neuralmagic-ent/granite-3.1-8b-instruct-
+| Metric | ibm-granite/granite-3.1-8b-instruct | neuralmagic-ent/granite-3.1-8b-instruct-FP8-dynamic |
 |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
 | HumanEval Pass@1 | 71.00 | 69.90 |
 
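
For reference, the **Average Score** and **Recovery** rows of the added table (IFEval through MMLU-Pro) can be reproduced with a short Python sketch. The README does not define Recovery, so this assumes it is the quantized average divided by the baseline average, times 100, with both averages rounded to two decimals first. The `summarize` helper and the `scores` dictionary are illustrative names, not part of the repository.

```python
from statistics import mean

# Scores copied from the added table rows: (baseline, FP8-dynamic).
scores = {
    "IFEval": (74.10, 73.62),
    "BBH": (53.19, 53.26),
    "Math-Hard": (14.77, 16.79),
    "GPQA": (31.76, 32.58),
    "MUSR": (46.01, 47.34),
    "MMLU-Pro": (35.81, 35.72),
}


def summarize(scores):
    """Return (baseline average, quantized average, recovery in percent).

    Assumption: recovery = 100 * quantized_avg / baseline_avg, computed
    from the averages after rounding them to two decimals.
    """
    baseline_avg = round(mean(b for b, _ in scores.values()), 2)
    quantized_avg = round(mean(q for _, q in scores.values()), 2)
    recovery = round(100 * quantized_avg / baseline_avg, 2)
    return baseline_avg, quantized_avg, recovery


if __name__ == "__main__":
    baseline_avg, quantized_avg, recovery = summarize(scores)
    print(f"Average Score: {baseline_avg:.2f} vs {quantized_avg:.2f}")
    print(f"Recovery: {recovery:.2f}")
```

Under these assumptions `summarize(scores)` gives 42.61, 43.22 and 101.43, which matches the added rows. Whether the averages are rounded before or after the division is a guess and may differ for the other tables.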