clean up the evals
README.md CHANGED

@@ -147,6 +147,20 @@ GGUF (2/3/4/5/6/8 bits): [MaziyarPanahi/phi-2-logical-sft-GGUF](https://huggingf
 ### Response:
 ```
 
+## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__phi-2-logical-sft)
+
+| Metric                          |Value|
+|---------------------------------|----:|
+|Avg.                             |61.50|
+|AI2 Reasoning Challenge (25-Shot)|61.35|
+|HellaSwag (10-Shot)              |75.14|
+|MMLU (5-Shot)                    |57.40|
+|TruthfulQA (0-shot)              |44.39|
+|Winogrande (5-shot)              |74.90|
+|GSM8k (5-shot)                   |55.80|
+
+
 ## Examples
 
 ```
@@ -222,19 +236,6 @@ Now, let's eliminate the first possibility, because it contradicts the premise t
 ---
 
 
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
 ## Training procedure
 
 ### Training hyperparameters
@@ -359,17 +360,6 @@ special_tokens:
 pad_token: "<|endoftext|>"
 ```
 
-</details
-# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
-Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__phi-2-logical-sft)
+</details>
 
-| Metric                          |Value|
-|---------------------------------|----:|
-|Avg.                             |61.50|
-|AI2 Reasoning Challenge (25-Shot)|61.35|
-|HellaSwag (10-Shot)              |75.14|
-|MMLU (5-Shot)                    |57.40|
-|TruthfulQA (0-shot)              |44.39|
-|Winogrande (5-shot)              |74.90|
-|GSM8k (5-shot)                   |55.80|
 
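The `Avg.` row in the leaderboard table this commit moves is simply the unweighted mean of the six benchmark scores, rounded to two decimals. A minimal sketch verifying that the reported average is internally consistent (score values are taken from the diff above; the variable names are illustrative, not part of any leaderboard API):

```python
# Benchmark scores from the Open LLM Leaderboard table in the README diff.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 61.35,
    "HellaSwag (10-Shot)": 75.14,
    "MMLU (5-Shot)": 57.40,
    "TruthfulQA (0-shot)": 44.39,
    "Winogrande (5-shot)": 74.90,
    "GSM8k (5-shot)": 55.80,
}

# The leaderboard "Avg." is the plain arithmetic mean of the six metrics.
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 61.5, matching the reported Avg. of 61.50
```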