## 🏆 Evaluation

### Nous

The evaluation was performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval) on the Nous suite.

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---:|---:|---:|---:|---:|

You can find the complete benchmark on [YALL - Yet Another LLM Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).

### [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_mlabonne__NeuralDaredevil-7B).

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 74.12 |
| AI2 Reasoning Challenge (25-Shot) | 69.88 |
| HellaSwag (10-Shot)               | 87.62 |
| MMLU (5-Shot)                     | 65.12 |
| TruthfulQA (0-shot)               | 66.85 |
| Winogrande (5-shot)               | 82.08 |
| GSM8k (5-shot)                    | 73.16 |
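
The leaderboard's "Avg." column is simply the unweighted arithmetic mean of the six benchmark scores; a quick sketch that re-derives it from the table values above:

```python
# Open LLM Leaderboard scores, copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 69.88,
    "HellaSwag (10-Shot)": 87.62,
    "MMLU (5-Shot)": 65.12,
    "TruthfulQA (0-shot)": 66.85,
    "Winogrande (5-shot)": 82.08,
    "GSM8k (5-shot)": 73.16,
}

# The reported average is the unweighted mean, rounded to two decimals.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 74.12
```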

## 💻 Usage

```python
# ...
print(outputs[0]["generated_text"])
```
<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/>
</a>
</p>