Update README.md
This model scores the highest current score in both IFEval and GSM8k while maint…

Something important to note: this model has only undergone SFT and DPO; the RLVR (reinforcement learning with verifiable rewards) stage was too computationally expensive to run properly.

## Evaluation

I ran these evaluations using [SmolLM2's evaluation code](https://github.com/huggingface/smollm/tree/main/evaluation) for a fairer comparison.

| HellaSwag | 61.1 | **66.1** | 56.1 | 60.9 | 55.5 |
| MMLU-Pro (MCF) | 17.4 | 19.3 | 12.7 | **24.2** | 11.7 |

## Usage

Like any other Hugging Face model, you can run it with the transformers library:
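The transformers snippet itself is collapsed in this diff view. As a rough sketch of the standard flow — the checkpoint id `SultanR/SmolTulu-1.7b-Instruct` is inferred from the GGUF link below, and the prompt and generation settings are illustrative, not the README's exact code:

```python
# Sketch of standard transformers usage; the checkpoint id, prompt, and
# generation settings are assumptions, since the original snippet is elided here.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "SultanR/SmolTulu-1.7b-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Tokenize a prompt, generate a continuation, and decode it back to text.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```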

You can also use the model in llama.cpp through the [gguf version](https://huggingface.co/SultanR/SmolTulu-1.7b-Instruct-GGUF)!

## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_SultanR__SmolTulu-1.7b-Instruct).

As of writing this, the number 1 ranking model in IFEval for any model under 2 b…

| MuSR (0-shot) | 1.92 |
| MMLU-PRO (5-shot) | 7.89 |

## Citation

```
@misc{alrashed2024smoltuluhigherlearningrate,
      title={SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs},
      author={Sultan Alrashed},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.08347},
}
```