Update README.md
This model scores the highest current score in both IFEval and GSM8k while maint…

Something important to note: this model has only undergone SFT and DPO; the RLVR (reinforcement learning with verifiable rewards) stage was too computationally expensive to run properly.

## Evaluation

I ran these evaluations using [SmolLM2's evaluation code](https://github.com/huggingface/smollm/tree/main/evaluation) for a fairer comparison.

| HellaSwag | 61.1 | **66.1** | 56.1 | 60.9 | 55.5 |
| MMLU-Pro (MCF) | 17.4 | 19.3 | 12.7 | **24.2** | 11.7 |

## Usage

Like any other Hugging Face model, you can run it with the transformers library:
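The transformers snippet itself is collapsed in this diff view. As a rough sketch of the standard flow — the checkpoint id `SultanR/SmolTulu-1.7b-Instruct` is inferred from the GGUF link below, and the prompt and generation settings are illustrative, not the README's exact code:

```python
# Sketch of standard transformers usage; the checkpoint id, prompt, and
# generation settings are assumptions, since the original snippet is elided here.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "SultanR/SmolTulu-1.7b-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Tokenize a prompt, generate a continuation, and decode it back to text.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```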

You can also use the model in llama.cpp through the [gguf version](https://huggingface.co/SultanR/SmolTulu-1.7b-Instruct-GGUF)!

## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_SultanR__SmolTulu-1.7b-Instruct).

As of writing this, the number 1 ranking model in IFEval for any model under 2 b…

| MuSR (0-shot) | 1.92 |
| MMLU-PRO (5-shot) | 7.89 |

## Citation

```
@misc{alrashed2024smoltuluhigherlearningrate,
      title={SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs},
      author={Sultan Alrashed},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.08347},
}
```