| --- |
| library_name: transformers |
| tags: |
| - trl |
| - grpo |
| - rl |
| - superthoughts |
| - reasoning |
| - cot |
| license: apache-2.0 |
| datasets: |
| - openai/gsm8k |
| - Pinkstack/intructions-sft-sharegpt |
| language: |
| - en |
| base_model: |
| - HuggingFaceTB/SmolLM2-1.7B-Instruct |
| pipeline_tag: text-generation |
| --- |
| |
|  |
|
|
| # Information |
| Advanced, high-quality, **lite** reasoning in a tiny model that you can run on your phone. |
| At original quality, it runs at ~400 tokens/second on a single Nvidia H100 GPU from Friendli. |
|
|
| Trained similarly to DeepSeek R1: we used SmolLM2 as the base model, fine-tuned it (SFT) on reasoning using our own private Superthoughts instruct dataset, and modified the tokenizer slightly. After the SFT stage, we used GRPO to further improve its mathematics and problem-solving abilities. |
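
For readers unfamiliar with GRPO, here is a minimal sketch of the group-relative advantage at its core, in one common formulation: several answers are sampled for the same prompt, each is scored, and each score is normalized against the mean and standard deviation of its own group. This is not the training code used for this model; the function name `group_relative_advantages`, the `eps` parameter, and the 0/1 reward values are all hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the other samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every sample received the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one prompt:
# 1.0 = correct final answer, 0.0 = wrong final answer.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Answers that score above their group's average get a positive advantage and are reinforced; answers below it get a negative one. No separate value model is needed, which is part of why the method suits small models.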
|
|
| # Format |
| ``` |
| <|im_start|>user |
| How many R's in strawberry<|im_end|> |
| <|im_start|>assistant |
| <think> |
| Alright, the user has asked how many R's in the word strawberry, that's easy! I just need to count each instance of the letter 'R' in the word 's-t-r-a-w-b-e-r-r-y' and then find out how many R's there are, lets count! |
| S - Not an R, |
| T - Not an R, |
| R - First instance of the letter R! (1), |
| A - Not an R, |
| W - Not an R, |
| B - Not an R, |
| E - Not an R, |
| R - Great! Second instance of the letter R. (2), |
| R - Third instance of the letter R. (3), |
| Y - Not an R. |
| |
| So, i've counted all the letters correctly, meaning that I am sure that there are 3 R's in the word Strawberry. I should probably let the user know. |
| </think> |
| <output>3 |
| </output><|im_end|> |
| ``` |
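
Since the final answer is wrapped in `<output>` tags, an application will usually want to separate it from the reasoning before displaying it. The following is a minimal sketch of one way to do that with regular expressions; the helper name `split_response` is hypothetical and not part of any library, and it assumes the response follows the format shown above.

```python
import re


def split_response(text: str) -> tuple[str, str]:
    """Return (thinking, answer) extracted from a <think>/<output> response."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    output = re.search(r"<output>(.*?)</output>", text, re.DOTALL)
    # A missing tag yields an empty string rather than raising an error
    thinking = think.group(1).strip() if think else ""
    answer = output.group(1).strip() if output else ""
    return thinking, answer


sample = "<think>\nCount the R's: 3.\n</think>\n<output>3\n</output><|im_end|>"
thinking, answer = split_response(sample)
print(answer)  # prints "3"
```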
| # System prompt |
| (Important: this ensures the model always thinks first and then gives its output.) |
| ``` |
| respond in the following format: |
| <think> |
| ... |
| </think> |
| <output> |
| ... |
| </output> |
| ``` |
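
As a sketch of how the system prompt fits into a full prompt string, the snippet below assembles a ChatML-style prompt by hand. The layout of the system turn is an assumption: the Format section only shows the user and assistant turns, so verify it against the model's chat template. The function name `build_chatml_prompt` is hypothetical.

```python
# The system prompt, copied verbatim from the block above.
SYSTEM_PROMPT = (
    "respond in the following format:\n"
    "<think>\n...\n</think>\n"
    "<output>\n...\n</output>"
)


def build_chatml_prompt(user_message: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Assemble a ChatML-style prompt that ends with the assistant header."""
    return (
        # Assumed system-turn layout; not shown in the Format section
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_chatml_prompt("How many R's in strawberry")
print(prompt)
```

In practice, the tokenizer's built-in chat template is the safer route where available; the manual version here is only meant to make the structure visible.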
| # Examples: |
| All responses below were generated with our system prompt and a temperature of 0.7, inside the Android application ChatterUI via GGUF Q8, using the model's prompt format. |
| 1) |
|  |
| 2) |
|  |
| 3) |
|  |
|
|
| # Uploaded model |
|
|
| - **Developed by:** Pinkstack |
| - **License:** apache-2.0 |
| - **Finetuned from model:** HuggingFaceTB/SmolLM2-1.7B-Instruct |