---
license: apache-2.0
datasets:
- Sweaterdog/Smol-reason2.1
language:
- en
base_model:
- unsloth/Qwen2.5-3B-Instruct-bnb-4bit
---

# 🧠Smol-reason2.1🧠

This is my third GRPO reasoning model. While exploring fine-tuning on my own hardware, I found the process works well with 3B models.

System prompt:

```
You are a reasoning LLM named Smol-reason2.1, developed by Sweaterdog. Respond in the following format:
<think>
...reason in long recursive loops here...
</think>
...answer here...
Start your response with <think>
```
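If you serve the model behind any OpenAI-style chat frontend (an assumption — the model card does not prescribe a serving stack), the prompt above simply goes in the `system` slot. A minimal sketch; the `build_messages` helper is mine, not part of the model:

```python
# Minimal sketch: wiring the Smol-reason2.1 system prompt into an
# OpenAI-style chat message list. Message structure is the standard
# role/content format; nothing here is specific to a particular server.
SYSTEM_PROMPT = (
    "You are a reasoning LLM named Smol-reason2.1, developed by Sweaterdog. "
    "Respond in the following format:\n<think>\n...reason in long recursive "
    "loops here...\n</think>\n...answer here...\nStart your response with <think>"
)

def build_messages(user_text: str) -> list[dict]:
    """Return a chat-message list with the reasoning system prompt first."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]

messages = build_messages("If I give you 15 mangoes...")
```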

In accordance with this output format, the model responds like this:

```
<think>
Okay, let's break down the user's issue.
...more reasoning...
Therefore x should be the answer
</think>
X is the answer because...
```
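Because the reasoning is delimited by `<think>` tags, it is easy to strip before showing only the final answer to a user. A small sketch (the helper name is mine, not part of the model):

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split a Smol-reason2.1 response into (reasoning, answer).

    Returns empty reasoning if no <think> block is found.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if not match:
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>\nOkay, let's break it down.\n</think>\nX is the answer because..."
)
```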

# Features

## Flexible reasoning

You can modify the system prompt to change the way the model reasons. By default, it is told to reason about code snippets, which I found works best across all tasks.

## Logical reasoning

This is the first model I have seen that can answer "The Mango Puzzle", which goes like this:

```
If I give you 15 mangoes, and then you give 14 away, then receive 60 more mangoes, how many mangoes did you not sell?
```

The correct answer is `75 mangoes`; most LLMs treat "give away" as a form of sale, so they typically answer `61 mangoes`.
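The arithmetic behind the puzzle, spelled out: giving mangoes away is not a sale, so every mango that passed through your hands counts as "not sold":

```python
# Worked arithmetic for "The Mango Puzzle": nothing is ever sold,
# so every mango received counts as "not sold".
received = 15 + 60   # mangoes you were given, in total
given_away = 14      # given away -- not a sale
sold = 0             # no sale happens anywhere in the puzzle
still_held = received - given_away   # 61: the common wrong answer
not_sold = received - sold           # 75: the puzzle's answer
```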

## Code reasoning

This model is capable of thinking through how to structure a complex coding problem before writing out the entire file.

## Mathematical reasoning

This model is capable of breaking down math equations and checking its own work before responding with an answer.

## Medical reasoning

This model is capable of taking in the symptoms of a disease, as well as the patient's condition, and offering an appropriate diagnosis.

# Design

This model was trained from Qwen2.5 3B on a dataset I put together comprising coding, healthcare, and math data.

To be specific, this model was trained from Smol-reason2, for longer and on a larger dataset of reasoning data from DeepSeek-R1.

This model supports RoPE scaling up to a `65536`-token context, and the Q8_0 quantization can fit on a single GPU at the full context length.
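As a sketch of running the quantized model at the full scaled context with llama.cpp (an assumption — the card names no serving stack; the GGUF filename is a placeholder, and VRAM needs depend on your GPU):

```shell
# Hypothetical invocation: load a Q8_0 GGUF with a 65536-token context.
# Adjust the model path and -ngl (GPU layers to offload) for your setup.
./llama-cli \
  -m ./Smol-reason2.1.Q8_0.gguf \
  -c 65536 \
  -ngl 99 \
  -p "You are a reasoning LLM named Smol-reason2.1..."
```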