---
license: apache-2.0
datasets:
- Sweaterdog/Smol-reason2.1
language:
- en
base_model:
- unsloth/Qwen2.5-3B-Instruct-bnb-4bit
---
# 🧠Smol-reason2.1🧠
This is my third GRPO reasoning model. I built it while exploring fine-tuning on my own hardware, and found the process works well with 3B models.
System prompt:
```
You are a reasoning LLM named Smol-reason2.1, developed by Sweaterdog. Respond in the following format:
<think>
...reason in long recursive loops here...
</think>
...answer here...
Start your response with <think>
```
In accordance with that format, the model responds like this:
```
<think>
Okay, let's break down the user's issue.
...more reasoning...
Therefore x should be the answer
</think>
X is the answer because...
```
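Because the reasoning is wrapped in `<think>` tags, downstream code can strip it to recover just the final answer. A minimal sketch in Python (the helper name is my own, not part of the model):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a Smol-reason2.1 response into (reasoning, answer).

    Returns an empty reasoning string if no <think> block is found.
    """
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

response = "<think>\nOkay, let's break it down.\n</think>\nX is the answer."
reasoning, answer = split_reasoning(response)
print(answer)  # → X is the answer.
```

The same pattern works for streaming UIs that hide or collapse the reasoning section before showing the answer.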
# Features
## Flexible reasoning
You can modify the system prompt to change how the model reasons. By default, it is told to reason about code snippets, which I found works best across all task types.
## Logical reasoning
This is the first model I have seen which can answer "The Mango Puzzle", which goes like this:
```
If I give you 15 mangoes, and then you give 14 away, then receive 60 more mangoes, how many mangoes did you not sell?
```
The correct answer is `75 Mangoes`: nothing is ever sold, so all 75 mangoes received were not sold. Most LLMs treat "give away" as a form of sale, so they typically answer `61 Mangoes`.
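The distinction the puzzle tests is easy to verify with plain arithmetic, since "give away" and "sell" are separate actions. A quick check in Python:

```python
received = 15 + 60   # mangoes handed to you, plus those received later
given_away = 14      # given away for free, not sold
sold = 0             # no sale ever happens in the puzzle

on_hand = received - given_away
not_sold = received - sold

print(on_hand)    # → 61  (what most LLMs answer)
print(not_sold)   # → 75  (the intended answer)
```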
## Code reasoning
This model is capable of planning how to structure a complex coding problem before tackling the entire file.
## Mathematical reasoning
This model is capable of breaking down math equations and checking its own work before responding with an answer.
## Medical reasoning
This model is capable of taking in the symptoms of a disease, along with the patient's condition, and offering an appropriate diagnosis.
# Design
This model was trained from Qwen2.5 3B on a dataset I put together spanning coding, healthcare, and math.
To be specific, it continues from Smol-reason2, trained for longer on a larger set of reasoning data from DeepSeek-R1.
This model has RoPE scaling up to a `65536`-token context, and the Q8_0 quantization can fit on a single GPU at the full context length.
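For readers running the model with Hugging Face `transformers`, long-context use of Qwen2.5-family models is typically enabled through a YaRN `rope_scaling` entry in `config.json`. A sketch for a 65,536-token window (factor 2.0 over the base 32,768 positions; treat the exact values as an assumption rather than this repo's shipped config):

```json
{
  "rope_scaling": {
    "type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 32768
  }
}
```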