---
license: apache-2.0
datasets:
- Sweaterdog/Smol-reason2.1
language:
- en
base_model:
- unsloth/Qwen2.5-3B-Instruct-bnb-4bit
---

# 🧠Smol-reason2.1🧠

This is my third GRPO reasoning model. I was exploring fine-tuning on my own hardware and found that it works well with 3B models.

System prompt:
```
You are a reasoning LLM named Smol-reason2.1, developed by Sweaterdog. Respond in the following format:
<think>

...reason in long recursive loops here...

</think>

...answer here... 

Start your response with <think>

```

In accordance with that output format, the model responds like this:
```
<think>

Okay, let's break down the user's issue.

...more reasoning...

Therefore x should be the answer
</think>

X is the answer because...
```
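Because the response always wraps its reasoning in `<think>` tags, the visible answer can be separated from the chain of thought with a small parser. This is a minimal sketch of my own (the helper name and regex are not part of the model card):

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split a Smol-reason2.1 response into (reasoning, answer).

    Assumes the model followed its system prompt and emitted a
    single <think>...</think> block before the final answer.
    """
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if match is None:
        # Model skipped the tags; treat everything as the answer.
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer
```

Applied to a response shaped like the sample above, this returns the hidden reasoning and the final answer as separate strings.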

# Features

## Flexible reasoning

You can modify the system prompt to change how the model reasons. By default, it is told to reason about code snippets, which I found works best across all tasks.

## Logical reasoning

This is the first model I have seen that can answer "The Mango Puzzle", which goes like this:
```
If I give you 15 mangoes, and then you give 14 away, then receive 60 more mangoes, how many mangoes did you not sell?
```

The correct answer is `75 mangoes`: nothing in the puzzle is ever sold, since giving mangoes away is not a sale. Most LLMs treat "give away" as a form of sale, so they typically answer `61 mangoes`.
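The arithmetic behind both readings can be made explicit; a quick sketch (variable names are mine):

```python
received = 15 + 60   # mangoes received in total
given_away = 14      # given away, which is not a sale
sold = 0             # nothing in the puzzle is ever sold

# Intended reading: "not sold" covers every mango received.
not_sold = received - sold            # 75

# Common LLM reading: "give away" miscounted as a sale.
mistaken_answer = received - given_away   # 61

print(not_sold, mistaken_answer)
```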

## Code reasoning

This model is capable of thinking through the design of a complex coding problem before tackling the entire file.

## Mathematical reasoning

This model is capable of breaking down math equations and checking its own work before responding with an answer.

## Medical reasoning

This model is capable of taking in the symptoms of a disease, as well as the patient's condition, and offering an appropriate diagnosis.

# Design

This model was fine-tuned from Qwen2.5 3B on a dataset I put together comprising coding, healthcare, and math.

To be specific, this model was trained from Smol-reason2, for longer and on a larger dataset of reasoning data from DeepSeek-R1.

This model supports RoPE scaling up to a `65536`-token context, and the Q8_0 quantization fits on a single GPU at the full context length.