---
language:
  - en
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
  - qwen2.5
  - math
  - reasoning
  - grpo
  - reinforcement-learning
  - unsloth
  - gsm8k
  - structured-output
datasets:
  - openai/gsm8k
  - open-r1/OpenR1-Math-220k
pipeline_tag: text-generation
library_name: transformers
---

# Q-SS-0.5B-Reasoning-Math

> *A compact, fast, and structured mathematical reasoning model, built to think before it answers.*

**Q-SS-0.5B-Reasoning-Math** is a fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct), trained with **Group Relative Policy Optimization (GRPO)**, the reinforcement learning algorithm used to train DeepSeek-R1. The model is designed to reason explicitly through a mathematical problem before producing a clean, parseable final answer.

> 💾 Looking for the lightweight CPU version? See [Q-SS-0.5B-Reasoning-Math-GGUF](https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math-GGUF) for the Q4_K_M quantized model (~300MB).

---

## ✨ Highlights

- 🧠 **Thinks out loud**: explicit step-by-step reasoning inside `<thought>` tags before every answer
- 🎯 **Clean structured output**: final answer isolated in `<answer>` tags, so it is trivial to parse
- 🔁 **RL-trained**: learned through reward signals, not just imitation
- 🔧 **Fine-tunable**: full FP16 weights, ready for further training
- 🔓 **Apache 2.0**: free for personal and commercial use

---

## 📋 Model Details

| Property | Details |
|---|---|
| **Model Name** | Q-SS-0.5B-Reasoning-Math |
| **Base Model** | Qwen/Qwen2.5-0.5B-Instruct |
| **Parameters** | ~0.5B (about 494M) |
| **Training Method** | SFT Warm-up + GRPO Reinforcement Learning |
| **Trained On** | GSM8K + OpenR1-Math-220k |
| **Precision** | FP16 (merged, no adapter needed) |
| **License** | Apache 2.0 |
| **Developer** | Saad Salman |
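
As a rough illustration of the GRPO step named in the table, the sketch below scores a group of sampled completions for one prompt and normalizes each reward against the group's mean and standard deviation. The `toy_reward` function and its weights are hypothetical and are not the reward functions used to train this model; the use of the population standard deviation and the `eps` value are likewise assumptions.

```python
import re
from statistics import mean, pstdev

ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL)


def toy_reward(completion: str, gold: str) -> float:
    """Hypothetical reward: 0.5 for a well-formed <answer> block,
    plus 1.0 if its content equals the reference answer."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0
    return 0.5 + (1.0 if match.group(1) == gold else 0.0)


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples in its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled completions (made-up examples).
completions = [
    "<thought>6 * 7 = 42</thought><answer>42</answer>",
    "<thought>6 + 7 = 13</thought><answer>13</answer>",
    "The answer is 42.",
    "<thought>3 * 2 * 7 = 42</thought><answer>42</answer>",
]
rewards = [toy_reward(c, "42") for c in completions]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f}  advantage={advantage:+.3f}")
```

Completions that score above the group average get a positive advantage and are reinforced; those below get a negative one. No separate value model is needed because the group itself serves as the baseline.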

---

## 💬 Output Format

The model is trained to respond in this strict structure:

```
<thought>
[Step-by-step reasoning and calculations]
</thought>
<answer>
[Final numerical answer only]
</answer>
```
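
Because the two sections are delimited by fixed tags, a response can be split with a pair of regular expressions. The following is a minimal sketch; `ParsedResponse` and `parse_response` are hypothetical helper names, and returning `None` for a missing section is a design choice of this example rather than behavior defined by the model.

```python
import re
from typing import NamedTuple, Optional

# Tag names follow the output format above; DOTALL lets "." span newlines.
_THOUGHT_RE = re.compile(r"<thought>(.*?)</thought>", re.DOTALL)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


class ParsedResponse(NamedTuple):
    thought: Optional[str]
    answer: Optional[str]


def parse_response(text: str) -> ParsedResponse:
    """Split a model response into its reasoning and its final answer."""
    thought = _THOUGHT_RE.search(text)
    answers = _ANSWER_RE.findall(text)
    return ParsedResponse(
        thought=thought.group(1).strip() if thought else None,
        # Use the last <answer> block in case the tag appears more than once.
        answer=answers[-1].strip() if answers else None,
    )


sample = "<thought>\n3 * 2 = 6\n6 * 7 = 42\n</thought>\n<answer>\n42\n</answer>"
parsed = parse_response(sample)
print(parsed.answer)
```

Checking `parsed.answer is None` gives callers a simple way to detect a response that did not follow the format.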

---

## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "saadxsalman/Q-SS-0.5B-Reasoning-Math"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype = torch.float16,
    device_map  = "auto",
)

SYSTEM_PROMPT = """You are a mathematical reasoning engine.
Solve the problem step-by-step inside <thought> tags, then give ONLY the
final numerical or LaTeX result inside <answer> tags.

<thought>
[Your internal reasoning and calculations here]
</thought>
<answer>
[Final answer only]
</answer>"""

def solve(problem):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user",   "content": problem},
    ]
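    # Build the chat-formatted prompt; return_dict=True also yields the attention mask.
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize              = True,
        add_generation_prompt = True,
        return_tensors        = "pt",
        return_dict           = True,
    ).to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens = 384,
            temperature    = 0.1,
            do_sample      = True,
            pad_token_id   = tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, so the <answer> tags in the
    # system prompt are never mistaken for the model's answer.
    prompt_len = inputs["input_ids"].shape[-1]
    response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
    if "<answer>" in response:
        return response.split("<answer>")[-1].split("</answer>")[0].strip()
    return response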

print(solve("Janet has 3 cats. Each cat eats 2 cans of food per day. How many cans does she need for 7 days?"))
# Output: 42
```

---

## πŸ“ Example Outputs

**Problem:** Janet has 3 cats. Each cat eats 2 cans of food per day. How many cans does she need for 7 days?

```
<thought>
Each cat eats 2 cans per day.
Janet has 3 cats, so they eat 3 × 2 = 6 cans per day together.
For 7 days: 6 × 7 = 42 cans total.
</thought>
<answer>
42
</answer>
```

**Problem:** Tom has $50. He buys a book for $12 and a pen for $3. How much money does he have left?

```
<thought>
Tom starts with $50.
He spends $12 on a book and $3 on a pen.
Total spent: 12 + 3 = $15.
Money remaining: 50 - 15 = $35.
</thought>
<answer>
35
</answer>
```

---

## ✅ What It's Good At

| Problem Type | Support |
|---|---|
| Basic arithmetic | ✅ Reliable |
| Multi-step word problems | ✅ Reliable |
| Problems with units and currency | ✅ Reliable |
| Basic algebra | ⚠️ Partial |
| Competition math (AMC/AIME) | ❌ Beyond capacity |

---

## 📦 Related Models

| Repo | Format | Size | Best For |
|---|---|---|---|
| [Q-SS-0.5B-Reasoning-Math](https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math) | FP16 | ~988MB | GPU inference & further fine-tuning |
| [Q-SS-0.5B-Reasoning-Math-GGUF](https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math-GGUF) | Q4_K_M | ~300MB | Local CPU inference |

---

## ⚠️ Limitations

- Optimized for English-language math problems only
- Complex abstract reasoning, geometry, and calculus are beyond reliable capacity at 0.5B scale
- Always verify critical calculations: the model can produce confident but incorrect answers
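
One way to act on the last point is to compare the model's answer against a value computed independently, for example by a calculator, a spreadsheet, or a reference solution. The sketch below is an assumption-laden example rather than part of the model: `to_number` and `answers_match` are hypothetical helpers, and stripping `$` and thousands separators before comparing is a choice made for this illustration.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def to_number(text: str) -> Optional[Decimal]:
    """Parse an answer such as "$1,250.50" or " 42 " into a Decimal.

    Returns None when the text is not a plain finite number.
    """
    cleaned = text.strip().replace(",", "").replace("$", "")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    return value if value.is_finite() else None


def answers_match(model_answer: str, expected: str) -> bool:
    """Compare numerically when both sides parse, otherwise as exact text."""
    got, want = to_number(model_answer), to_number(expected)
    if got is None or want is None:
        return model_answer.strip() == expected.strip()
    return got == want


# Independent check for the second example problem: 50 - (12 + 3).
expected = str(50 - (12 + 3))
print(answers_match("$35", expected))
```

Using `Decimal` rather than `float` avoids rounding surprises with currency amounts such as `0.10`.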

---

## πŸ™ Acknowledgements

- [Unsloth](https://github.com/unslothai/unsloth) β€” efficient fine-tuning framework
- [Qwen Team](https://huggingface.co/Qwen) β€” Qwen2.5-0.5B-Instruct base model
- [HuggingFace TRL](https://github.com/huggingface/trl) β€” GRPO implementation
- [OpenR1](https://huggingface.co/open-r1) β€” OpenR1-Math-220k dataset
- [OpenAI](https://huggingface.co/openai) β€” GSM8K dataset

---

## 📄 Citation

```bibtex
@misc{qss-reasoning-math-2025,
  author       = {Saad Salman},
  title        = {Q-SS-0.5B-Reasoning-Math},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math}},
}
```