# Q-SS-0.5B-Reasoning-Math
A compact, fast, and structured mathematical reasoning model — built to think before it answers.
Q-SS-0.5B-Reasoning-Math is a fine-tuned version of Qwen/Qwen2.5-0.5B-Instruct, trained using Group Relative Policy Optimization (GRPO) reinforcement learning — the same technique behind DeepSeek-R1. The model is designed to reason explicitly and transparently through mathematical problems before producing a clean, parseable final answer.
💾 Looking for the lightweight CPU version? See Q-SS-0.5B-Reasoning-Math-GGUF for the Q4_K_M quantized model (~300MB).
## ✨ Highlights
- 🧠 Thinks out loud — explicit step-by-step reasoning inside `<thought>` tags before every answer
- 🎯 Clean structured output — final answer always isolated in `<answer>` tags, trivial to parse
- 🔁 RL-trained — learned through reward signals, not just imitation
- 🔧 Fine-tunable — full FP16 weights, ready for further training or fine-tuning
- 🔓 Apache 2.0 — free for personal and commercial use
## 📋 Model Details
| Property | Details |
|---|---|
| Model Name | Q-SS-0.5B-Reasoning-Math |
| Base Model | Qwen/Qwen2.5-0.5B-Instruct |
| Parameters | 500M |
| Training Method | SFT Warm-up + GRPO Reinforcement Learning |
| Trained On | GSM8K + OpenR1-Math-220k |
| Precision | FP16 (merged, no adapter needed) |
| License | Apache 2.0 |
| Developer | Saad Salman |
## 💬 Output Format
Every response follows this strict structure:
```
<thought>
[Step-by-step reasoning and calculations]
</thought>
<answer>
[Final numerical answer only]
</answer>
```
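Because the tags are fixed, the final answer can be pulled out with a small regex helper. A minimal sketch — the `extract_answer` name is ours for illustration, not part of the model's API:

```python
import re

def extract_answer(response):
    """Return the text inside the last <answer>...</answer> pair, or None."""
    matches = re.findall(r"<answer>\s*(.*?)\s*</answer>", response, flags=re.DOTALL)
    return matches[-1] if matches else None

demo = "<thought>\n3 x 2 = 6 per day; 6 x 7 = 42.\n</thought>\n<answer>\n42\n</answer>"
print(extract_answer(demo))  # 42
```

Taking the *last* match makes the helper robust if the model ever echoes the tag template from the system prompt before its real answer.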
## 🚀 Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "saadxsalman/Q-SS-0.5B-Reasoning-Math"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

SYSTEM_PROMPT = """You are a mathematical reasoning engine.
Solve the problem step-by-step inside <thought> tags, then give ONLY the
final numerical or LaTeX result inside <answer> tags.

<thought>
[Your internal reasoning and calculations here]
</thought>
<answer>
[Final answer only]
</answer>"""

def solve(problem):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": problem},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs,
            max_new_tokens=384,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    if "<answer>" in response:
        return response.split("<answer>")[-1].split("</answer>")[0].strip()
    return response

print(solve("Janet has 3 cats. Each cat eats 2 cans of food per day. How many cans does she need for 7 days?"))
# Output: 42
```
## 📝 Example Outputs

**Problem:** Janet has 3 cats. Each cat eats 2 cans of food per day. How many cans does she need for 7 days?

```
<thought>
Each cat eats 2 cans per day.
Janet has 3 cats, so they eat 3 × 2 = 6 cans per day together.
For 7 days: 6 × 7 = 42 cans total.
</thought>
<answer>
42
</answer>
```

**Problem:** Tom has $50. He buys a book for $12 and a pen for $3. How much money does he have left?

```
<thought>
Tom starts with $50.
He spends $12 on a book and $3 on a pen.
Total spent: 12 + 3 = $15.
Money remaining: 50 - 15 = $35.
</thought>
<answer>
35
</answer>
```
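When evaluating the model at scale, it helps to check that a completion actually follows the strict structure before trusting the extracted answer. A minimal compliance check — `is_well_formed` is our own helper, not shipped with the model:

```python
import re

# Exactly one <thought> block followed by one <answer> block, nothing else.
FORMAT_RE = re.compile(r"\s*<thought>.*?</thought>\s*<answer>.*?</answer>\s*", re.DOTALL)

def is_well_formed(completion):
    return FORMAT_RE.fullmatch(completion) is not None

good = "<thought>\n50 - 15 = 35.\n</thought>\n<answer>\n35\n</answer>"
print(is_well_formed(good))       # True
print(is_well_formed("just 35"))  # False
```

This mirrors the kind of format reward typically used in GRPO training loops: completions that break the template can be filtered out or penalized.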
## ✅ What It's Good At
| Problem Type | Support |
|---|---|
| Basic arithmetic | ✅ Reliable |
| Multi-step word problems | ✅ Reliable |
| Problems with units and currency | ✅ Reliable |
| Basic algebra | ⚠️ Partial |
| Competition math (AMC/AIME) | ❌ Beyond capacity |
## 📦 Related Models
| Repo | Format | Size | Best For |
|---|---|---|---|
| Q-SS-0.5B-Reasoning-Math | FP16 | ~988MB | GPU inference & further fine-tuning |
| Q-SS-0.5B-Reasoning-Math-GGUF | Q4_K_M | ~300MB | Local CPU inference |
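The GGUF build can run fully on CPU with llama.cpp. A sketch using its `llama-cli` binary — the exact `.gguf` filename inside the repo may differ, and flags reflect recent llama.cpp releases:

```shell
# Assumes llama.cpp is built and the quantized weights were downloaded
# from the Q-SS-0.5B-Reasoning-Math-GGUF repo (filename is illustrative).
./llama-cli \
  -m Q-SS-0.5B-Reasoning-Math-Q4_K_M.gguf \
  --temp 0.1 \
  -n 384 \
  -p "Solve step-by-step inside <thought> tags, then give ONLY the final answer inside <answer> tags. Problem: Tom has \$50. He buys a book for \$12 and a pen for \$3. How much money does he have left?"
```

At ~300MB, the Q4_K_M quantization trades a small amount of accuracy for a model that fits comfortably in memory on most laptops.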
## ⚠️ Limitations
- Optimized for English-language math problems only
- Complex abstract reasoning, geometry, and calculus are beyond reliable capacity at 0.5B scale
- Always verify critical calculations — the model may occasionally produce confident but incorrect answers
## 🙏 Acknowledgements
- Unsloth — efficient fine-tuning framework
- Qwen Team — Qwen2.5-0.5B-Instruct base model
- HuggingFace TRL — GRPO implementation
- OpenR1 — OpenR1-Math-220k dataset
- OpenAI — GSM8K dataset
## 📄 Citation
```bibtex
@misc{qss-reasoning-math-2025,
  author       = {Saad Salman},
  title        = {Q-SS-0.5B-Reasoning-Math},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math}},
}
```