---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- qwen2.5
- math
- reasoning
- grpo
- reinforcement-learning
- unsloth
- gsm8k
- structured-output
datasets:
- openai/gsm8k
- open-r1/OpenR1-Math-220k
pipeline_tag: text-generation
library_name: transformers
---

# Q-SS-0.5B-Reasoning-Math

> *A compact, fast, and structured mathematical reasoning model, built to think before it answers.*

**Q-SS-0.5B-Reasoning-Math** is a fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct), trained with **Group Relative Policy Optimization (GRPO)**, the reinforcement learning algorithm used to train DeepSeek-R1. The model is designed to reason explicitly through a mathematical problem before producing a clean, parseable final answer.

> 💾 Looking for the lightweight CPU version? See [Q-SS-0.5B-Reasoning-Math-GGUF](https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math-GGUF) for the Q4_K_M quantized model (~300MB).
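For readers new to GRPO: the algorithm samples several completions for the same prompt, scores each one, and rewards a completion according to how it compares with the rest of its group. The sketch below illustrates only that group-relative normalization. It is not the training code used for this model; the function name and the reward values are hypothetical, and implementations differ on details such as which standard deviation they use.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled completions of one problem:
# 1.0 = correct final answer, 0.0 = incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group mean receive a positive advantage and are reinforced; those below the mean receive a negative one.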
---

## ✨ Highlights

- 🧠 **Thinks out loud**: explicit step-by-step reasoning inside `<reasoning>` tags before every answer
- 🎯 **Clean structured output**: final answer isolated in `<answer>` tags, trivial to parse
- 🔁 **RL-trained**: learned through reward signals, not just imitation
- 🔧 **Fine-tunable**: full FP16 weights, ready for further training
- 🔓 **Apache 2.0**: free for personal and commercial use

---

## 📋 Model Details

| Property | Details |
|---|---|
| **Model Name** | Q-SS-0.5B-Reasoning-Math |
| **Base Model** | Qwen/Qwen2.5-0.5B-Instruct |
| **Parameters** | ~494M |
| **Training Method** | SFT Warm-up + GRPO Reinforcement Learning |
| **Trained On** | GSM8K + OpenR1-Math-220k |
| **Precision** | FP16 (merged, no adapter needed) |
| **License** | Apache 2.0 |
| **Developer** | Saad Salman |

---

## 💬 Output Format

Responses are trained to follow this structure:

```
<reasoning>
[Step-by-step reasoning and calculations]
</reasoning>
<answer>
[Final numerical answer only]
</answer>
```

---

## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "saadxsalman/Q-SS-0.5B-Reasoning-Math"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Tag names are assumed to be <reasoning> and <answer>;
# change them here and in solve() if the model emits different tags.
SYSTEM_PROMPT = """You are a mathematical reasoning engine.
Solve the problem step-by-step inside <reasoning> tags, then give ONLY the final numerical or LaTeX result inside <answer> tags.

<reasoning>
[Your internal reasoning and calculations here]
</reasoning>
<answer>
[Final answer only]
</answer>"""


def solve(problem):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": problem},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_dict=True,  # also returns the attention mask
        return_tensors="pt",
    ).to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=384,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, so the tags that appear
    # in the system prompt are never mistaken for the model's answer.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)

    if "<answer>" in response:
        return response.split("<answer>")[-1].split("</answer>")[0].strip()
    return response  # fall back to the raw text if no answer tag was produced


print(solve("Janet has 3 cats. Each cat eats 2 cans of food per day. How many cans does she need for 7 days?"))
# Expected output: 42 (sampling is enabled, so results can vary)
```

---

## 📝 Example Outputs

**Problem:** Janet has 3 cats. Each cat eats 2 cans of food per day. How many cans does she need for 7 days?

```
<reasoning>
Each cat eats 2 cans per day.
Janet has 3 cats, so they eat 3 × 2 = 6 cans per day together.
For 7 days: 6 × 7 = 42 cans total.
</reasoning>
<answer>
42
</answer>
```

**Problem:** Tom has $50. He buys a book for $12 and a pen for $3. How much money does he have left?

```
<reasoning>
Tom starts with $50.
He spends $12 on a book and $3 on a pen.
Total spent: 12 + 3 = $15.
Money remaining: 50 - 15 = $35.
</reasoning>
<answer>
35
</answer>
```

---

## ✅ What It's Good At

| Problem Type | Support |
|---|---|
| Basic arithmetic | ✅ Reliable |
| Multi-step word problems | ✅ Reliable |
| Problems with units and currency | ✅ Reliable |
| Basic algebra | ⚠️ Partial |
| Competition math (AMC/AIME) | ❌ Beyond capacity |

---

## 📦 Related Models

| Repo | Format | Size | Best For |
|---|---|---|---|
| [Q-SS-0.5B-Reasoning-Math](https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math) | FP16 | ~988MB | GPU inference & further fine-tuning |
| [Q-SS-0.5B-Reasoning-Math-GGUF](https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math-GGUF) | Q4_K_M | ~300MB | Local CPU inference |

---

## ⚠️ Limitations

- Optimized for English-language math problems only
- Complex abstract reasoning, geometry, and calculus are beyond reliable capacity at 0.5B scale
- Always verify critical calculations: the model may occasionally produce confident but incorrect answers

---

## 🙏 Acknowledgements

- [Unsloth](https://github.com/unslothai/unsloth): efficient fine-tuning framework
- [Qwen Team](https://huggingface.co/Qwen): Qwen2.5-0.5B-Instruct base model
- [HuggingFace TRL](https://github.com/huggingface/trl): GRPO implementation
- [OpenR1](https://huggingface.co/open-r1): OpenR1-Math-220k dataset
- [OpenAI](https://huggingface.co/openai): GSM8K dataset

---

## 📄 Citation

```bibtex
@misc{qss-reasoning-math-2025,
  author       = {Saad Salman},
  title        = {Q-SS-0.5B-Reasoning-Math},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math}},
}
```
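
---

## 🧩 Parsing Responses

Because the model separates its working from its final answer, downstream code can read the answer without touching the reasoning text. The sketch below is one way to do that with the standard library. It assumes the two blocks are wrapped in `<reasoning>` and `<answer>` tags; the helper names `parse_response` and `answer_as_number` are hypothetical and not part of any released package.

```python
import re

# Tag names are assumptions of this sketch; adjust them to what the model emits.
REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_response(text):
    """Return (reasoning, answer); either is None if its tag pair is missing."""
    reasoning = REASONING_RE.search(text)
    answers = ANSWER_RE.findall(text)
    return (
        reasoning.group(1).strip() if reasoning else None,
        answers[-1].strip() if answers else None,  # keep the last answer block
    )


def answer_as_number(answer):
    """Convert an answer such as '42' or '$1,250' to a float, else None."""
    if answer is None:
        return None
    cleaned = answer.replace(",", "").replace("$", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None  # e.g. a LaTeX expression rather than a plain number


sample = (
    "<reasoning>\n3 × 2 = 6 cans per day. 6 × 7 = 42.\n</reasoning>\n"
    "<answer>\n42\n</answer>"
)
reasoning, answer = parse_response(sample)
print(answer)                    # 42
print(answer_as_number(answer))  # 42.0
```

Returning `None` for a missing or non-numeric answer lets the caller decide how to handle a malformed response, which matters for a model that can occasionally deviate from the format.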