---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- qwen2.5
- math
- reasoning
- grpo
- reinforcement-learning
- unsloth
- gsm8k
- structured-output
datasets:
- openai/gsm8k
- open-r1/OpenR1-Math-220k
pipeline_tag: text-generation
library_name: transformers
---

# Q-SS-0.5B-Reasoning-Math

> *A compact, fast, and structured mathematical reasoning model, built to think before it answers.*

**Q-SS-0.5B-Reasoning-Math** is a fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct), trained with **Group Relative Policy Optimization (GRPO)**, the reinforcement learning technique also used to train DeepSeek-R1. The model is designed to reason explicitly and transparently through a mathematical problem before producing a clean, parseable final answer.

> Looking for the lightweight CPU version? See [Q-SS-0.5B-Reasoning-Math-GGUF](https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math-GGUF) for the Q4_K_M quantized model (~300MB).
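
For readers unfamiliar with GRPO: several completions are sampled for the same prompt, each is scored by a reward function, and each completion's advantage is its reward standardized against the rest of its group. The sketch below illustrates only that normalization step. It is not this model's training code; the function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation of the group
    # eps keeps the division defined when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt; two earned the reward, two did not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that score above the group average receive a positive advantage and are reinforced; those below average receive a negative one. When all rewards in a group are equal, every advantage is zero and the group contributes no learning signal.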

---

## Highlights

- **Thinks out loud**: explicit step-by-step reasoning inside `<thought>` tags before every answer
- **Clean structured output**: final answer isolated in `<answer>` tags, so it is trivial to parse
- **RL-trained**: learned through reward signals, not just imitation
- **Fine-tunable**: full FP16 weights, ready for further fine-tuning
- **Apache 2.0**: free for personal and commercial use

---

## Model Details

| Property | Details |
|---|---|
| **Model Name** | Q-SS-0.5B-Reasoning-Math |
| **Base Model** | Qwen/Qwen2.5-0.5B-Instruct |
| **Parameters** | ~0.5B (about 494M) |
| **Training Method** | SFT warm-up followed by GRPO reinforcement learning |
| **Trained On** | GSM8K + OpenR1-Math-220k |
| **Precision** | FP16 (merged, no adapter needed) |
| **License** | Apache 2.0 |
| **Developer** | Saad Salman |

---

## Output Format

Every response follows this strict structure:

```
<thought>
[Step-by-step reasoning and calculations]
</thought>
<answer>
[Final numerical answer only]
</answer>
```
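
Because the format is fixed, both parts can be pulled out with one regular expression. The following is a minimal sketch: the tag names come from the structure above, while `parse_response` is a hypothetical helper, not something shipped with the model. It returns `None` when the text does not match, which a small model may occasionally produce.

```python
import re

# Tag names are taken from the output format; everything else is illustrative.
_PATTERN = re.compile(
    r"<thought>(?P<thought>.*?)</thought>\s*<answer>(?P<answer>.*?)</answer>",
    re.DOTALL,  # let "." span the newlines inside each tag
)


def parse_response(text):
    """Return (thought, answer), or None if the text does not match the format."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    return match.group("thought").strip(), match.group("answer").strip()


sample = "<thought>\n3 * 2 = 6 cans per day.\n6 * 7 = 42.\n</thought>\n<answer>\n42\n</answer>"
print(parse_response(sample))
```

Treating a `None` result as a failed generation (and retrying or falling back) is usually safer than trying to salvage a number from malformed text.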

---

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "saadxsalman/Q-SS-0.5B-Reasoning-Math"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the `accelerate` package
)

SYSTEM_PROMPT = """You are a mathematical reasoning engine.
Solve the problem step-by-step inside <thought> tags, then give ONLY the
final numerical or LaTeX result inside <answer> tags.
<thought>
[Your internal reasoning and calculations here]
</thought>
<answer>
[Final answer only]
</answer>"""


def solve(problem):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": problem},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_dict=True,  # yields input_ids and attention_mask
        return_tensors="pt",
    ).to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=384,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens. Decoding the full sequence would
    # include the system prompt, which itself contains <answer> tags.
    prompt_length = inputs["input_ids"].shape[-1]
    response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)

    if "<answer>" in response:
        return response.split("<answer>")[-1].split("</answer>")[0].strip()
    return response


print(solve("Janet has 3 cats. Each cat eats 2 cans of food per day. How many cans does she need for 7 days?"))
# Expected: 42 (sampling is enabled, so output can vary between runs)
```

---

## Example Outputs

**Problem:** Janet has 3 cats. Each cat eats 2 cans of food per day. How many cans does she need for 7 days?

```
<thought>
Each cat eats 2 cans per day.
Janet has 3 cats, so they eat 3 × 2 = 6 cans per day together.
For 7 days: 6 × 7 = 42 cans total.
</thought>
<answer>
42
</answer>
```

**Problem:** Tom has $50. He buys a book for $12 and a pen for $3. How much money does he have left?

```
<thought>
Tom starts with $50.
He spends $12 on a book and $3 on a pen.
Total spent: 12 + 3 = $15.
Money remaining: 50 - 15 = $35.
</thought>
<answer>
35
</answer>
```
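
To score answers like these against GSM8K, whose reference solutions end with a line of the form `#### <number>`, the extracted `<answer>` value can be compared numerically rather than as a string. This is a rough sketch under that assumption; `to_number` and `is_correct` are hypothetical helpers, not part of any library, and they only handle plain numbers with optional `$` and thousands separators.

```python
from decimal import Decimal, InvalidOperation


def to_number(text):
    """Parse '42', '$35', '1,250' or '42.0' into a Decimal; return None on failure."""
    cleaned = text.strip().replace(",", "").replace("$", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def is_correct(model_answer, gsm8k_solution):
    """Compare an extracted <answer> value with a GSM8K reference solution."""
    # The final answer is whatever follows the last "####" marker.
    reference = gsm8k_solution.split("####")[-1]
    predicted, expected = to_number(model_answer), to_number(reference)
    return predicted is not None and predicted == expected


print(is_correct("$35", "Tom spends 12 + 3 = 15 dollars.\n#### 35"))
```

Comparing as `Decimal` means `42` and `42.0` count as the same answer, while non-numeric output is scored as incorrect instead of raising an error.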

---

## What It's Good At

| Problem Type | Support |
|---|---|
| Basic arithmetic | ✅ Reliable |
| Multi-step word problems | ✅ Reliable |
| Problems with units and currency | ✅ Reliable |
| Basic algebra | ⚠️ Partial |
| Competition math (AMC/AIME) | ❌ Beyond capacity |

---

## Related Models

| Repo | Format | Size | Best For |
|---|---|---|---|
| [Q-SS-0.5B-Reasoning-Math](https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math) | FP16 | ~988MB | GPU inference & further fine-tuning |
| [Q-SS-0.5B-Reasoning-Math-GGUF](https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math-GGUF) | Q4_K_M | ~300MB | Local CPU inference |

---

## Limitations

- Optimized for English-language math problems only
- Complex abstract reasoning, geometry, and calculus are beyond reliable capacity at 0.5B scale
- Always verify critical calculations: the model may occasionally produce confident but incorrect answers
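
One inexpensive way to act on the last point is to re-check the arithmetic the model writes inside `<thought>`. The sketch below is illustrative only: `find_bad_steps` is a hypothetical helper that looks for statements of the form `a op b = c` and recomputes them. It assumes plain numbers with `+`, `-`, `×`, `*`, `/`, or `÷`, skips anything it cannot parse (for example numbers with thousands separators), and will flag rounded results such as `1 / 3 = 0.33`.

```python
import re

_NUM = r"\d+(?:\.\d+)?"
# One or more binary operations followed by "= result", e.g. "3 × 2 = 6".
_STEP = re.compile(
    rf"(?<![\d.,])({_NUM}(?:[ \t]*[+\-×*/÷][ \t]*{_NUM})+)[ \t]*=[ \t]*\$?({_NUM})"
)


def find_bad_steps(thought, tol=1e-9):
    """Return arithmetic statements in `thought` whose stated result is wrong."""
    bad = []
    for match in _STEP.finditer(thought):
        expression, stated = match.groups()
        # The regex admits only digits, dots, spaces and operators, so
        # evaluating the normalized expression cannot run arbitrary code.
        python_expr = expression.replace("×", "*").replace("÷", "/")
        try:
            actual = eval(python_expr, {"__builtins__": {}})
        except (SyntaxError, ZeroDivisionError):
            bad.append(match.group(0))
            continue
        if abs(actual - float(stated)) > tol:
            bad.append(match.group(0))
    return bad


print(find_bad_steps("Total spent: 12 + 3 = $15.\nMoney remaining: 50 - 15 = $36."))
```

An empty list means no checkable statement was wrong, not that the reasoning is sound: a correct calculation applied to the wrong quantities still passes, so this complements rather than replaces a review of the final answer.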

---

## Acknowledgements

- [Unsloth](https://github.com/unslothai/unsloth): efficient fine-tuning framework
- [Qwen Team](https://huggingface.co/Qwen): Qwen2.5-0.5B-Instruct base model
- [HuggingFace TRL](https://github.com/huggingface/trl): GRPO implementation
- [OpenR1](https://huggingface.co/open-r1): OpenR1-Math-220k dataset
- [OpenAI](https://huggingface.co/openai): GSM8K dataset

---

## Citation

```bibtex
@misc{qss-reasoning-math-2025,
  author       = {Saad Salman},
  title        = {Q-SS-0.5B-Reasoning-Math},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math}},
}
```