# Qwen2.5-0.5B-Math-SFT
Supervised Fine-Tuned version of Qwen/Qwen2.5-0.5B on 32,774 high-quality mathematical reasoning samples from DeepMath-103K, with DeepSeek-R1-generated chain-of-thought solutions as training targets.
This is Stage B of the AIMS5740 Final Project pipeline on Data Selection + RL for LLMs (Math/STEM). The GRPO-trained successor is at tengfeima-ai/Qwen2.5-0.5B-Math-GRPO.
## 🔁 3-Stage Training Pipeline

```
Stage A ─ Data Selection & Filtering
  DeepMath-103K (103,022 raw samples)
    ↓ difficulty ≥ 3/10, length filters, valid answer check
  32,774 curated samples (33.5% retention)

Stage B ─ Supervised Fine-Tuning   ← THIS MODEL
  Base: Qwen/Qwen2.5-0.5B
    ↓ 3 epochs · 2×H100 SXM · DeepSpeed ZeRO-2 · Flash Attn 2
  Qwen2.5-0.5B-Math-SFT

Stage C ─ GRPO Reinforcement Learning
  Qwen2.5-0.5B-Math-SFT
    ↓ reward = correctness + format + length_penalty
  Qwen2.5-0.5B-Math-GRPO
```
Inspired by DeepSeek-R1: imitate R1 CoT via SFT first, then refine with outcome-based RL rewards.
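The Stage C reward named in the diagram (correctness + format + length_penalty) can be sketched as a simple scoring function. The weights, the answer-template regex, and the word-count threshold below are illustrative assumptions, not the project's actual reward code:

```python
import re

def composite_reward(completion: str, gold_answer: str,
                     max_words: int = 2048) -> float:
    """Illustrative composite reward: correctness + format + length penalty.

    All weights and heuristics here are assumptions for illustration;
    the actual Stage C reward implementation may differ.
    """
    # Correctness: compare the text after "The answer is:" (the SFT
    # target template) to the gold answer.
    m = re.search(r"The answer is:\s*(.+)", completion)
    predicted = m.group(1).strip() if m else ""
    correctness = 1.0 if predicted == gold_answer.strip() else 0.0

    # Format: small bonus for following the trained answer template.
    fmt = 0.2 if m else 0.0

    # Length penalty: discourage overly long chains of thought.
    length_penalty = -0.1 if len(completion.split()) > max_words else 0.0

    return correctness + fmt + length_penalty
```

In GRPO this score would be computed per sampled completion and normalized within each group of rollouts for the same prompt.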
## 🏆 Evaluation Results

Benchmark scores are not yet populated.

| Benchmark | Base Model | This Model (SFT) | Δ |
|---|---|---|---|
| MATH-500 | — | — | — |
| GSM8K | — | — | — |
| MMLU-STEM | — | — | — |
Evaluation conducted with lm-evaluation-harness. Results for GRPO model: see tengfeima-ai/Qwen2.5-0.5B-Math-GRPO.
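A typical lm-evaluation-harness invocation for these benchmarks looks like the following. This is a sketch, not the exact command used here; task names vary across harness versions (e.g. MATH is exposed as `minerva_math` in recent releases):

```shell
lm_eval --model hf \
  --model_args pretrained=tengfeima-ai/Qwen2.5-0.5B-Math-SFT,dtype=bfloat16 \
  --tasks gsm8k,minerva_math,mmlu_stem \
  --batch_size 8
```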
## 🗂️ Training Data — DeepMath-103K (Filtered)
| Property | Value |
|---|---|
| Source | zwhe99/DeepMath-103K |
| Raw samples | 103,022 |
| After filtering | 32,774 (33.5% retention) |
| Main rejection cause | R1 solutions > 2048 words (52,778 samples) |
| Solution type | DeepSeek-R1 chain-of-thought (r1_solution_1/2/3) |
| Topics | Competition math, algebra, number theory, combinatorics, calculus |
Stage A filter criteria:
- Difficulty score ≥ 3.0 (DeepMath native score, scale 1–10)
- Solution word count: 50 – 2048 words
- Non-empty `final_answer` field
- Best of 3 R1 solutions selected by length heuristic
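The criteria above can be sketched as a single filter predicate. This is a reconstruction from the listed criteria, not the project's actual script; field names (`difficulty`, `final_answer`, `r1_solution_1..3`) are assumed from the DeepMath-103K description above, and "longest valid solution" stands in for the unspecified length heuristic:

```python
def select_sample(sample: dict,
                  min_difficulty: float = 3.0,
                  min_words: int = 50,
                  max_words: int = 2048):
    """Apply the Stage A filters; return the chosen R1 solution or None.

    Field names and the tie-breaking heuristic are assumptions based on
    the filter criteria described in this card.
    """
    if sample.get("difficulty", 0.0) < min_difficulty:
        return None
    if not sample.get("final_answer"):
        return None

    # Keep the R1 solutions that pass the word-count filter, then pick
    # the "best" by a length heuristic (longest kept, as an example).
    candidates = []
    for key in ("r1_solution_1", "r1_solution_2", "r1_solution_3"):
        solution = sample.get(key) or ""
        n_words = len(solution.split())
        if min_words <= n_words <= max_words:
            candidates.append((n_words, solution))
    if not candidates:
        return None  # e.g. all three solutions exceed 2048 words
    return max(candidates)[1]
```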
Training format (Alpaca-style):
```json
{
  "instruction": "Solve the following math problem step by step.",
  "input": "<problem statement>",
  "output": "<R1-style CoT reasoning>\n\nThe answer is: <final_answer>"
}
```
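Given a filtered sample, assembling one such record is a small transformation. A minimal sketch, assuming the fields shown above:

```python
def to_alpaca(problem: str, r1_solution: str, final_answer: str) -> dict:
    """Build one Alpaca-style training record in the format shown above."""
    return {
        "instruction": "Solve the following math problem step by step.",
        "input": problem,
        "output": f"{r1_solution}\n\nThe answer is: {final_answer}",
    }
```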
## 📊 Training Metrics
| Metric | Value |
|---|---|
| Final train loss | 0.6287 |
| Final eval loss | 0.6340 |
| Total epochs | 3 |
| Total optimizer steps | 1,521 |
| Training time | 40.5 minutes |
| Throughput | 40.1 samples/sec |
| Total FLOPs | 4.28e+17 |
| Final learning rate | ~9.5e-11 (cosine decay to ~0) |
Loss curve: decreased from ~1.2 (step 1) → ~0.57 (step 1520), indicating good convergence without overfitting (eval loss tracked train loss closely throughout).
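The step count is consistent with the batch configuration: 1,521 steps over 3 epochs is 507 optimizer steps per epoch, and 507 × 64 = 32,448 samples per epoch, which suggests roughly 1% of the 32,774 curated samples were held out for eval. The eval-split size is an inference from these numbers, not documented; a quick arithmetic check:

```python
total_samples = 32_774
global_batch = 4 * 8 * 2       # per-device batch × grad accum × 2 GPUs = 64
epochs = 3
total_steps = 1_521            # reported optimizer steps

steps_per_epoch = total_steps // epochs          # 507
train_samples = steps_per_epoch * global_batch   # samples seen per epoch
eval_samples = total_samples - train_samples     # implied held-out set
```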
## ⚙️ Training Configuration
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-0.5B |
| Fine-tuning method | Full fine-tuning (no LoRA/PEFT) |
| Framework | LLaMA-Factory v0.9+ |
| Hardware | 2× NVIDIA H100 SXM 80GB HBM3 (NVLink 4.0) |
| Multi-GPU | DeepSpeed ZeRO Stage 2 |
| Precision | bfloat16 |
| Attention | Flash Attention 2 |
| Per-device batch size | 4 |
| Gradient accumulation steps | 8 |
| Effective global batch size | 64 (4 × 8 × 2 GPUs) |
| Optimizer | AdamW (β₁=0.9, β₂=0.999) |
| Learning rate | 1e-5 |
| LR scheduler | Cosine with warmup |
| Warmup ratio | 0.03 |
| Weight decay | 0.01 |
| Max gradient norm | 1.0 |
| Max sequence length | 2048 tokens |
| Gradient checkpointing | Enabled (saves ~30% VRAM) |
| Peak GPU memory | ~26 GB / 80 GB per H100 |
| Training date | 2025-03-28 |
## 💬 Prompt Format
This model uses the Qwen chat template. For best results:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("tengfeima-ai/Qwen2.5-0.5B-Math-SFT")
tokenizer = AutoTokenizer.from_pretrained("tengfeima-ai/Qwen2.5-0.5B-Math-SFT")

messages = [
    {"role": "system", "content": "You are a math expert. Think step by step and end with the final answer in \\boxed{}."},
    {"role": "user", "content": "Solve: What is the sum of all integers from 1 to 100?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Greedy decoding: temperature=0.0 is not a valid sampling temperature,
# so use do_sample=False instead.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
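Since SFT targets end with `The answer is: <final_answer>` and the system prompt above asks for `\boxed{}`, the final answer can be pulled from a generation with a small parser. This regex helper is an illustrative sketch, not part of the released code:

```python
import re

def extract_answer(text: str):
    """Extract the final answer from a generation, preferring \\boxed{...}
    and falling back to the "The answer is:" template used in SFT targets."""
    m = re.search(r"\\boxed\{([^{}]*)\}", text)
    if m:
        return m.group(1).strip()
    m = re.search(r"The answer is:\s*(.+)", text)
    return m.group(1).strip() if m else None
```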
## 📚 Related Work
- MetaMath — Math data augmentation
- GRPO (TRL) — RL post-training
- DeepSeek-R1 — Inspiration for training pipeline
## 📄 Citation

```bibtex
@misc{tengfeima2026qwen25mathsft,
  title     = {Qwen2.5-0.5B-Math-SFT: Supervised Fine-Tuning for Math Reasoning},
  author    = {Tengfei Ma},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/tengfeima-ai/Qwen2.5-0.5B-Math-SFT}
}
```