
Qwen2.5-0.5B-Math-SFT

A supervised fine-tuned (SFT) version of Qwen/Qwen2.5-0.5B, trained on 32,774 high-quality mathematical reasoning samples from DeepMath-103K, with DeepSeek-R1-generated chain-of-thought solutions as training targets.

This is Stage B of the AIMS5740 Final Project pipeline on Data Selection + RL for LLMs (Math/STEM). The GRPO-trained successor is at tengfeima-ai/Qwen2.5-0.5B-Math-GRPO.


🔁 3-Stage Training Pipeline

Stage A ─ Data Selection & Filtering
  DeepMath-103K (103,022 raw samples)
      ↓  difficulty ≥ 3/10, length filters, valid answer check
  32,774 curated samples  (33.5% retention)

Stage B ─ Supervised Fine-Tuning  ← THIS MODEL
  Base: Qwen/Qwen2.5-0.5B
      ↓  3 epochs · 2×H100 SXM · DeepSpeed ZeRO-2 · Flash Attn 2
  Qwen2.5-0.5B-Math-SFT

Stage C ─ GRPO Reinforcement Learning
  Qwen2.5-0.5B-Math-SFT
      ↓  reward = correctness + format + length_penalty
  Qwen2.5-0.5B-Math-GRPO

Inspired by DeepSeek-R1: imitate R1 CoT via SFT first, then refine with outcome-based RL rewards.
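The Stage C reward (`correctness + format + length_penalty`) can be sketched as a simple scoring function. The component weights, the answer-template regex, and the 2048-word cutoff below are illustrative assumptions, not the project's actual values:

```python
import re

def grpo_reward(completion: str, gold_answer: str) -> float:
    """Hedged sketch of the Stage C reward: correctness + format + length penalty.
    Weights and thresholds are illustrative assumptions."""
    # Correctness: compare the text after "The answer is:" with the gold answer.
    match = re.search(r"The answer is:\s*(.+)", completion)
    predicted = match.group(1).strip() if match else ""
    correctness = 1.0 if predicted == gold_answer.strip() else 0.0

    # Format: small bonus for following the trained answer template.
    format_bonus = 0.2 if match else 0.0

    # Length penalty: discourage runaway chains of thought (assumed 2048-word cap).
    length_penalty = -0.2 if len(completion.split()) > 2048 else 0.0

    return correctness + format_bonus + length_penalty

print(grpo_reward("1 + 1 = 2.\n\nThe answer is: 2", "2"))  # 1.2
```

In the actual GRPO stage these rewards are computed per sampled completion and normalized within each group to form advantages.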


🏆 Evaluation Results

Benchmark | Base Model | This Model (SFT) | Δ
--- | --- | --- | ---
MATH-500 | — | — | —
GSM8K | — | — | —
MMLU-STEM | — | — | —

Evaluation conducted with lm-evaluation-harness. Results for GRPO model: see tengfeima-ai/Qwen2.5-0.5B-Math-GRPO.


🗂️ Training Data — DeepMath-103K (Filtered)

Property | Value
--- | ---
Source | zwhe99/DeepMath-103K
Raw samples | 103,022
After filtering | 32,774 (33.5% retention)
Main rejection cause | R1 solutions > 2048 words (52,778 samples)
Solution type | DeepSeek-R1 chain-of-thought (`r1_solution_1/2/3`)
Topics | Competition math, algebra, number theory, combinatorics, calculus

Stage A filter criteria:

  • Difficulty score ≥ 3.0 (DeepMath native score, scale 1–10)
  • Solution word count: 50 – 2048 words
  • Non-empty final_answer field
  • Best of 3 R1 solutions selected by length heuristic
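The criteria above can be sketched as a predicate over raw DeepMath records. The field names (`difficulty`, `final_answer`, `r1_solution_1..3`) follow the card's description, and "longest solution wins" is an assumed reading of the length heuristic:

```python
def best_solution(example: dict) -> str:
    """Pick the best of the three R1 solutions; 'longest' is an assumed heuristic."""
    solutions = [example.get(f"r1_solution_{i}", "") for i in (1, 2, 3)]
    return max(solutions, key=lambda s: len(s.split()))

def passes_stage_a(example: dict) -> bool:
    """Stage A filter: difficulty, solution word count, and answer validity."""
    n_words = len(best_solution(example).split())
    return (
        example.get("difficulty", 0) >= 3.0    # DeepMath native score, scale 1-10
        and 50 <= n_words <= 2048              # solution word-count window
        and bool(example.get("final_answer"))  # non-empty final answer
    )

sample = {
    "difficulty": 5.0,
    "final_answer": "42",
    "r1_solution_1": " ".join(["step"] * 120),  # 120-word solution
}
print(passes_stage_a(sample))  # True
```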

Training format (Alpaca-style):

```json
{
  "instruction": "Solve the following math problem step by step.",
  "input": "<problem statement>",
  "output": "<R1-style CoT reasoning>\n\nThe answer is: <final_answer>"
}
```
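Assembling one such record from a filtered sample is straightforward; the helper name below is hypothetical:

```python
def to_alpaca(problem: str, cot: str, final_answer: str) -> dict:
    """Build one Alpaca-style training record in the card's format."""
    return {
        "instruction": "Solve the following math problem step by step.",
        "input": problem,
        "output": f"{cot}\n\nThe answer is: {final_answer}",
    }

record = to_alpaca("What is 2 + 2?", "Two plus two combines two pairs, giving 4.", "4")
print(record["output"])
```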

📊 Training Metrics

Metric | Value
--- | ---
Final train loss | 0.6287
Final eval loss | 0.6340
Total epochs | 3
Total optimizer steps | 1,521
Training time | 40.5 minutes
Throughput | 40.1 samples/sec
Total FLOPs | 4.28e+17
Final learning rate | ~9.5e-11 (cosine decay to ~0)
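The step count is consistent with the batch configuration: with an effective batch of 64 and a small eval holdout (the ~1% split is an assumption, not stated on the card), 3 epochs over the curated set yield 1,521 optimizer steps:

```python
import math

total_samples = 32_774
eval_ratio = 0.01  # assumed eval holdout; not stated on the card
train_samples = int(total_samples * (1 - eval_ratio))  # ~32,446

per_device_batch = 4
grad_accum = 8
num_gpus = 2
effective_batch = per_device_batch * grad_accum * num_gpus  # 64

steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * 3
print(effective_batch, total_steps)  # 64 1521
```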

Loss curve: the training loss decreased from ~1.2 (step 1) to ~0.57 (step 1520), and eval loss tracked train loss closely throughout, indicating good convergence without overfitting.


⚙️ Training Configuration

Parameter | Value
--- | ---
Base model | Qwen/Qwen2.5-0.5B
Fine-tuning method | Full fine-tuning (no LoRA/PEFT)
Framework | LLaMA-Factory v0.9+
Hardware | 2× NVIDIA H100 SXM 80GB HBM3 (NVLink 4.0)
Multi-GPU | DeepSpeed ZeRO Stage 2
Precision | bfloat16
Attention | Flash Attention 2
Per-device batch size | 4
Gradient accumulation steps | 8
Effective global batch size | 64 (4 × 8 × 2 GPUs)
Optimizer | AdamW (β₁=0.9, β₂=0.999)
Learning rate | 1e-5
LR scheduler | Cosine with warmup
Warmup ratio | 0.03
Weight decay | 0.01
Max gradient norm | 1.0
Max sequence length | 2048 tokens
Gradient checkpointing | Enabled (saves ~30% VRAM)
Peak GPU memory | ~26 GB / 80 GB per H100
Training date | 2025-03-28

💬 Prompt Format

This model uses the Qwen chat template. For best results:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("tengfeima-ai/Qwen2.5-0.5B-Math-SFT")
tokenizer = AutoTokenizer.from_pretrained("tengfeima-ai/Qwen2.5-0.5B-Math-SFT")

messages = [
    {"role": "system", "content": "You are a math expert. Think step by step and end with the final answer in \\boxed{}."},
    {"role": "user", "content": "Solve: What is the sum of all integers from 1 to 100?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Greedy decoding: use do_sample=False rather than temperature=0.0
# (a zero temperature is invalid when sampling is enabled).
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
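Since the system prompt asks for the final answer in `\boxed{}`, a small helper (hypothetical, not part of the model card) can extract it from the generated text:

```python
import re
from typing import Optional

def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in the output, if any.
    Handles one level of nested braces, which covers most final answers."""
    matches = re.findall(r"\\boxed\{((?:[^{}]|\{[^{}]*\})*)\}", text)
    return matches[-1] if matches else None

print(extract_boxed(r"The sum is \boxed{5050}."))          # 5050
print(extract_boxed(r"So \boxed{\frac{1}{2}} is final."))  # \frac{1}{2}
```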

📚 Related Work

📄 Citation

```bibtex
@misc{tengfeima2026qwen25mathsft,
  title     = {Qwen2.5-0.5B-Math-SFT: Supervised Fine-Tuning for Math Reasoning},
  author    = {Tengfei Ma},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/tengfeima-ai/Qwen2.5-0.5B-Math-SFT}
}
```