Qwen3-1.7B-SFT-DeepMath

Model Details

Qwen3-1.7B Supervised Fine-Tuned on DeepMath-103K human-written solutions.

This model serves as baseline for comparison with GOPD/GRPO reinforcement learning approach.

Training

Component Details
Base Model Qwen/Qwen3-1.7B
Algorithm Supervised Fine-Tuning (SFT)
Dataset zwhe99/DeepMath-103K
Filter difficulty ≥ 6 (Olympiad level)
Samples 8,000 problems
Epochs 3
Batch Size 256
Learning Rate 1e-5

Performance

Model AIME 2024 AIME 2025 Total
Baseline Qwen3-1.7B 33.33% 90.00% 61.67%
SFT (This Model) 23.33% 66.67% 45.00%
GOPD/GRPO TBD TBD TBD

Key Finding

SFT resulted in 16.67% performance drop compared to baseline. This demonstrates that simply memorizing human solution patterns does not improve mathematical reasoning ability, and may cause catastrophic forgetting.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("jindun/Qwen3-1.7B-SFT-DeepMath", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("jindun/Qwen3-1.7B-SFT-DeepMath")

License

Apache 2.0

Downloads last month
5
Safetensors
Model size
2B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for jindun/Qwen3-1.7B-SFT-DeepMath

Finetuned
Qwen/Qwen3-1.7B
Finetuned
(766)
this model

Dataset used to train jindun/Qwen3-1.7B-SFT-DeepMath