Qwen3-1.7B-SFT-DeepMath

Model Details

Qwen3-1.7B Supervised Fine-Tuned on DeepMath-103K human-written solutions.

This model serves as baseline for comparison with GOPD/GRPO reinforcement learning approach.

Training

Component	Details
Base Model	Qwen/Qwen3-1.7B
Algorithm	Supervised Fine-Tuning (SFT)
Dataset	zwhe99/DeepMath-103K
Filter	difficulty ≥ 6 (Olympiad level)
Samples	8,000 problems
Epochs	3
Batch Size	256
Learning Rate	1e-5

Performance

Model	AIME 2024	AIME 2025	Total
Baseline Qwen3-1.7B	33.33%	90.00%	61.67%
SFT (This Model)	23.33%	66.67%	45.00%
GOPD/GRPO	TBD	TBD	TBD

Key Finding

SFT resulted in 16.67% performance drop compared to baseline. This demonstrates that simply memorizing human solution patterns does not improve mathematical reasoning ability, and may cause catastrophic forgetting.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("jindun/Qwen3-1.7B-SFT-DeepMath", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("jindun/Qwen3-1.7B-SFT-DeepMath")

License

Apache 2.0

Downloads last month: 4

Safetensors

Model size

2B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for jindun/Qwen3-1.7B-SFT-DeepMath

Base model

Qwen/Qwen3-1.7B-Base

Finetuned

Qwen/Qwen3-1.7B

Finetuned

(910)

this model

jindun
/

Qwen3-1.7B-SFT-DeepMath