Qwen3-1.7B, supervised fine-tuned on human-written solutions from DeepMath-103K.
This model serves as a baseline for comparison with the GOPD/GRPO reinforcement learning approach.
| Component | Details |
|---|---|
| Base Model | Qwen/Qwen3-1.7B |
| Algorithm | Supervised Fine-Tuning (SFT) |
| Dataset | zwhe99/DeepMath-103K |
| Filter | difficulty ≥ 6 (Olympiad level) |
| Samples | 8,000 problems |
| Epochs | 3 |
| Batch Size | 256 |
| Learning Rate | 1e-5 |
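The configuration above can be sketched as a data-selection step followed by a step-count calculation. This is a minimal sketch under stated assumptions, not the author's training script. The rows produced by the hypothetical `make_fake_rows` stand in for DeepMath-103K, and the `difficulty` field name is assumed. The card also does not say how the 8,000 problems were chosen, so uniform random sampling with a fixed seed is assumed here. Finally, the batch size of 256 is treated as the global batch, with one optimizer step per batch.

```python
import math
import random

# Values taken from the training table above.
MIN_DIFFICULTY = 6
N_SAMPLES = 8_000
EPOCHS = 3
BATCH_SIZE = 256


def make_fake_rows(n: int, seed: int = 0) -> list:
    """Hypothetical stand-in for DeepMath-103K rows (only an assumed "difficulty" field)."""
    rng = random.Random(seed)
    levels = [3, 4, 5, 6, 6.5, 7, 8, 9]
    return [{"id": i, "difficulty": rng.choice(levels)} for i in range(n)]


def select_sft_subset(rows, min_difficulty=MIN_DIFFICULTY, n_samples=N_SAMPLES, seed=42):
    """Keep rows with difficulty >= min_difficulty, then draw n_samples without replacement.

    The uniform random draw (and its seed) is an assumption; the card only states the count.
    """
    hard = [r for r in rows if r["difficulty"] >= min_difficulty]
    if len(hard) < n_samples:
        raise ValueError(f"only {len(hard)} rows pass the difficulty filter")
    return random.Random(seed).sample(hard, n_samples)


def total_optimizer_steps(n_samples, batch_size, epochs, drop_last=False):
    """Optimizer steps, assuming batch_size is the global batch and one step per batch."""
    if drop_last:
        per_epoch = n_samples // batch_size
    else:
        per_epoch = math.ceil(n_samples / batch_size)
    return per_epoch * epochs


rows = make_fake_rows(40_000)
subset = select_sft_subset(rows)
print(len(subset))                                             # 8000
print(all(r["difficulty"] >= MIN_DIFFICULTY for r in subset))  # True
print(total_optimizer_steps(N_SAMPLES, BATCH_SIZE, EPOCHS))    # 96
print(total_optimizer_steps(N_SAMPLES, BATCH_SIZE, EPOCHS, drop_last=True))  # 93
```

Under these assumptions, 8,000 samples at a global batch of 256 give 32 steps per epoch (31 if the last partial batch is dropped). That works out to roughly 96 optimizer steps over 3 epochs.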
| Model | AIME 2024 | AIME 2025 | Total |
|---|---|---|---|
| Baseline Qwen3-1.7B | 33.33% | 90.00% | 61.67% |
| SFT (This Model) | 23.33% | 66.67% | 45.00% |
| GOPD/GRPO | TBD | TBD | TBD |
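SFT lowered the total score by 16.67 percentage points relative to the baseline (61.67% → 45.00%, a relative drop of roughly 27%), with declines on both AIME 2024 and AIME 2025. This suggests that simply imitating human solution patterns does not improve mathematical reasoning ability in this setting, and that SFT may cause catastrophic forgetting.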
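The arithmetic behind the reported drop can be checked directly from the results table. This sketch assumes the "Total" column is the unweighted mean of the two AIME columns, which matches both reported rows. It uses `Decimal` so that halves round up the same way the table does. The absolute gap is in percentage points, and the relative drop is expressed as a share of the baseline score.

```python
from decimal import Decimal, ROUND_HALF_UP

# Reported accuracies (%) from the results table: (AIME 2024, AIME 2025).
scores = {
    "Baseline Qwen3-1.7B": (Decimal("33.33"), Decimal("90.00")),
    "SFT (This Model)": (Decimal("23.33"), Decimal("66.67")),
}


def two_dp(x: Decimal) -> Decimal:
    """Round to two decimal places, rounding halves up."""
    return x.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Assumption: "Total" is the unweighted mean of the two AIME columns.
totals = {name: (a + b) / 2 for name, (a, b) in scores.items()}

baseline = totals["Baseline Qwen3-1.7B"]
sft = totals["SFT (This Model)"]
abs_drop = baseline - sft             # gap in percentage points
rel_drop = abs_drop / baseline * 100  # gap as a percentage of the baseline score

print(two_dp(baseline), two_dp(sft))  # 61.67 45.00
print(two_dp(abs_drop))               # 16.67
print(two_dp(rel_drop))               # 27.03
```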
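```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub.
# device_map="auto" requires the accelerate package to be installed.
model = AutoModelForCausalLM.from_pretrained("jindun/Qwen3-1.7B-SFT-DeepMath", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("jindun/Qwen3-1.7B-SFT-DeepMath")
```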
License: Apache 2.0