# DeepSeek-Math-7B-RL-Phase1

**Mathematical Reasoning Model - Phase 1 Training Complete**

---

## 📊 Model Overview

| Attribute | Value |
|-----------|-------|
| **Base Model** | `sid172002/deepseek-math-7b-rl-5500steps` |
| **Training Type** | LoRA Fine-tuning (r=64, alpha=128) |
| **Dataset** | 379,921 international math problems |
| **Training Duration** | 15.3 hours |
| **Epochs** | 3 |
| **Final Loss** | 0.46 (started at 0.59) |
| **Hardware** | NVIDIA B200 (180GB) |

---

## 🎯 Benchmark Results

### Overall Performance: **41.7%** (5/12 problems)

| Tier | Score | Accuracy | Notes |
|------|-------|----------|-------|
| **IIT JEE Easy** | 1/2 | 50.0% | Basic algebra/calculus |
| **IIT JEE Hard** | 1/2 | 50.0% | Advanced problems |
| **AMC 10/12** | 1/2 | 50.0% | Competition math |
| **AIME** | 1/2 | 50.0% | Hard competition |
| **Olympiad** | 1/2 | 50.0% | Proof-based |
| **FrontierMath** | 0/2 | 0.0% | Very hard geometry/calculus |
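
The overall figure follows directly from the per-tier counts. The sketch below recomputes it from the table; the dictionary layout and the `accuracy` helper are illustrative names, not part of any evaluation harness used for this model.

```python
# Per-tier (solved, attempted) counts copied from the table above.
tier_scores = {
    "IIT JEE Easy": (1, 2),
    "IIT JEE Hard": (1, 2),
    "AMC 10/12": (1, 2),
    "AIME": (1, 2),
    "Olympiad": (1, 2),
    "FrontierMath": (0, 2),
}


def accuracy(solved, attempted):
    """Return accuracy as a percentage."""
    return 100.0 * solved / attempted


total_solved = sum(solved for solved, _ in tier_scores.values())
total_attempted = sum(attempted for _, attempted in tier_scores.values())
overall = accuracy(total_solved, total_attempted)

for tier, (solved, attempted) in tier_scores.items():
    print(f"{tier}: {solved}/{attempted} = {accuracy(solved, attempted):.1f}%")
print(f"Overall: {total_solved}/{total_attempted} = {overall:.1f}%")
```

With only two problems per tier, each tier can only score 0%, 50%, or 100%, so these numbers are best read as a smoke test rather than a precise ranking.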

### ✅ Correctly Solved:
1. Algebra: If x + 1/x = 3, find x² + 1/x² → **7**
2. Functional: f(x+y) = f(x) + f(y), f(1)=3, find f(5) → **15**
3. Arithmetic: (20-19+18-17) + ... → **10**
4. Modular: 2¹⁰⁰ mod 13 → **3** (since 2¹² ≡ 1 (mod 13), 2¹⁰⁰ ≡ 2⁴ = 16 ≡ 3)
5. Proof: p² ≡ 1 (mod 24) for primes p>3 → Proof structure ✅
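
Several of the reference answers above can be checked mechanically. The following is a minimal sketch using only the standard library; the variable names and the `is_prime` helper are illustrative. It skips the arithmetic item because its terms are elided in the list, and for the proof item it only spot-checks primes below 1,000, which is evidence rather than a proof. Direct computation of the modular item gives 3.

```python
import math

# Item 1: x + 1/x = 3 implies x^2 + 1/x^2 = (x + 1/x)^2 - 2.
x = (3 + math.sqrt(5)) / 2          # one root of x^2 - 3x + 1 = 0
algebra_answer = x**2 + 1 / x**2

# Item 2: an additive f with f(1) = 3 satisfies f(n) = n * f(1) for positive integers n.
functional_answer = 5 * 3

# Item 4: three-argument pow performs modular exponentiation.
modular_answer = pow(2, 100, 13)


# Item 5: spot-check p^2 ≡ 1 (mod 24) for primes 3 < p < 1000.
def is_prime(n):
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


primes_ok = all(p * p % 24 == 1 for p in range(5, 1000) if is_prime(p))

print(round(algebra_answer, 9), functional_answer, modular_answer, primes_ok)
```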

### ❌ Challenging Areas:
- Geometry with diagrams (needs vision)
- Complex multi-step counting
- Integration problems
- Advanced functional equations

---

## 📚 Training Dataset

### Composition (379,921 problems):

| Source | Count | Type |
|--------|-------|------|
| NuminaMath-Olympiad | 125,000 | Competition |
| NuminaMath-AMC | 85,000 | Competition |
| NuminaMath-AIME | 45,000 | Competition |
| NuminaMath-AoPS | 99,921 | Olympiad |
| JEEBench | 515 | IIT JEE |
| MetaMathQA | 5,000 | Algebra |
| GSM8K | 5,000 | Basic math |
| India Context | 10,000 | Regional |
| Singapore Math | 5,000 | Regional |
| OpenWebMath | 5,000 | Calculus |
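
A quick tally of the per-source counts is a useful sanity check on the headline figure. In the sketch below the counts are copied from the table; the listed sources sum to 385,436, which is 5,515 more than the 379,921 stated in the heading. The source does not explain the gap (deduplication or filtering would be one possible cause, but that is a guess), so the snippet only reports it.

```python
# Per-source problem counts copied from the composition table.
source_counts = {
    "NuminaMath-Olympiad": 125_000,
    "NuminaMath-AMC": 85_000,
    "NuminaMath-AIME": 45_000,
    "NuminaMath-AoPS": 99_921,
    "JEEBench": 515,
    "MetaMathQA": 5_000,
    "GSM8K": 5_000,
    "India Context": 10_000,
    "Singapore Math": 5_000,
    "OpenWebMath": 5_000,
}

REPORTED_TOTAL = 379_921  # headline figure from the model card

listed_total = sum(source_counts.values())
gap = listed_total - REPORTED_TOTAL

print(f"Sum of listed sources: {listed_total:,}")
print(f"Reported total:        {REPORTED_TOTAL:,}")
print(f"Difference:            {gap:,}")
```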

### Difficulty Distribution:
- Easy: ~5%
- Medium: ~25%
- Hard: ~40%
- Very Hard: ~30%

---

## 🏗️ Architecture

```
Base: DeepSeek-Math-7B-RL (5,500-step checkpoint)

LoRA Fine-tuning
  - Rank: 64
  - Alpha: 128
  - Target: All attention + MLP layers
  - Trainable params: 149.9M (2.12% of 7.06B)

Phase 1 Output: Text-only model
```
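
The trainable-parameter figure can be reproduced with a back-of-the-envelope count. This sketch assumes dimensions not stated in this card: hidden size 4096, MLP intermediate size 11008, 30 transformer layers, and full-size key/value projections, with LoRA applied to the four attention projections and three MLP projections of each layer. Under those assumptions the count lands on the reported 149.9M.

```python
# Assumed model dimensions (not stated in this card).
HIDDEN = 4096
INTERMEDIATE = 11008
NUM_LAYERS = 30

# Values taken from the card.
RANK = 64
TOTAL_PARAMS = 7.06e9


def lora_params(in_features, out_features, rank):
    """LoRA adds A (rank x in_features) and B (out_features x rank) per adapted layer."""
    return rank * (in_features + out_features)


# Attention: q, k, v, o projections, each HIDDEN -> HIDDEN (assumes no grouped-query attention).
attention = 4 * lora_params(HIDDEN, HIDDEN, RANK)

# MLP: gate and up (HIDDEN -> INTERMEDIATE), down (INTERMEDIATE -> HIDDEN).
mlp = 2 * lora_params(HIDDEN, INTERMEDIATE, RANK) + lora_params(INTERMEDIATE, HIDDEN, RANK)

trainable = NUM_LAYERS * (attention + mlp)
share = 100.0 * trainable / TOTAL_PARAMS

print(f"Trainable LoRA parameters: {trainable:,} ({trainable / 1e6:.1f}M)")
print(f"Share of {TOTAL_PARAMS / 1e9:.2f}B: {share:.2f}%")
```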

### Training Configuration:
```
Batch size: 16 (per device)
Gradient accumulation: 4
Effective batch: 64
Learning rate: 1e-4 → 3.8e-10 (cosine decay)
Optimizer: AdamW 8-bit
Max sequence length: 4096
Precision: bfloat16
```
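
These settings imply a rough optimizer-step count. The sketch below assumes a single device (the card lists one B200), that all 379,921 problems were used for training with no held-out split, and that the final partial batch of each epoch is kept; none of these are confirmed by the source.

```python
import math

# Values from the card.
PER_DEVICE_BATCH = 16
GRAD_ACCUM_STEPS = 4
DATASET_SIZE = 379_921
EPOCHS = 3

# Assumption: one GPU, so no data-parallel multiplier.
NUM_DEVICES = 1

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
steps_per_epoch = math.ceil(DATASET_SIZE / effective_batch)
total_steps = steps_per_epoch * EPOCHS

print(f"Effective batch size: {effective_batch}")
print(f"Optimizer steps per epoch: {steps_per_epoch:,}")
print(f"Optimizer steps over {EPOCHS} epochs: {total_steps:,}")
```

Under these assumptions the run is a little under 18,000 steps, which is consistent with `checkpoint-17000` being the last numbered checkpoint in the Files section.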

---

## 📈 Training Metrics

### Loss Curve:
- Initial: 0.59
- Final: 0.46
- **Improvement: 22%**

### Learning Rate Schedule:
- Warmup: Linear
- Decay: Cosine to 3.8e-10
- Final LR: ~0 (effectively stopped)
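
A schedule of this shape can be sketched in a few lines. The function below is an illustrative linear-warmup plus cosine-decay rule, not the training framework's implementation; the warmup length (500 steps) and total step count (17,811) are placeholders, since the card states neither.

```python
import math


def lr_at(step, peak_lr=1e-4, warmup_steps=500, total_steps=17_811):
    """Learning rate at a given step: linear warmup, then cosine decay to zero.

    warmup_steps and total_steps are hypothetical defaults for illustration.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


for step in (0, 250, 500, 9_000, 17_000, 17_811):
    print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Because the cosine term flattens out near the end, the learning rate in the last few hundred steps is already orders of magnitude below the peak, which matches the "effectively stopped" observation above.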

### GPU Utilization:
- Average: 99%
- Peak Memory: ~66GB / 180GB
- Temperature: 60-75°C

---

## 🚀 Usage

### Loading the Model:
```python
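import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# If the repository contains only the LoRA adapter (see Files), `peft` must be
# installed so that transformers can attach it to the base model.
model_path = "sid172002/deepseek-math-7b-rl-phase1"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Inference: build the prompt in the "### Problem / ### Solution" format
problem = "Find the sum of 1 + 2 + ... + 100"
prompt = f"### Problem: {problem}\n### Solution:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(
    **inputs, max_new_tokens=512, do_sample=True, temperature=0.3
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)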
```

### Expected Performance:
- **Simple algebra**: ✅ Good
- **Step-by-step reasoning**: ✅ Good
- **Calculus**: ⚠️ Moderate
- **Geometry without images**: ⚠️ Moderate
- **Advanced competition**: ❌ Needs Phase 2

---

## 🔄 Phase 2: Multimodal (Recommended)

**Next step:** Add vision capabilities for geometry problems

```
Phase 1 Output (Text)

+ CLIP Vision Encoder (frozen)
+ Projection Layer (trainable)
+ 5,000 Vision Problems

Phase 2 Output (Multimodal)
```

**Estimated improvement:** +10-15% on geometry/competition problems

---

## 💰 Cost Analysis

| Phase | Duration | Cost (B200 @ $5.29/hr) |
|-------|----------|------------------------|
| Phase 1 | 15.3 hours | **$81.01** |
| Phase 2 (est.) | 6 hours | ~$32 |
| **Total** | **~21 hours** | **~$113** |
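
The cost figures are simply duration multiplied by the hourly rate. The sketch below redoes that arithmetic from the rounded durations in the table. With 15.3 hours, Phase 1 comes to about $80.94, so a few cents of difference from the table is plausibly due to the logged duration being slightly longer than the rounded value (an assumption; the exact duration is not given).

```python
HOURLY_RATE_USD = 5.29  # B200 rate from the table header

# Durations in hours, as rounded in the table.
phase_hours = {
    "Phase 1": 15.3,
    "Phase 2 (est.)": 6.0,
}

phase_costs = {name: hours * HOURLY_RATE_USD for name, hours in phase_hours.items()}
total_hours = sum(phase_hours.values())
total_cost = sum(phase_costs.values())

for name, cost in phase_costs.items():
    print(f"{name}: {phase_hours[name]:.1f} h -> ${cost:.2f}")
print(f"Total: {total_hours:.1f} h -> ${total_cost:.2f}")
```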

---

## ⚠️ Limitations

1. **Text-only**: Cannot process diagrams/images
2. **Repetition**: Sometimes repeats "### Answer" multiple times
3. **Calculation errors**: Occasional arithmetic mistakes
4. **FrontierMath**: Struggles with the hardest problems (0/2 on the benchmark above)
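
The repetition issue can be worked around in post-processing. The helper below is a hypothetical sketch, not part of the released files: it keeps everything up to the second occurrence of the marker and assumes the model emits the marker exactly as `### Answer`.

```python
def trim_repeated_answer(text, marker="### Answer"):
    """Cut the output just before the second occurrence of `marker`.

    Hypothetical helper: returns the text unchanged (apart from stripping
    surrounding whitespace) when the marker appears at most once.
    """
    first = text.find(marker)
    if first == -1:
        return text.strip()
    second = text.find(marker, first + len(marker))
    if second == -1:
        return text.strip()
    return text[:second].rstrip()


sample = (
    "### Solution: Pair the terms to get 50 pairs of 101.\n"
    "### Answer: 5050\n"
    "### Answer: 5050\n"
    "### Answer"
)
cleaned = trim_repeated_answer(sample)
print(cleaned)
```

Passing a `repetition_penalty` above 1.0 to `model.generate` is another option worth trying, though whether it helps for this model has not been measured here.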

---

## 📁 Files

```
deepseek-math-phase1-final/
├── final/
│   ├── adapter_model.safetensors (572 MB)
│   ├── adapter_config.json
│   ├── tokenizer.json
│   └── README.md
├── checkpoint-15000/
├── checkpoint-16000/
└── checkpoint-17000/
```

---

## 📝 Citation

```bibtex
@misc{deepseek-math-phase1,
  title={DeepSeek-Math-7B-RL-Phase1: Fine-tuned on 379K International Math Problems},
  author={sid172002},
  year={2026},
  howpublished={HuggingFace Model Hub}
}
```

---

## 🤝 Acknowledgments

- **Base Model**: DeepSeek-Math-7B-RL (5,500 steps)
- **Training Framework**: Unsloth
- **Compute**: Lambda Labs B200
- **Dataset**: NuminaMath, JEEBench, MetaMathQA, GSM8K

---

**Status**: ✅ Phase 1 Complete | ⏳ Phase 2 Ready | 🎯 Benchmarked