# DeepSeek-Math-7B-RL-Phase1
**Mathematical Reasoning Model - Phase 1 Training Complete**
---
## 📊 Model Overview
| Attribute | Value |
|-----------|-------|
| **Base Model** | `sid172002/deepseek-math-7b-rl-5500steps` |
| **Training Type** | LoRA Fine-tuning (r=64, alpha=128) |
| **Dataset** | 379,921 international math problems |
| **Training Duration** | 15.3 hours |
| **Epochs** | 3 |
| **Final Loss** | 0.46 (started at 0.59) |
| **Hardware** | NVIDIA B200 (180GB) |
---
## 🎯 Benchmark Results
### Overall Performance: **41.7%** (5/12 problems)
| Tier | Score | Accuracy | Notes |
|------|-------|----------|-------|
| **IIT JEE Easy** | 1/2 | 50.0% | Basic algebra/calculus |
| **IIT JEE Hard** | 1/2 | 50.0% | Advanced problems |
| **AMC 10/12** | 1/2 | 50.0% | Competition math |
| **AIME** | 1/2 | 50.0% | Hard competition |
| **Olympiad** | 1/2 | 50.0% | Proof-based |
| **FrontierMath** | 0/2 | 0.0% | Very hard geometry/calculus |
### ✅ Correctly Solved:
1. Algebra: If x + 1/x = 3, find x² + 1/x² → **7**
2. Functional: f(x+y) = f(x) + f(y), f(1)=3, find f(5) → **15**
3. Arithmetic: (20-19+18-17) + ... → **10**
4. Modular: 2¹⁰⁰ mod 13 → **3** (by Fermat's little theorem, 2¹² ≡ 1 mod 13, so 2¹⁰⁰ ≡ 2⁴ = 16 ≡ 3; the model's answer of 3 is correct)
5. Proof: p² ≡ 1 (mod 24) for primes p>3 → Proof structure ✅
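The numeric answers above are easy to verify mechanically; a quick stdlib-only check:

```python
# Sanity checks for the numeric answers above (pure stdlib).

# 1. x + 1/x = 3  =>  x^2 + 1/x^2 = (x + 1/x)^2 - 2 = 9 - 2
assert 3**2 - 2 == 7

# 2. Cauchy's equation f(x+y) = f(x) + f(y) with f(1) = 3 gives f(n) = 3n
assert 3 * 5 == 15

# 3. (20-19) + (18-17) + ... + (2-1): ten pairs, each equal to 1
assert sum(n - (n - 1) for n in range(20, 0, -2)) == 10

# 4. 2^100 mod 13 via built-in fast modular exponentiation
assert pow(2, 100, 13) == 3
```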
### ❌ Challenging Areas:
- Geometry with diagrams (needs vision)
- Complex multi-step counting
- Integration problems
- Advanced functional equations
---
## 📚 Training Dataset
### Composition (379,921 problems):
| Source | Count | Type |
|--------|-------|------|
| NuminaMath-Olympiad | 125,000 | Competition |
| NuminaMath-AMC | 85,000 | Competition |
| NuminaMath-AIME | 45,000 | Competition |
| NuminaMath-AoPS | 99,921 | Olympiad |
| JEEBench | 515 | IIT JEE |
| MetaMathQA | 5,000 | Algebra |
| GSM8K | 5,000 | Basic math |
| India Context | 10,000 | Regional |
| Singapore Math | 5,000 | Regional |
| OpenWebMath | 5,000 | Calculus |
### Difficulty Distribution:
- Easy: ~5%
- Medium: ~25%
- Hard: ~40%
- Very Hard: ~30%
---
## 🏗️ Architecture
```
Base: DeepSeek-Math-7B (5,500 steps pre-trained)
LoRA Fine-tuning
- Rank: 64
- Alpha: 128
- Target: All attention + MLP layers
- Trainable params: 149.9M (2.12% of 7.06B)
Phase 1 Output: Text-only model
```
### Training Configuration:
```python
Batch size: 16 (per device)
Gradient accumulation: 4
Effective batch: 64
Learning rate: 1e-4 → 3.8e-10 (cosine decay)
Optimizer: AdamW 8-bit
Max sequence length: 4096
Precision: bfloat16
```
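The effective batch size and the trainable-parameter fraction quoted in the architecture section follow directly from these numbers; a small stdlib-only check (all values copied from this card):

```python
# Hyperparameters as reported in this card.
per_device_batch = 16
grad_accum = 4

# Effective batch = per-device batch x gradient accumulation steps
assert per_device_batch * grad_accum == 64

# Trainable LoRA parameters as a fraction of the 7.06B total
trainable, total = 149.9e6, 7.06e9
assert round(100 * trainable / total, 2) == 2.12
```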
---
## 📈 Training Metrics
### Loss Curve:
- Initial: 0.59
- Final: 0.46
- **Improvement: 22%**
### Learning Rate Schedule:
- Warmup: Linear
- Decay: Cosine to 3.8e-10
- Final LR: ~0 (effectively stopped)
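The decay behavior can be sketched with a standard cosine schedule (warmup omitted; this is the textbook formula, not the exact Unsloth/transformers implementation):

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-4, lr_min=0.0):
    """Standard cosine decay from lr_max toward lr_min."""
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

# Starts at the peak LR and ends effectively at zero,
# consistent with the ~3.8e-10 final value reported above.
assert cosine_lr(0, 1000) == 1e-4
assert cosine_lr(1000, 1000) < 1e-9
```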
### GPU Utilization:
- Average: 99%
- Peak Memory: ~66GB / 180GB
- Temperature: 60-75°C
---
## 🚀 Usage
### Loading the Model:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "sid172002/deepseek-math-7b-rl-phase1"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Inference
problem = "Find the sum of 1 + 2 + ... + 100"
prompt = f"### Problem: {problem}\n### Solution:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
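Since the Files section lists `adapter_model.safetensors`, this repo may ship only the LoRA adapter rather than merged weights. In that case the adapter has to be attached to the base model with `peft`; a sketch, assuming the base-model repo id from the overview table:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "sid172002/deepseek-math-7b-rl-5500steps"  # base model (see overview table)
adapter_id = "sid172002/deepseek-math-7b-rl-phase1"  # this repo (LoRA adapter)

base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the adapter
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
```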
### Expected Performance:
- **Simple algebra**: ✅ Good
- **Step-by-step reasoning**: ✅ Good
- **Calculus**: ⚠️ Moderate
- **Geometry without images**: ⚠️ Moderate
- **Advanced competition**: ❌ Needs Phase 2
---
## 🔄 Phase 2: Multimodal (Recommended)
**Next step:** Add vision capabilities for geometry problems
```
Phase 1 Output (Text)
+ CLIP Vision Encoder (frozen)
+ Projection Layer (trainable)
+ 5,000 Vision Problems
Phase 2 Output (Multimodal)
```
**Estimated improvement:** +10-15% on geometry/competition problems
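The trainable projection layer in this plan is typically a small MLP that maps frozen CLIP image features into the LLM's embedding space. A minimal PyTorch sketch, where the dimensions are assumptions (768-d CLIP ViT-L/14 features, 4096-d hidden size for a 7B LLaMA-style model), not values from the repo:

```python
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Maps frozen CLIP image features into the LLM embedding space."""
    def __init__(self, clip_dim=768, llm_dim=4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(clip_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, image_features):
        # image_features: (batch, n_patches, clip_dim)
        return self.proj(image_features)

# Example: 1 image, 257 CLIP patch tokens -> 257 soft tokens for the LLM
feats = torch.randn(1, 257, 768)
tokens = VisionProjector()(feats)
assert tokens.shape == (1, 257, 4096)
```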
---
## 💰 Cost Analysis
| Phase | Duration | Cost (B200 @ $5.29/hr) |
|-------|----------|------------------------|
| Phase 1 | 15.3 hours | **$81.01** |
| Phase 2 (est.) | 6 hours | ~$32 |
| **Total** | **~21 hours** | **~$113** |
---
## ⚠️ Limitations
1. **Text-only**: Cannot process diagrams/images
2. **Repetition**: Sometimes repeats "### Answer" multiple times
3. **Calculation errors**: Occasional arithmetic mistakes
4. **FrontierMath**: Struggles with hardest problems (0%)
---
## 📁 Files
```
deepseek-math-phase1-final/
├── final/
│ ├── adapter_model.safetensors (572 MB)
│ ├── adapter_config.json
│ ├── tokenizer.json
│ └── README.md
├── checkpoint-15000/
├── checkpoint-16000/
└── checkpoint-17000/
```
---
## 📝 Citation
```bibtex
@misc{deepseek-math-phase1,
title={DeepSeek-Math-7B-RL-Phase1: Fine-tuned on 379K International Math Problems},
author={sid172002},
year={2026},
howpublished={HuggingFace Model Hub}
}
```
---
## 🤝 Acknowledgments
- **Base Model**: DeepSeek-Math-7B-RL (5,500 steps)
- **Training Framework**: Unsloth
- **Compute**: Lambda Labs B200
- **Dataset**: NuminaMath, JEEBench, MetaMathQA, GSM8K
---
**Status**: ✅ Phase 1 Complete | ⏳ Phase 2 Ready | 🎯 Benchmarked