
DeepSeek-Math-7B-RL-Phase1

Mathematical Reasoning Model - Phase 1 Training Complete


📊 Model Overview

| Attribute | Value |
|---|---|
| Base Model | sid172002/deepseek-math-7b-rl-5500steps |
| Training Type | LoRA fine-tuning (r=64, alpha=128) |
| Dataset | 379,921 international math problems |
| Training Duration | 15.3 hours |
| Epochs | 3 |
| Final Loss | 0.46 (down from 0.59) |
| Hardware | NVIDIA B200 (180 GB) |

🎯 Benchmark Results

Overall Performance: 41.7% (5/12 problems)

| Tier | Score | Accuracy | Notes |
|---|---|---|---|
| IIT JEE Easy | 1/2 | 50.0% | Basic algebra/calculus |
| IIT JEE Hard | 1/2 | 50.0% | Advanced problems |
| AMC 10/12 | 1/2 | 50.0% | Competition math |
| AIME | 1/2 | 50.0% | Hard competition |
| Olympiad | 1/2 | 50.0% | Proof-based |
| FrontierMath | 0/2 | 0.0% | Very hard geometry/calculus |

✅ Correctly Solved:

  1. Algebra: If x + 1/x = 3, find x² + 1/x² → 7
  2. Functional: f(x+y) = f(x) + f(y), f(1)=3, find f(5) → 15
  3. Arithmetic: (20-19+18-17) + ... → 10
  4. Modular: 2¹⁰⁰ mod 13 → 3 (since 2¹² ≡ 1 mod 13 and 100 ≡ 4 mod 12, 2¹⁰⁰ ≡ 2⁴ = 16 ≡ 3; the model's answer of 3 was correct) ✅
  5. Proof: p² ≡ 1 (mod 24) for primes p>3 → Proof structure ✅
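The modular-arithmetic and proof items above can be spot-checked in a few lines of Python:

```python
# Item 4: 2^100 mod 13 via built-in fast modular exponentiation
print(pow(2, 100, 13))  # 3

# Item 5: p^2 ≡ 1 (mod 24) holds for every prime p > 3
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

primes = [p for p in range(5, 200) if is_prime(p)]
print(all(p * p % 24 == 1 for p in primes))  # True
```

The second check only samples primes below 200, but the identity holds in general: any prime p > 3 is coprime to 24, and p² − 1 = (p−1)(p+1) is divisible by both 8 and 3.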

❌ Challenging Areas:

  • Geometry with diagrams (needs vision)
  • Complex multi-step counting
  • Integration problems
  • Advanced functional equations

📚 Training Dataset

Composition (379,921 problems):

| Source | Count | Type |
|---|---|---|
| NuminaMath-Olympiad | 125,000 | Competition |
| NuminaMath-AMC | 85,000 | Competition |
| NuminaMath-AIME | 45,000 | Competition |
| NuminaMath-AoPS | 99,921 | Olympiad |
| JEEBench | 515 | IIT JEE |
| MetaMathQA | 5,000 | Algebra |
| GSM8K | 5,000 | Basic math |
| India Context | 10,000 | Regional |
| Singapore Math | 5,000 | Regional |
| OpenWebMath | 5,000 | Calculus |

Difficulty Distribution:

  • Easy: ~5%
  • Medium: ~25%
  • Hard: ~40%
  • Very Hard: ~30%

🏗️ Architecture

```
Base: DeepSeek-Math-7B (5,500 steps pre-trained)
  ↓
LoRA Fine-tuning
  - Rank: 64
  - Alpha: 128
  - Target: all attention + MLP layers
  - Trainable params: 149.9M (2.12% of 7.06B)
  ↓
Phase 1 Output: text-only model
```
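The 2.12% trainable-parameter figure follows directly from the adapter and base sizes:

```python
trainable = 149.9e6  # LoRA adapter parameters (r=64 over attention + MLP)
total = 7.06e9       # full DeepSeek-Math-7B parameter count
print(f"{trainable / total:.2%}")  # 2.12%
```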

Training Configuration:

```
Batch size: 16 (per device)
Gradient accumulation: 4
Effective batch: 64
Learning rate: 1e-4 → 3.8e-10 (cosine decay)
Optimizer: AdamW 8-bit
Max sequence length: 4096
Precision: bfloat16
```
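The same configuration expressed as a plain dictionary (the key names follow common trainer conventions and are illustrative, not the exact launch script):

```python
train_config = {
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 4,   # 16 x 4 = 64 effective
    "learning_rate": 1e-4,              # decays to ~3.8e-10 (cosine)
    "lr_scheduler_type": "cosine",
    "optim": "adamw_8bit",
    "max_seq_length": 4096,
    "bf16": True,
    "num_train_epochs": 3,
}

effective_batch = (train_config["per_device_train_batch_size"]
                   * train_config["gradient_accumulation_steps"])
print(effective_batch)  # 64
```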

📈 Training Metrics

Loss Curve:

  • Initial: 0.59
  • Final: 0.46
  • Improvement: 22%

Learning Rate Schedule:

  • Warmup: Linear
  • Decay: Cosine to 3.8e-10
  • Final LR: ≈0 (updates negligible in the last steps)
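The schedule shape can be sketched in a few lines; the warmup length of 500 steps is an assumption for illustration (the card states linear warmup and the peak/floor values, not the warmup length):

```python
import math

def cosine_lr(step, total_steps, peak=1e-4, floor=3.8e-10, warmup=500):
    """Linear warmup to `peak`, then cosine decay to `floor`."""
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / (total_steps - warmup)
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))

print(cosine_lr(500, 17000))    # peak: 1e-4
print(cosine_lr(17000, 17000))  # floor: 3.8e-10
```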

GPU Utilization:

  • Average: 99%
  • Peak Memory: ~66GB / 180GB
  • Temperature: 60-75°C

🚀 Usage

Loading the Model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "sid172002/deepseek-math-7b-rl-phase1"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Inference
problem = "Find the sum of 1 + 2 + ... + 100"
prompt = f"### Problem: {problem}\n### Solution:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512,
                         temperature=0.3, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
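Because the model can emit the "### Answer" marker more than once (see Limitations), a small post-processing helper can grab the text after the last occurrence. This helper is illustrative, not part of the released code:

```python
def extract_answer(text, marker="### Answer"):
    """Return the text after the last '### Answer' marker, or None."""
    if marker not in text:
        return None
    tail = text.rsplit(marker, 1)[-1].strip(" :\n")
    return tail.splitlines()[0].strip() if tail else None

sample = "### Problem: ...\n### Solution: ...\n### Answer: 5050\n### Answer: 5050"
print(extract_answer(sample))  # 5050
```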

Expected Performance:

  • Simple algebra: ✅ Good
  • Step-by-step reasoning: ✅ Good
  • Calculus: ⚠️ Moderate
  • Geometry without images: ⚠️ Moderate
  • Advanced competition: ❌ Needs Phase 2

🔄 Phase 2: Multimodal (Recommended)

Next step: Add vision capabilities for geometry problems

```
Phase 1 Output (Text)
  ↓
+ CLIP Vision Encoder (frozen)
+ Projection Layer (trainable)
+ 5,000 Vision Problems
  ↓
Phase 2 Output (Multimodal)
```

Estimated improvement: +10-15% on geometry/competition problems
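At its core, the planned projection layer is a learned linear map from the vision encoder's feature space into the language model's embedding space. A shape-level sketch in NumPy; the 768/4096 dimensions and 257-token patch count are typical CLIP/7B values, assumed here rather than confirmed specs:

```python
import numpy as np

rng = np.random.default_rng(0)

clip_dim, llm_dim = 768, 4096  # assumed CLIP feature / LLM hidden sizes
num_patches = 257              # assumed: 256 image patches + CLS token

# Frozen CLIP output for one image (random stand-in features)
image_features = rng.standard_normal((num_patches, clip_dim))

# Trainable projection: the only new weights Phase 2 would learn here
W = rng.standard_normal((clip_dim, llm_dim)) * 0.02
b = np.zeros(llm_dim)

vision_tokens = image_features @ W + b  # ready to prepend to text embeddings
print(vision_tokens.shape)  # (257, 4096)
```

Only `W` and `b` would receive gradients in Phase 2; the CLIP encoder and the Phase 1 language model stay frozen.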


💰 Cost Analysis

| Phase | Duration | Cost (B200 @ $5.29/hr) |
|---|---|---|
| Phase 1 | 15.3 hours | $81.01 |
| Phase 2 (est.) | 6 hours | ~$32 |
| Total | ~21 hours | ~$113 |

⚠️ Limitations

  1. Text-only: Cannot process diagrams/images
  2. Repetition: Sometimes repeats "### Answer" multiple times
  3. Calculation errors: Occasional arithmetic mistakes
  4. FrontierMath: Struggles with hardest problems (0%)

📁 Files

```
deepseek-math-phase1-final/
├── final/
│   ├── adapter_model.safetensors (572 MB)
│   ├── adapter_config.json
│   ├── tokenizer.json
│   └── README.md
├── checkpoint-15000/
├── checkpoint-16000/
└── checkpoint-17000/
```

📝 Citation

```bibtex
@misc{deepseek-math-phase1,
  title={DeepSeek-Math-7B-RL-Phase1: Fine-tuned on 379K International Math Problems},
  author={sid172002},
  year={2026},
  howpublished={HuggingFace Model Hub}
}
```

🤝 Acknowledgments

  • Base Model: DeepSeek-Math-7B-RL (5,500 steps)
  • Training Framework: Unsloth
  • Compute: Lambda Labs B200
  • Dataset: NuminaMath, JEEBench, MetaMathQA, GSM8K

Status: ✅ Phase 1 Complete | ⏳ Phase 2 Ready | 🎯 Benchmarked