mlopez6132
/

nano-coder-zerogpu

Model card Files Files and versions

xet

Community

mlopez6132 commited on Jul 20, 2025

Commit

db010c6

verified ·

1 Parent(s): 1937046

Upload ZEROGPU_SETUP.md with huggingface_hub

Browse files

Files changed (1) hide show

ZEROGPU_SETUP.md +187 -0

ZEROGPU_SETUP.md ADDED Viewed

	@@ -0,0 +1,187 @@

+# 🚀 ZeroGPU Setup Guide: Free H200 Training
+## 🎯 What is ZeroGPU?
+**ZeroGPU** is Hugging Face's **FREE** compute service that provides:
+- **Nvidia H200 GPU** (70GB memory)
+- **No time limits** (unlike the 4-minute daily limit)
+- **No credit card required**
+- **Perfect for training** nanoGPT models
+## 📊 ZeroGPU vs Previous Approach
+| Feature | Previous (HF Spaces) | ZeroGPU |
+|---------|---------------------|---------|
+| **GPU** | H200 (4 min/day) | H200 (unlimited) |
+| **Memory** | Limited | 70GB |
+| **Time** | 4 minutes daily | No limits |
+| **Cost** | Free | Free |
+| **Use Case** | Demos/Testing | Real Training |
+## 🚀 How to Use ZeroGPU
+### Option 1: Hugging Face Training Cluster (Recommended)
+1. **Create HF Model Repository:**
+   ```bash
+   huggingface-cli repo create nano-coder-zerogpu --type model
+   ```
+2. **Upload Training Files:**
+   ```bash
+   python upload_to_zerogpu.py
+   ```
+3. **Launch ZeroGPU Training:**
+   ```bash
+   python launch_zerogpu.py
+   ```
+### Option 2: Direct ZeroGPU API
+1. **Install HF Hub:**
+   ```bash
+   pip install huggingface_hub
+   ```
+2. **Set HF Token:**
+   ```bash
+   export HF_TOKEN="your_token_here"
+   ```
+3. **Run ZeroGPU Training:**
+   ```bash
+   python zerogpu_training.py
+   ```
+## 📁 Files for ZeroGPU
+- `zerogpu_training.py` - Main training script
+- `upload_to_zerogpu.py` - Upload files to HF
+- `launch_zerogpu.py` - Launch training job
+- `ZEROGPU_SETUP.md` - This guide
+## ⚙️ ZeroGPU Configuration
+### Model Settings (Full Power!)
+- **Layers**: 12 (full model)
+- **Heads**: 12 (full model)
+- **Embedding**: 768 (full model)
+- **Context**: 1024 tokens
+- **Parameters**: ~124M (full GPT-2 size)
+### Training Settings
+- **Batch Size**: 48 (optimized for H200)
+- **Learning Rate**: 6e-4 (standard GPT-2)
+- **Iterations**: 10,000 (no time limits!)
+- **Checkpoints**: Every 1000 iterations
+## 🎯 Expected Results
+With ZeroGPU H200 (no time limits):
+- **Training Time**: 2-4 hours
+- **Final Loss**: ~1.8-2.2
+- **Model Quality**: Production-ready
+- **Code Generation**: High quality Python code
+## 🔧 Setup Steps
+### Step 1: Create HF Repository
+```bash
+huggingface-cli repo create nano-coder-zerogpu --type model
+```
+### Step 2: Prepare Dataset
+```bash
+python prepare_code_dataset.py
+```
+### Step 3: Launch Training
+```bash
+python zerogpu_training.py
+```
+## 📊 Monitoring
+### Wandb Dashboard
+- Real-time training metrics
+- Loss curves
+- Model performance
+### HF Hub
+- Automatic checkpoint uploads
+- Model versioning
+- Training logs
+## 💰 Cost: **$0** (Completely Free!)
+- **No credit card required**
+- **No time limits**
+- **H200 GPU access**
+- **70GB memory**
+## 🎉 Benefits of ZeroGPU
+1. **No Time Limits** - Train for hours, not minutes
+2. **Full Model** - Use complete GPT-2 architecture
+3. **Better Results** - Production-quality models
+4. **Real Training** - Not just demos
+5. **Automatic Saving** - Models saved to HF Hub
+## 🚨 Troubleshooting
+### If Training Won't Start
+1. Check HF token is set
+2. Verify repository exists
+3. Check dataset is prepared
+### If Out of Memory
+1. Reduce batch_size to 32
+2. Reduce gradient_accumulation_steps
+3. Use smaller model (but why?)
+### If Upload Fails
+1. Check internet connection
+2. Verify HF token permissions
+3. Check repository access
+## 🎯 Use Cases
+### Perfect For:
+- ✅ **Production Training** - Real model training
+- ✅ **Research** - Experiment with different configs
+- ✅ **Learning** - Understand full training process
+- ✅ **Model Sharing** - Upload to HF Hub
+### Not Suitable For:
+- ❌ **Quick Demos** - Use HF Spaces for that
+- ❌ **Testing** - Use local GPU for that
+## 🔄 Workflow
+1. **Setup**: Create HF repo and prepare data
+2. **Train**: Launch ZeroGPU training
+3. **Monitor**: Watch progress on Wandb
+4. **Save**: Models automatically uploaded
+5. **Share**: Use trained models
+## 📈 Performance
+Expected training performance on ZeroGPU H200:
+- **Iterations/second**: ~2-3
+- **Memory usage**: ~40-50GB
+- **Training time**: 2-4 hours for 10k iterations
+- **Final model**: Production quality
+## 🎉 Success!
+ZeroGPU is the **proper way** to use Hugging Face's free compute for real training. No more 4-minute limits - train your nano-coder model properly!
+**Next Steps:**
+1. Create HF repository
+2. Upload files
+3. Launch training
+4. Monitor progress
+5. Use your trained model!
+Happy ZeroGPU training! 🚀