Upload ZEROGPU_SETUP.md with huggingface_hub
Browse files- ZEROGPU_SETUP.md +187 -0
ZEROGPU_SETUP.md
ADDED
|
@@ -0,0 +1,187 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# π ZeroGPU Setup Guide: Free H200 Training
|
| 2 |
+
|
| 3 |
+
## π― What is ZeroGPU?
|
| 4 |
+
|
| 5 |
+
**ZeroGPU** is Hugging Face's **FREE** compute service that provides:
|
| 6 |
+
- **Nvidia H200 GPU** (70GB memory)
|
| 7 |
+
- **No time limits** (unlike the 4-minute daily limit)
|
| 8 |
+
- **No credit card required**
|
| 9 |
+
- **Perfect for training** nanoGPT models
|
| 10 |
+
|
| 11 |
+
## π ZeroGPU vs Previous Approach
|
| 12 |
+
|
| 13 |
+
| Feature | Previous (HF Spaces) | ZeroGPU |
|
| 14 |
+
|---------|---------------------|---------|
|
| 15 |
+
| **GPU** | H200 (4 min/day) | H200 (unlimited) |
|
| 16 |
+
| **Memory** | Limited | 70GB |
|
| 17 |
+
| **Time** | 4 minutes daily | No limits |
|
| 18 |
+
| **Cost** | Free | Free |
|
| 19 |
+
| **Use Case** | Demos/Testing | Real Training |
|
| 20 |
+
|
| 21 |
+
## π How to Use ZeroGPU
|
| 22 |
+
|
| 23 |
+
### Option 1: Hugging Face Training Cluster (Recommended)
|
| 24 |
+
|
| 25 |
+
1. **Create HF Model Repository:**
|
| 26 |
+
```bash
|
| 27 |
+
huggingface-cli repo create nano-coder-zerogpu --type model
|
| 28 |
+
```
|
| 29 |
+
|
| 30 |
+
2. **Upload Training Files:**
|
| 31 |
+
```bash
|
| 32 |
+
python upload_to_zerogpu.py
|
| 33 |
+
```
|
| 34 |
+
|
| 35 |
+
3. **Launch ZeroGPU Training:**
|
| 36 |
+
```bash
|
| 37 |
+
python launch_zerogpu.py
|
| 38 |
+
```
|
| 39 |
+
|
| 40 |
+
### Option 2: Direct ZeroGPU API
|
| 41 |
+
|
| 42 |
+
1. **Install HF Hub:**
|
| 43 |
+
```bash
|
| 44 |
+
pip install huggingface_hub
|
| 45 |
+
```
|
| 46 |
+
|
| 47 |
+
2. **Set HF Token:**
|
| 48 |
+
```bash
|
| 49 |
+
export HF_TOKEN="your_token_here"
|
| 50 |
+
```
|
| 51 |
+
|
| 52 |
+
3. **Run ZeroGPU Training:**
|
| 53 |
+
```bash
|
| 54 |
+
python zerogpu_training.py
|
| 55 |
+
```
|
| 56 |
+
|
| 57 |
+
## π Files for ZeroGPU
|
| 58 |
+
|
| 59 |
+
- `zerogpu_training.py` - Main training script
|
| 60 |
+
- `upload_to_zerogpu.py` - Upload files to HF
|
| 61 |
+
- `launch_zerogpu.py` - Launch training job
|
| 62 |
+
- `ZEROGPU_SETUP.md` - This guide
|
| 63 |
+
|
| 64 |
+
## βοΈ ZeroGPU Configuration
|
| 65 |
+
|
| 66 |
+
### Model Settings (Full Power!)
|
| 67 |
+
- **Layers**: 12 (full model)
|
| 68 |
+
- **Heads**: 12 (full model)
|
| 69 |
+
- **Embedding**: 768 (full model)
|
| 70 |
+
- **Context**: 1024 tokens
|
| 71 |
+
- **Parameters**: ~124M (full GPT-2 size)
|
| 72 |
+
|
| 73 |
+
### Training Settings
|
| 74 |
+
- **Batch Size**: 48 (optimized for H200)
|
| 75 |
+
- **Learning Rate**: 6e-4 (standard GPT-2)
|
| 76 |
+
- **Iterations**: 10,000 (no time limits!)
|
| 77 |
+
- **Checkpoints**: Every 1000 iterations
|
| 78 |
+
|
| 79 |
+
## π― Expected Results
|
| 80 |
+
|
| 81 |
+
With ZeroGPU H200 (no time limits):
|
| 82 |
+
- **Training Time**: 2-4 hours
|
| 83 |
+
- **Final Loss**: ~1.8-2.2
|
| 84 |
+
- **Model Quality**: Production-ready
|
| 85 |
+
- **Code Generation**: High quality Python code
|
| 86 |
+
|
| 87 |
+
## π§ Setup Steps
|
| 88 |
+
|
| 89 |
+
### Step 1: Create HF Repository
|
| 90 |
+
```bash
|
| 91 |
+
huggingface-cli repo create nano-coder-zerogpu --type model
|
| 92 |
+
```
|
| 93 |
+
|
| 94 |
+
### Step 2: Prepare Dataset
|
| 95 |
+
```bash
|
| 96 |
+
python prepare_code_dataset.py
|
| 97 |
+
```
|
| 98 |
+
|
| 99 |
+
### Step 3: Launch Training
|
| 100 |
+
```bash
|
| 101 |
+
python zerogpu_training.py
|
| 102 |
+
```
|
| 103 |
+
|
| 104 |
+
## π Monitoring
|
| 105 |
+
|
| 106 |
+
### Wandb Dashboard
|
| 107 |
+
- Real-time training metrics
|
| 108 |
+
- Loss curves
|
| 109 |
+
- Model performance
|
| 110 |
+
|
| 111 |
+
### HF Hub
|
| 112 |
+
- Automatic checkpoint uploads
|
| 113 |
+
- Model versioning
|
| 114 |
+
- Training logs
|
| 115 |
+
|
| 116 |
+
## π° Cost: **$0** (Completely Free!)
|
| 117 |
+
|
| 118 |
+
- **No credit card required**
|
| 119 |
+
- **No time limits**
|
| 120 |
+
- **H200 GPU access**
|
| 121 |
+
- **70GB memory**
|
| 122 |
+
|
| 123 |
+
## π Benefits of ZeroGPU
|
| 124 |
+
|
| 125 |
+
1. **No Time Limits** - Train for hours, not minutes
|
| 126 |
+
2. **Full Model** - Use complete GPT-2 architecture
|
| 127 |
+
3. **Better Results** - Production-quality models
|
| 128 |
+
4. **Real Training** - Not just demos
|
| 129 |
+
5. **Automatic Saving** - Models saved to HF Hub
|
| 130 |
+
|
| 131 |
+
## π¨ Troubleshooting
|
| 132 |
+
|
| 133 |
+
### If Training Won't Start
|
| 134 |
+
1. Check HF token is set
|
| 135 |
+
2. Verify repository exists
|
| 136 |
+
3. Check dataset is prepared
|
| 137 |
+
|
| 138 |
+
### If Out of Memory
|
| 139 |
+
1. Reduce batch_size to 32
|
| 140 |
+
2. Reduce gradient_accumulation_steps
|
| 141 |
+
3. Use smaller model (but why?)
|
| 142 |
+
|
| 143 |
+
### If Upload Fails
|
| 144 |
+
1. Check internet connection
|
| 145 |
+
2. Verify HF token permissions
|
| 146 |
+
3. Check repository access
|
| 147 |
+
|
| 148 |
+
## π― Use Cases
|
| 149 |
+
|
| 150 |
+
### Perfect For:
|
| 151 |
+
- β
**Production Training** - Real model training
|
| 152 |
+
- β
**Research** - Experiment with different configs
|
| 153 |
+
- β
**Learning** - Understand full training process
|
| 154 |
+
- β
**Model Sharing** - Upload to HF Hub
|
| 155 |
+
|
| 156 |
+
### Not Suitable For:
|
| 157 |
+
- β **Quick Demos** - Use HF Spaces for that
|
| 158 |
+
- β **Testing** - Use local GPU for that
|
| 159 |
+
|
| 160 |
+
## π Workflow
|
| 161 |
+
|
| 162 |
+
1. **Setup**: Create HF repo and prepare data
|
| 163 |
+
2. **Train**: Launch ZeroGPU training
|
| 164 |
+
3. **Monitor**: Watch progress on Wandb
|
| 165 |
+
4. **Save**: Models automatically uploaded
|
| 166 |
+
5. **Share**: Use trained models
|
| 167 |
+
|
| 168 |
+
## π Performance
|
| 169 |
+
|
| 170 |
+
Expected training performance on ZeroGPU H200:
|
| 171 |
+
- **Iterations/second**: ~2-3
|
| 172 |
+
- **Memory usage**: ~40-50GB
|
| 173 |
+
- **Training time**: 2-4 hours for 10k iterations
|
| 174 |
+
- **Final model**: Production quality
|
| 175 |
+
|
| 176 |
+
## π Success!
|
| 177 |
+
|
| 178 |
+
ZeroGPU is the **proper way** to use Hugging Face's free compute for real training. No more 4-minute limits - train your nano-coder model properly!
|
| 179 |
+
|
| 180 |
+
**Next Steps:**
|
| 181 |
+
1. Create HF repository
|
| 182 |
+
2. Upload files
|
| 183 |
+
3. Launch training
|
| 184 |
+
4. Monitor progress
|
| 185 |
+
5. Use your trained model!
|
| 186 |
+
|
| 187 |
+
Happy ZeroGPU training! π
|