Upload DPO-Training/DPO-Complete-Guide.md with huggingface_hub


# Qwen3-0.6B DPO Training: Complete Setup

## What's Ready

### 1. Downloaded DPO Model (LoRA Adapters)

- **Location**: `/home/ma/models/Qwen3-0.6B-DPO/`
- **Source**: [AIPlans/Qwen3-0.6B-DPO](https://huggingface.co/AIPlans/Qwen3-0.6B-DPO)
- **Size**: 8.8 MB (LoRA adapters only, not full model weights)
- **Status**: ✅ Downloaded
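
If the adapters need to be fetched again (for example, on a fresh machine), the following is a minimal sketch using `huggingface_hub.snapshot_download`. It assumes `huggingface_hub` is installed. `download_adapters` and `adapters_look_complete` are hypothetical helpers, not part of this setup. The expected file names (`adapter_config.json` plus `adapter_model.safetensors`, or the older `adapter_model.bin`) reflect PEFT's usual save layout, not a listing of this repo.

```python
from pathlib import Path

# Assumed PEFT adapter layout: a config plus one weights file.
# Recent PEFT versions save safetensors; older ones save adapter_model.bin.
CONFIG_FILE = "adapter_config.json"
WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def download_adapters(repo_id: str = "AIPlans/Qwen3-0.6B-DPO",
                      local_dir: str = "/home/ma/models/Qwen3-0.6B-DPO") -> str:
    """Fetch the adapter repo (hypothetical helper; needs huggingface_hub)."""
    # Imported lazily so the checker below works without the dependency.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


def adapters_look_complete(local_dir: str) -> bool:
    """Return True if the directory holds an adapter config and weights."""
    path = Path(local_dir)
    has_config = (path / CONFIG_FILE).is_file()
    has_weights = any((path / name).is_file() for name in WEIGHT_FILES)
    return has_config and has_weights
```

For example, `adapters_look_complete(download_adapters())` should return `True` once the download finishes.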

### 2. Training Scripts Created

- **train_dpo_qwen3.py**: Main DPO training script
- **merge_lora.py**: Merges trained LoRA adapters into the base model
- **merge_dpo_adapters.py**: Merges the downloaded DPO adapters into the base model
- **quantize_dpo_model.py**: Quantizes a merged model to GGUF
- **sample_preference_data.jsonl**: Example preference dataset
- **DPO-Training-README.md**: Documentation

### 3. Colab Notebook Created

- **Qwen3_DPO_Training.ipynb**: Ready to run on Google Colab
- Runs on Colab's free T4 GPU
- Covers the complete training pipeline

## Quick Start Options

### Option A: Use the Existing DPO Model (Fastest)

```bash
# Merge the downloaded LoRA adapters into the base model
python merge_dpo_adapters.py

# Quantize the merged model to GGUF
python quantize_dpo_model.py --model_path ./Qwen3-0.6B-DPO-merged --quantization Q4_K_S
```
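
The two scripts are not reproduced in this guide. The following is a rough sketch of what these steps typically involve. The merge uses PEFT's `merge_and_unload()`, and GGUF conversion uses llama.cpp's `convert_hf_to_gguf.py` followed by `llama-quantize`. The sketch assumes the base checkpoint is `Qwen/Qwen3-0.6B` and that llama.cpp is cloned and built with CMake into `build/bin`. `merge_adapters` and `gguf_commands` are hypothetical names, and the real `merge_dpo_adapters.py` and `quantize_dpo_model.py` may work differently.

```python
from pathlib import Path

# Assumption: the adapters were trained on this base checkpoint.
BASE_MODEL = "Qwen/Qwen3-0.6B"


def merge_adapters(adapter_dir: str, output_dir: str,
                   base_model: str = BASE_MODEL) -> None:
    """Fold LoRA weights into the base model and save a plain HF checkpoint."""
    # Heavy imports are deferred so gguf_commands works without them.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")
    model = PeftModel.from_pretrained(base, adapter_dir)
    merged = model.merge_and_unload()  # base model with LoRA deltas added in
    merged.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(base_model).save_pretrained(output_dir)


def gguf_commands(model_dir: str, llama_cpp_dir: str,
                  quant: str = "Q4_K_S") -> list[list[str]]:
    """Build the two llama.cpp steps: HF -> f16 GGUF, then f16 -> quantized."""
    name = Path(model_dir).name
    f16 = f"{name}-f16.gguf"
    quantized = f"{name}-{quant}.gguf"
    convert = ["python", str(Path(llama_cpp_dir) / "convert_hf_to_gguf.py"),
               model_dir, "--outfile", f16, "--outtype", "f16"]
    # Assumed location of the binary after a CMake build of llama.cpp.
    quantize = [str(Path(llama_cpp_dir) / "build" / "bin" / "llama-quantize"),
                f16, quantized, quant]
    return [convert, quantize]
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. Building the lists separately makes it easy to inspect the paths before anything runs.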

### Option B: Train Your Own DPO Model (on a GPU)

1. **Upload to Google Colab**:
   - Upload `Qwen3_DPO_Training.ipynb`
   - Upload `train_dpo_qwen3.py`, `merge_lora.py`, and `sample_preference_data.jsonl`
   - Select Runtime → Change runtime type → T4 GPU
2. **Run training** in a notebook cell:
   ```python
   !python train_dpo_qwen3.py --beta 0.1 --epochs 3
   ```
3. **Download the trained model** from Colab
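
For orientation, the following is a minimal sketch of a DPO training script with this kind of command-line interface. It is built on TRL's `DPOTrainer` and assumes a recent TRL release (0.12 or later, where the tokenizer is passed as `processing_class`). Only `--dataset`, `--beta`, and `--epochs` come from this guide. The other flags, the `Qwen/Qwen3-0.6B` base id, and the output directory are hypothetical. The defaults mirror the suggested values in the DPO Parameters Guide. This is not the contents of `train_dpo_qwen3.py`.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    # Hypothetical CLI; only --dataset, --beta and --epochs appear in the guide.
    p = argparse.ArgumentParser(description="DPO fine-tuning for Qwen3-0.6B (sketch)")
    p.add_argument("--dataset", default="sample_preference_data.jsonl")
    p.add_argument("--beta", type=float, default=0.1)
    p.add_argument("--epochs", type=int, default=3)
    p.add_argument("--learning_rate", type=float, default=2e-5)
    p.add_argument("--lora_r", type=int, default=16)
    p.add_argument("--lora_alpha", type=int, default=16)
    p.add_argument("--output_dir", default="./Qwen3-0.6B-DPO-lora")
    return p.parse_args(argv)


def train(args: argparse.Namespace, base_model: str = "Qwen/Qwen3-0.6B") -> None:
    # Deferred imports: requires datasets, peft, transformers and trl.
    from datasets import load_dataset
    from peft import LoraConfig
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")
    # JSONL rows with "prompt", "chosen" and "rejected" string fields.
    data = load_dataset("json", data_files=args.dataset, split="train")

    config = DPOConfig(
        output_dir=args.output_dir,
        beta=args.beta,
        num_train_epochs=args.epochs,
        learning_rate=args.learning_rate,
    )
    lora = LoraConfig(r=args.lora_r, lora_alpha=args.lora_alpha, task_type="CAUSAL_LM")
    # With a peft_config and no ref_model, TRL uses the base weights
    # (adapters disabled) as the frozen reference policy.
    trainer = DPOTrainer(model=model, args=config, train_dataset=data,
                         processing_class=tokenizer, peft_config=lora)
    trainer.train()
    trainer.save_model(args.output_dir)  # saves the LoRA adapter weights
```

Calling `train(parse_args())` from the command line would then accept the same flags as the examples in this guide.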

### Option C: Train with Custom Data

1. Create your preference dataset as a JSONL file (one JSON object per line) with `prompt`, `chosen`, and `rejected` fields:
   ```json
   {"prompt": "Question?", "chosen": "Good answer", "rejected": "Bad answer"}
   ```
2. Train:
   ```bash
   python train_dpo_qwen3.py --dataset your_data.jsonl --beta 0.1
   ```
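
Before training, it can help to validate the file. The following is a minimal, standard-library-only sketch. `load_preference_pairs` is a hypothetical helper, not part of `train_dpo_qwen3.py`. Rejecting pairs whose `chosen` and `rejected` texts are identical is an added assumption: such pairs yield a zero DPO margin and therefore no gradient.

```python
import json

REQUIRED_KEYS = ("prompt", "chosen", "rejected")


def load_preference_pairs(lines):
    """Parse JSONL preference records, rejecting malformed or degenerate ones.

    `lines` is any iterable of strings (e.g. an open file). Blank lines are
    skipped. Raises ValueError naming the 1-based line number on bad input.
    """
    records = []
    for lineno, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        if not isinstance(rec, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        missing = [k for k in REQUIRED_KEYS if not isinstance(rec.get(k), str)]
        if missing:
            raise ValueError(f"line {lineno}: missing or non-string {missing}")
        if rec["chosen"] == rec["rejected"]:
            # Identical responses give DPO no preference signal.
            raise ValueError(f"line {lineno}: chosen and rejected are identical")
        # Keep only the three fields the trainer needs.
        records.append({k: rec[k] for k in REQUIRED_KEYS})
    return records
```

With a real file: `with open("your_data.jsonl", encoding="utf-8") as f: pairs = load_preference_pairs(f)`.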

## DPO Parameters Guide

| Parameter | Typical range | Suggested starting value |
|-----------|---------------|--------------------------|
| **Beta (β)** | 0.05 to 0.2 | 0.1 |
| **Learning rate** | 1e-5 to 5e-5 | 2e-5 |
| **Epochs** | 1 to 5 | 3 |
| **LoRA r** | 8 to 32 | 16 |
| **LoRA α** | 8 to 32 | 16 |
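
To see what β controls, the per-example DPO objective from the paper can be computed by hand. The sketch below takes summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model, and returns -log σ(β · margin). The margin is the policy's log-ratio gain on the chosen response minus its gain on the rejected one. The sketch also includes the standard LoRA scaling factor α / r, under which the suggested r = 16 and α = 16 give a scale of 1.0. Function names are illustrative. TRL's default `sigmoid` loss computes the same quantity on batched tensors.

```python
import math


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Per-pair DPO loss from summed sequence log-probabilities."""
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen response over the rejected one.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = beta * margin
    # -log(sigmoid(z)) in a numerically stable form for either sign of z.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def lora_scaling(r: int, alpha: int) -> float:
    """Standard LoRA multiplies the low-rank update B @ A by alpha / r."""
    return alpha / r
```

When the policy equals the reference, the margin is 0 and the loss is log 2 ≈ 0.693. With β = 0.1 and a margin of 4 nats, z = 0.4 and the loss drops to about 0.513. A larger β makes the loss react more sharply to the same margin.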

## Files Summary

```
/home/ma/models/
├── Qwen3-0.6B-DPO/              # Downloaded DPO adapters (8.8 MB)
├── train_dpo_qwen3.py           # Training script
├── merge_lora.py                # Merge LoRA adapters
├── merge_dpo_adapters.py        # Merge downloaded adapters
├── quantize_dpo_model.py        # Quantize to GGUF
├── sample_preference_data.jsonl # Example dataset
├── DPO-Training-README.md       # Documentation
└── Qwen3_DPO_Training.ipynb     # Colab notebook
```
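
A quick way to confirm that a machine matches this layout is sketched below using only the standard library. `EXPECTED` simply transcribes the tree, and `missing_entries` is a hypothetical helper, not one of the listed scripts.

```python
from pathlib import Path

# Transcribed from the layout: the adapter folder is a directory, the rest are files.
EXPECTED = {
    "Qwen3-0.6B-DPO": "dir",
    "train_dpo_qwen3.py": "file",
    "merge_lora.py": "file",
    "merge_dpo_adapters.py": "file",
    "quantize_dpo_model.py": "file",
    "sample_preference_data.jsonl": "file",
    "DPO-Training-README.md": "file",
    "Qwen3_DPO_Training.ipynb": "file",
}


def missing_entries(root: str = "/home/ma/models") -> list[str]:
    """Return the sorted names from EXPECTED that are absent under root."""
    base = Path(root)
    missing = []
    for name, kind in EXPECTED.items():
        path = base / name
        present = path.is_dir() if kind == "dir" else path.is_file()
        if not present:
            missing.append(name)
    return sorted(missing)
```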

## Next Steps

1. **For immediate use**: Run `merge_dpo_adapters.py`, then `quantize_dpo_model.py`.
2. **For custom training**: Use the Colab notebook with your own preference data.
3. **For production**: Train on a GPU with a larger preference dataset (5,000+ examples).

## References

- [DPO paper (Rafailov et al., 2023)](https://arxiv.org/abs/2305.18290)
- [AIPlans DPO model](https://huggingface.co/AIPlans/Qwen3-0.6B-DPO)
- [TRL documentation](https://huggingface.co/docs/trl)