# Qwen3-0.6B DPO Training - Complete Setup
## What's Ready
### 1. Downloaded DPO Model (LoRA Adapters)

- Location: `/home/ma/models/Qwen3-0.6B-DPO/`
- Source: `AIPlans/Qwen3-0.6B-DPO`
- Size: 8.8 MB (LoRA adapters only)
- Status: ✅ Downloaded
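Before merging, the adapter directory can be sanity-checked. A minimal stdlib-only sketch, assuming the usual peft layout with an `adapter_config.json` at the top level (the field names are peft conventions, not verified against this particular download):

```python
import json
import pathlib

def read_adapter_config(adapter_dir):
    """Return the parsed adapter_config.json, or None if it is missing."""
    cfg_path = pathlib.Path(adapter_dir) / "adapter_config.json"
    if not cfg_path.exists():
        return None
    return json.loads(cfg_path.read_text())

cfg = read_adapter_config("/home/ma/models/Qwen3-0.6B-DPO")
if cfg is None:
    print("adapter_config.json not found - re-download the adapters")
else:
    # Typical peft fields: peft_type, r, lora_alpha, target_modules
    print(cfg.get("peft_type"), cfg.get("r"), cfg.get("lora_alpha"))
```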
### 2. Training Scripts Created

- `train_dpo_qwen3.py` - Main DPO training script
- `merge_lora.py` - Merge LoRA adapters with the base model
- `merge_dpo_adapters.py` - Merge the downloaded DPO adapters
- `quantize_dpo_model.py` - Quantize to GGUF
- `sample_preference_data.jsonl` - Example preference dataset
- `DPO-Training-README.md` - Documentation
### 3. Colab Notebook Created

- `Qwen3_DPO_Training.ipynb` - Ready for Google Colab
- Runs on Colab's free T4 GPU
- Complete training pipeline
## Quick Start Options
### Option A: Use Existing DPO Model (Fastest)

```bash
# Merge LoRA adapters with base model
python merge_dpo_adapters.py

# Quantize to GGUF
python quantize_dpo_model.py --model_path ./Qwen3-0.6B-DPO-merged --quantization Q4_K_S
```
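For reference, the merge step presumably boils down to peft's `merge_and_unload`. A hedged sketch of that step (the base model id and paths are assumptions, and this is not the actual contents of `merge_dpo_adapters.py`; requires `transformers` and `peft`):

```python
def merge_dpo_adapters(base_id: str, adapter_dir: str, out_dir: str) -> None:
    """Fold LoRA adapter weights into the base model and save a standalone copy."""
    # Deferred imports so the function can be defined without the libraries installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_dir)
    merged = model.merge_and_unload()  # applies the low-rank deltas to the base weights
    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)

# Example call (downloads the base model, so run only when ready):
# merge_dpo_adapters("Qwen/Qwen3-0.6B", "/home/ma/models/Qwen3-0.6B-DPO", "./Qwen3-0.6B-DPO-merged")
```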
### Option B: Train Your Own DPO (On GPU)

Upload to Google Colab:

1. Upload `Qwen3_DPO_Training.ipynb`
2. Upload `train_dpo_qwen3.py`, `merge_lora.py`, and `sample_preference_data.jsonl`
3. Set Runtime → GPU (T4)
4. Run training:

   ```bash
   !python train_dpo_qwen3.py --beta 0.1 --epochs 3
   ```

5. Download the trained model from Colab
### Option C: Train with Custom Data

Create your preference dataset (one JSON object per line):

```json
{"prompt": "Question?", "chosen": "Good answer", "rejected": "Bad answer"}
```

Then train:

```bash
python train_dpo_qwen3.py --dataset your_data.jsonl --beta 0.1
```
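Since a single malformed line can fail the whole training run, it can help to validate the JSONL first. A minimal stdlib-only checker, using the field names from the example above:

```python
import json

REQUIRED_FIELDS = ("prompt", "chosen", "rejected")

def valid_preference_line(line: str) -> bool:
    """True if a JSONL line is an object with non-empty string prompt/chosen/rejected."""
    try:
        row = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(row, dict):
        return False
    return all(isinstance(row.get(k), str) and row[k].strip() for k in REQUIRED_FIELDS)

print(valid_preference_line('{"prompt": "Q?", "chosen": "good", "rejected": "bad"}'))  # True
print(valid_preference_line('{"prompt": "Q?", "chosen": "good"}'))                     # False
```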
## DPO Parameters Guide

| Parameter | Range | Recommendation |
|---|---|---|
| Beta (β) | 0.05-0.2 | Start with 0.1 |
| Learning Rate | 1e-5 to 5e-5 | 2e-5 |
| Epochs | 1-5 | 3 |
| LoRA r | 8-32 | 16 |
| LoRA α | 8-32 | 16 |
## Files Summary

```
/home/ma/models/
├── Qwen3-0.6B-DPO/               # Downloaded DPO adapters (8.8 MB)
├── train_dpo_qwen3.py            # Training script
├── merge_lora.py                 # Merge LoRA adapters
├── merge_dpo_adapters.py         # Merge downloaded adapters
├── quantize_dpo_model.py         # Quantize to GGUF
├── sample_preference_data.jsonl  # Example dataset
├── DPO-Training-README.md        # Documentation
└── Qwen3_DPO_Training.ipynb      # Colab notebook
```
## Next Steps

- For immediate use: Run `merge_dpo_adapters.py`, then `quantize_dpo_model.py`
- For custom training: Use the Colab notebook with your own data
- For production: Train on a GPU with a larger dataset (5,000+ examples)