Commit 9f91593 (verified) by Bopalv · Parent: e7728e1

Upload DPO-Training/DPO-Complete-Guide.md with huggingface_hub
# Qwen3-0.6B DPO Training - Complete Setup

## What's Ready

### 1. Downloaded DPO Model (LoRA Adapters)
- **Location**: `/home/ma/models/Qwen3-0.6B-DPO/`
- **Source**: [AIPlans/Qwen3-0.6B-DPO](https://huggingface.co/AIPlans/Qwen3-0.6B-DPO)
- **Size**: 8.8 MB (LoRA adapters only)
- **Status**: ✅ Downloaded

### 2. Training Scripts Created
- **train_dpo_qwen3.py** - Main DPO training script
- **merge_lora.py** - Merge LoRA adapters with the base model
- **merge_dpo_adapters.py** - Merge the downloaded DPO adapters
- **quantize_dpo_model.py** - Quantize to GGUF
- **sample_preference_data.jsonl** - Example dataset
- **DPO-Training-README.md** - Documentation

### 3. Colab Notebook Created
- **Qwen3_DPO_Training.ipynb** - Ready for Google Colab
- Free T4 GPU available
- Complete training pipeline

## Quick Start Options

### Option A: Use Existing DPO Model (Fastest)

```bash
# Merge the LoRA adapters into the base model
python merge_dpo_adapters.py

# Quantize the merged model to GGUF
python quantize_dpo_model.py --model_path ./Qwen3-0.6B-DPO-merged --quantization Q4_K_S
```

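Conceptually, merging LoRA adapters folds the low-rank update back into the frozen base weights: W' = W + (α/r)·B·A. A minimal numpy sketch of that arithmetic (toy shapes chosen for illustration; this is not the actual code in `merge_dpo_adapters.py`, which operates on real model checkpoints):

```python
import numpy as np

def merge_lora_weights(W, A, B, r, alpha):
    """Fold a LoRA update into a base weight matrix: W' = W + (alpha / r) * B @ A."""
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 4, 16      # toy dimensions; real layers are much larger
W = rng.normal(size=(d_out, d_in))       # frozen base weight
A = rng.normal(size=(r, d_in))           # LoRA down-projection
B = np.zeros((d_out, r))                 # LoRA up-projection (zero-init => no change yet)

W_merged = merge_lora_weights(W, A, B, r, alpha)
assert np.allclose(W_merged, W)          # zero-initialized B makes the merge a no-op
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged model can then be quantized to GGUF as a single artifact.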
### Option B: Train Your Own DPO (On GPU)

1. **Upload to Google Colab**:
   - Upload `Qwen3_DPO_Training.ipynb`
   - Upload `train_dpo_qwen3.py`, `merge_lora.py`, `sample_preference_data.jsonl`
   - Set Runtime → GPU (T4)

2. **Run training**:
   ```python
   !python train_dpo_qwen3.py --beta 0.1 --epochs 3
   ```

3. **Download the trained model** from Colab

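The flags used above suggest a simple command-line interface. A hedged sketch of how the argument parsing in `train_dpo_qwen3.py` might look (the flag names follow this guide, but the real script's options and defaults may differ):

```python
import argparse

def build_parser():
    # Hypothetical CLI mirroring the flags shown in this guide.
    p = argparse.ArgumentParser(description="DPO fine-tuning for Qwen3-0.6B")
    p.add_argument("--dataset", default="sample_preference_data.jsonl",
                   help="JSONL file with prompt/chosen/rejected records")
    p.add_argument("--beta", type=float, default=0.1,
                   help="DPO beta: strength of the pull toward the reference model")
    p.add_argument("--epochs", type=int, default=3)
    p.add_argument("--learning_rate", type=float, default=2e-5)
    return p

args = build_parser().parse_args(["--beta", "0.1", "--epochs", "3"])
print(args.beta, args.epochs)  # → 0.1 3
```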
### Option C: Train with Custom Data

1. Create your preference dataset (JSONL: one JSON object per line):
   ```json
   {"prompt": "Question?", "chosen": "Good answer", "rejected": "Bad answer"}
   ```

2. Train:
   ```bash
   python train_dpo_qwen3.py --dataset your_data.jsonl --beta 0.1
   ```

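Malformed records are a common source of silent training failures, so it can help to validate the JSONL before launching a run. A small stdlib-only sketch (the `check_preference_jsonl` helper is illustrative, not part of the scripts listed above):

```python
import json

REQUIRED_KEYS = {"prompt", "chosen", "rejected"}

def check_preference_jsonl(path):
    """Return the number of valid records; raise on the first malformed line."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)  # raises on invalid JSON
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                raise ValueError(f"line {lineno}: missing keys {sorted(missing)}")
            count += 1
    return count

# Self-contained demo: write one well-formed record, then validate it.
with open("demo_preference_data.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps({"prompt": "Question?", "chosen": "Good answer",
                        "rejected": "Bad answer"}) + "\n")
assert check_preference_jsonl("demo_preference_data.jsonl") == 1
```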
## DPO Parameters Guide

| Parameter | Range | Recommendation |
|-----------|-------|----------------|
| **Beta (β)** | 0.05-0.2 | Start with 0.1 |
| **Learning Rate** | 1e-5 to 5e-5 | 2e-5 |
| **Epochs** | 1-5 | 3 |
| **LoRA r** | 8-32 | 16 |
| **LoRA α** | 8-32 | 16 |

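To see what β controls, recall the per-example DPO objective from the referenced paper: L = −log σ(β·m), where the margin m is the difference between the chosen and rejected responses' policy-vs-reference log-ratios. A pure-Python sketch with made-up log-ratio values:

```python
import math

def dpo_loss(chosen_logratio, rejected_logratio, beta):
    """Per-example DPO loss: -log(sigmoid(beta * margin))."""
    margin = chosen_logratio - rejected_logratio
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Made-up log-ratios (log pi(y|x) - log pi_ref(y|x)) for one preference pair.
chosen, rejected = 0.5, -0.5
for beta in (0.05, 0.1, 0.2):
    print(f"beta={beta}: loss={dpo_loss(chosen, rejected, beta):.4f}")
# For a fixed positive margin, larger beta gives a smaller loss: beta scales
# how strongly the model is pushed to separate chosen from rejected.
```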
## Files Summary

```
/home/ma/models/
├── Qwen3-0.6B-DPO/                # Downloaded DPO adapters (8.8 MB)
├── train_dpo_qwen3.py             # Training script
├── merge_lora.py                  # Merge LoRA adapters
├── merge_dpo_adapters.py          # Merge downloaded adapters
├── quantize_dpo_model.py          # Quantize to GGUF
├── sample_preference_data.jsonl   # Example dataset
├── DPO-Training-README.md         # Documentation
└── Qwen3_DPO_Training.ipynb       # Colab notebook
```

## Next Steps

1. **For immediate use**: Run `merge_dpo_adapters.py`, then `quantize_dpo_model.py`
2. **For custom training**: Use the Colab notebook with your own data
3. **For production**: Train on a GPU with a larger dataset (5000+ examples)

## References

- [DPO Paper](https://arxiv.org/abs/2305.18290)
- [AIPlans DPO Model](https://huggingface.co/AIPlans/Qwen3-0.6B-DPO)
- [TRL Documentation](https://huggingface.co/docs/trl)