
# Qwen3-0.6B DPO Training - Complete Setup

## What's Ready

### 1. Downloaded DPO Model (LoRA Adapters)

  • Location: `/home/ma/models/Qwen3-0.6B-DPO/`
  • Source: `AIPlans/Qwen3-0.6B-DPO`
  • Size: 8.8 MB (LoRA adapters only)
  • Status: ✅ Downloaded

### 2. Training Scripts Created

  • `train_dpo_qwen3.py` - Main DPO training script
  • `merge_lora.py` - Merge LoRA adapters with the base model
  • `merge_dpo_adapters.py` - Merge the downloaded DPO adapters
  • `quantize_dpo_model.py` - Quantize the merged model to GGUF
  • `sample_preference_data.jsonl` - Example preference dataset
  • `DPO-Training-README.md` - Documentation

### 3. Colab Notebook Created

  • `Qwen3_DPO_Training.ipynb` - Ready for Google Colab
  • Runs on Colab's free T4 GPU
  • Complete training pipeline

## Quick Start Options

### Option A: Use Existing DPO Model (Fastest)

```bash
# Merge LoRA adapters with base model
python merge_dpo_adapters.py

# Quantize to GGUF
python quantize_dpo_model.py --model_path ./Qwen3-0.6B-DPO-merged --quantization Q4_K_S
```
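Under the hood, merging LoRA adapters just folds a low-rank update into each adapted base weight: `W_merged = W + (alpha / r) * B @ A`. A toy illustration of that arithmetic on 2x2 matrices (shapes and values are made up, not taken from the real model):

```python
# Toy LoRA merge: W_merged = W + (alpha / r) * (B @ A).
# All matrices here are illustrative; the real merge script applies this
# to every adapted weight matrix in the model, then saves full weights.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

r, alpha = 2, 16                 # LoRA rank and scaling (alpha / r scales the update)
W = [[1.0, 0.0], [0.0, 1.0]]     # frozen base weight (d_out x d_in)
B = [[0.1, 0.0], [0.0, 0.1]]     # up-projection   (d_out x r)
A = [[0.0, 1.0], [1.0, 0.0]]     # down-projection (r x d_in)

BA = matmul(B, A)
scale = alpha / r
W_merged = [[W[i][j] + scale * BA[i][j] for j in range(2)] for i in range(2)]
print(W_merged)  # -> [[1.0, 0.8], [0.8, 1.0]]
```

After merging, inference needs only the single matrix `W_merged`, which is why a merged model can be quantized to GGUF like any ordinary checkpoint.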

### Option B: Train Your Own DPO (On GPU)

  1. Upload to Google Colab:

     • Upload `Qwen3_DPO_Training.ipynb`
     • Upload `train_dpo_qwen3.py`, `merge_lora.py`, `sample_preference_data.jsonl`
     • Set Runtime → GPU (T4)

  2. Run training:

     ```bash
     !python train_dpo_qwen3.py --beta 0.1 --epochs 3
     ```

  3. Download the trained model from Colab
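The flags used above (`--beta`, `--epochs`) and in Option C (`--dataset`) suggest a CLI along these lines. This is a hypothetical reconstruction of the script's argument handling, not its actual source; defaults are taken from the parameter guide below:

```python
import argparse

# Hypothetical sketch of train_dpo_qwen3.py's CLI. Flag names match the
# commands in this guide; defaults follow the recommended parameters.
parser = argparse.ArgumentParser(description="DPO fine-tuning for Qwen3-0.6B")
parser.add_argument("--beta", type=float, default=0.1,
                    help="DPO temperature; higher keeps the policy closer to the reference")
parser.add_argument("--epochs", type=int, default=3)
parser.add_argument("--learning_rate", type=float, default=2e-5)
parser.add_argument("--dataset", default="sample_preference_data.jsonl",
                    help="JSONL file with prompt/chosen/rejected fields")

args = parser.parse_args(["--beta", "0.1", "--epochs", "3"])
print(args.beta, args.epochs)  # -> 0.1 3
```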

### Option C: Train with Custom Data

  1. Create your preference dataset (JSONL: one JSON object per line):

     ```json
     {"prompt": "Question?", "chosen": "Good answer", "rejected": "Bad answer"}
     ```

  2. Train:

     ```bash
     python train_dpo_qwen3.py --dataset your_data.jsonl --beta 0.1
     ```
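Since a single malformed line will break training partway through, it is worth writing and sanity-checking the JSONL file programmatically. A small sketch (the example records are invented):

```python
import json

# Write a tiny preference dataset; the records are invented examples.
pairs = [
    {"prompt": "What is DPO?",
     "chosen": "Direct Preference Optimization trains directly on preference pairs.",
     "rejected": "DPO is a kind of GPU."},
    {"prompt": "Name a quantization format.",
     "chosen": "GGUF is a common format for llama.cpp.",
     "rejected": "Quantization formats do not exist."},
]
with open("your_data.jsonl", "w") as f:
    for row in pairs:
        f.write(json.dumps(row) + "\n")

# Sanity-check: every line parses and has exactly the expected keys.
with open("your_data.jsonl") as f:
    rows = [json.loads(line) for line in f]
assert all(set(r) == {"prompt", "chosen", "rejected"} for r in rows)
print(len(rows), "valid pairs")  # -> 2 valid pairs
```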
    

## DPO Parameters Guide

| Parameter     | Range        | Recommendation |
|---------------|--------------|----------------|
| Beta (β)      | 0.05-0.2     | Start with 0.1 |
| Learning Rate | 1e-5 to 5e-5 | 2e-5           |
| Epochs        | 1-5          | 3              |
| LoRA r        | 8-32         | 16             |
| LoRA α        | 8-32         | 16             |
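To make beta concrete: the DPO loss is `-log σ(β · (policy_margin - ref_margin))`, where each margin is `log p(chosen) - log p(rejected)` under the policy and the frozen reference model respectively. A quick numeric illustration (the margin value is arbitrary):

```python
import math

def dpo_loss(beta, policy_margin, ref_margin=0.0):
    # policy_margin = log p(chosen) - log p(rejected) under the policy;
    # ref_margin is the same quantity under the frozen reference model.
    logits = beta * (policy_margin - ref_margin)
    return -math.log(1 / (1 + math.exp(-logits)))  # -log(sigmoid)

margin = 2.0  # policy already prefers the chosen answer by 2 nats
for beta in (0.05, 0.1, 0.2):
    print(beta, round(dpo_loss(beta, margin), 4))
```

With zero margin the loss is `log 2` regardless of beta; for a given positive margin, a larger beta shrinks the loss faster, reflecting a stronger pull toward the reference model and a more conservative fit to the preferences.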

## Files Summary

```
/home/ma/models/
├── Qwen3-0.6B-DPO/              # Downloaded DPO adapters (8.8 MB)
├── train_dpo_qwen3.py           # Training script
├── merge_lora.py                # Merge LoRA adapters
├── merge_dpo_adapters.py        # Merge downloaded adapters
├── quantize_dpo_model.py        # Quantize to GGUF
├── sample_preference_data.jsonl # Example dataset
├── DPO-Training-README.md       # Documentation
└── Qwen3_DPO_Training.ipynb     # Colab notebook
```

## Next Steps

  1. For immediate use: run `merge_dpo_adapters.py`, then `quantize_dpo_model.py`
  2. For custom training: use the Colab notebook with your own preference data
  3. For production: train on a GPU with a larger dataset (5,000+ preference pairs)
