# Qwen2.5-1.5B-DPO-Truthy
This model is a fine-tuned version of Qwen2.5-1.5B-Instruct using Direct Preference Optimization (DPO). The goal of this alignment was to improve the model's truthfulness and reduce hallucinations by training it on human-preferred factual responses.
## Model Description
- Base Model: Qwen/Qwen2.5-1.5B-Instruct
- Alignment Technique: DPO (Direct Preference Optimization)
- Training Method: PEFT-LoRA (Low-Rank Adaptation)
- Precision: 4-bit Quantization (bitsandbytes)
## Training Details
The model was trained on the `truthy-dpo` preference dataset, which pairs a "chosen" (accurate, truthful) response with a "rejected" (incorrect or hallucinated) response for each prompt.
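A row in such a dataset follows the standard preference-pair layout expected by DPO trainers. The example below is illustrative only: the field names follow the common `prompt`/`chosen`/`rejected` convention, and the exact column names and content of the released dataset may differ.

```python
# A single hypothetical truthy-dpo-style preference pair.
# Field names follow the common DPO convention; the actual
# dataset columns may differ.
example = {
    "prompt": "Do we only use 10% of our brains?",
    "chosen": (
        "No, that is a myth. Brain imaging shows that virtually all "
        "regions of the brain are active over the course of a day."
    ),
    "rejected": "Yes, humans only use about 10% of their brains.",
}

# DPO optimizes the model to prefer "chosen" over "rejected"
# for the same prompt.
assert set(example) == {"prompt", "chosen", "rejected"}
```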
### Hyperparameters
- Learning Rate: 5e-5
- Batch Size: 1
- Gradient Accumulation Steps: 4
- Optimizer: Paged AdamW 32-bit
- LoRA R: 8
- LoRA Alpha: 16
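For reference, the objective these hyperparameters feed into can be sketched in a few lines. This is a minimal illustration of the per-example DPO loss, not the training code; the `beta=0.1` default is a hypothetical value (the source does not state which beta was used), and libraries such as TRL compute the same quantity from batched log-probabilities.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (margin_policy - margin_ref)).

    Each argument is the summed log-probability of the chosen/rejected
    response under the policy or the frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid(logits))

# When the policy prefers the chosen response more strongly than the
# reference model does, the loss drops below -log(0.5) ~= 0.693.
loss = dpo_loss(-10.0, -30.0, -15.0, -25.0)
assert loss < math.log(2.0)
```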
## Evaluation Results
### AlpacaEval-style Benchmark (LLM-as-a-Judge)
We evaluated the model against the base Qwen2.5-1.5B-Instruct model using Gemini-1.5-Flash as an impartial judge across 15 factual test cases.
| Metric | Result |
|---|---|
| Model B (DPO) Wins | 1 |
| Ties | 14 |
| Model A (Base) Wins | 0 |
| Final Win Rate | 53.33% |
## Discussion
The DPO alignment produced a modest but clean improvement: a 53.33% win rate, computed with each tie counted as half a win. The high proportion of ties (14/15, ≈93%) suggests that the base model already maintains a high standard of instruction-following, but the DPO model shifted preferences toward the "truthy" distribution with one outright win and no losses against the base model.
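The 53.33% figure follows directly from the AlpacaEval-style convention of scoring a tie as half a win:

```python
# Judge verdicts from the table above.
wins, ties, losses = 1, 14, 0
total = wins + ties + losses

# AlpacaEval-style scoring: a tie is worth half a win.
win_rate = (wins + 0.5 * ties) / total
assert round(win_rate * 100, 2) == 53.33
```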
## Complexity Reduction
To ensure efficient training, this project utilized:
- 4-bit Quantization: Reducing memory footprint for T4 GPU compatibility.
- LoRA: Reducing trainable parameters from ~1.5 billion to approximately 1.5 million (~0.1% of the model).
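The trainable-parameter count comes from the standard LoRA formula: an adapted weight matrix of shape (d_out, d_in) gains two low-rank factors totalling r·(d_in + d_out) parameters. A rough sketch of the arithmetic follows; 1536 matches the hidden size of Qwen2.5-1.5B, but which projection matrices were actually targeted is an assumption here, so this illustrates the formula rather than reproducing the exact ~1.5M total.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)

# Hypothetical example: a rank-8 adapter on one 1536x1536 projection
# (1536 is the Qwen2.5-1.5B hidden size).
per_matrix = lora_param_count(1536, 1536, r=8)
assert per_matrix == 24_576  # vs. 1536 * 1536 = 2,359,296 frozen weights
```

Summing this over every targeted matrix in every layer gives the total trainable-parameter figure quoted above.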
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "Qwen/Qwen2.5-1.5B-Instruct"
adapter_model = "st126107/qwen2.5-truthful-dpo"

# Load the tokenizer and the base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Attach the DPO-trained LoRA adapter on top of the base weights
model = PeftModel.from_pretrained(model, adapter_model)
```
## Model tree

`st126107/qwen2.5-truthful-dpo` is a LoRA adapter for `Qwen/Qwen2.5-1.5B-Instruct`.