Qwen2.5-1.5B-DPO-Truthy

This model is a fine-tuned version of Qwen2.5-1.5B-Instruct using Direct Preference Optimization (DPO). The goal of this alignment was to improve the model's truthfulness and reduce hallucinations by training it on human-preferred factual responses.

Model Description

  • Base Model: Qwen/Qwen2.5-1.5B-Instruct
  • Alignment Technique: DPO (Direct Preference Optimization)
  • Training Method: PEFT-LoRA (Low-Rank Adaptation)
  • Precision: 4-bit Quantization (bitsandbytes)

Training Details

The model was trained on the truthy-dpo dataset, which contains pairs of "chosen" (accurate/truthful) and "rejected" (incorrect/hallucinated) responses.

Hyperparameters

  • Learning Rate: 5e-5
  • Batch Size: 1
  • Gradient Accumulation Steps: 4
  • Optimizer: Paged AdamW 32-bit
  • LoRA R: 8
  • LoRA Alpha: 16

Evaluation Results

AlpacaEval-style Benchmark (LLM-as-a-Judge)

We evaluated the model against the base Qwen2.5-1.5B-Instruct model using Gemini-1.5-Flash as an impartial judge across 15 factual test cases.

Metric Result
Model B (DPO) Wins 1
Ties 14
Model A (Base) Wins 0
Final Win Rate 53.33%

Discussion

The DPO alignment was successful, achieving a win rate of 53.33%. While the high number of ties (93%) suggests that the base model already maintains a high standard of instruction-following, the DPO model successfully shifted the preference towards the "truthy" distribution without any losses against the base model.

Complexity Reduction

To ensure efficient training, this project utilized:

  • 4-bit Quantization: Reducing memory footprint for T4 GPU compatibility.
  • LoRA: Reducing trainable parameters from 1.5 Billion to approximately 1.5 Million (~0.1% of the model).

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "Qwen/Qwen2.5-1.5B-Instruct"
adapter_model = "your-username/Qwen2.5-1.5B-DPO-Truthy"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)
model = PeftModel.from_pretrained(model, adapter_model)
Downloads last month

-

Downloads are not tracked for this model. How to track
Video Preview
loading

Model tree for st126107/qwen2.5-truthful-dpo

Adapter
(912)
this model

Evaluation results