Gemma-2 DPO Fine-tuned Model

  • Developed by: Phantomcloak19
  • Model: Phantomcloak19/gemma-dpo-full
  • License: Apache-2.0
  • Base model: unsloth/gemma-2-2b-bnb-4bit
  • Training framework: Unsloth + TRL (DPO)

This Gemma-2 (2B) model was fine-tuned with Direct Preference Optimization (DPO) on the Unified Hallucination Benchmark to reduce hallucinations and improve factual consistency.
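At its core, DPO optimizes a logistic loss over preference pairs: it pushes the policy to assign a higher (β-scaled, reference-adjusted) log-probability to the chosen response than to the rejected one. A minimal sketch of the per-pair loss, assuming per-sequence log-probabilities have already been computed (the actual training here used TRL's `DPOTrainer`):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss from sequence log-probabilities.

    `beta` controls how far the policy may drift from the frozen
    reference model (0.1 is a common default in TRL).
    """
    # Implicit rewards: how much more likely each response is under the
    # policy than under the reference model, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin): near 0 when the chosen response is strongly
    # preferred, log(2) when the policy is indifferent.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy equals the reference model the margin is zero and the loss is log 2 ≈ 0.693; increasing the chosen response's probability (or decreasing the rejected one's) shrinks the loss.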

The model was trained roughly 2× faster using Unsloth, whose 4-bit (bitsandbytes) quantized base model enables fine-tuning on low-VRAM GPUs.

