# Gemma-2 DPO Fine-tuned Model
- Developed by: Phantomcloak19
- License: Apache-2.0
- Base model: unsloth/gemma-2-2b-bnb-4bit
- Training framework: Unsloth + TRL (DPO)
This Gemma-2 (2B) model has been fine-tuned with Direct Preference Optimization (DPO) on the Unified Hallucination Benchmark to reduce hallucinations and improve factual consistency.
Training with Unsloth was roughly 2× faster than a standard setup and enabled efficient low-VRAM fine-tuning.
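A minimal inference sketch using Unsloth's `FastLanguageModel`, which matches the 4-bit base model listed above. The repo id `Phantomcloak19/gemma-2-2b-dpo` is a hypothetical placeholder; substitute the actual Hub path of this model.

```python
from unsloth import FastLanguageModel

# Hypothetical repo id -- replace with this model's actual Hub path.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Phantomcloak19/gemma-2-2b-dpo",
    max_seq_length=2048,
    load_in_4bit=True,  # matches the bnb-4bit base model
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast generation path

inputs = tokenizer("What is the capital of France?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same checkpoint can also be loaded with plain `transformers.AutoModelForCausalLM`, but Unsloth's loader applies its optimized kernels automatically.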
