Lab 22: DPO Alignment - Vietnamese LLM

This repository contains a DPO-aligned version of Qwen2.5-3B, fine-tuned as part of the VinUni AICB program.

Model Details

  • Base Model: unsloth/Qwen2.5-3B-bnb-4bit
  • SFT Dataset: bkai-foundation-models/vi-alpaca
  • Preference Dataset: argilla/ultrafeedback-binarized-preferences-cleaned
  • DPO Hyperparameters: beta = 0.1, learning rate = 5e-7
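
To make the role of beta concrete, here is a minimal sketch of the standard per-example DPO loss in plain Python. It is not this repository's training code, which is not shown here. The function name `dpo_loss` and its argument names are hypothetical; the inputs are assumed to be summed log-probabilities of the chosen and rejected responses under the policy being trained and under the frozen reference (SFT) model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_ratio - rejected_ratio))).

    All names here are illustrative; beta=0.1 matches the value listed above.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-10.0, -20.0, -10.0, -20.0)
# When the policy favors the chosen response more than the reference does, the loss drops.
improved = dpo_loss(-10.0, -20.0, -12.0, -15.0)
```

A smaller beta makes the loss less sensitive to the log-ratio gap, so the policy is pushed less hard away from the reference model per unit of preference margin.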

Quantization

This repository includes GGUF quantizations (Q4_K_M and Q8_0) for efficient inference with llama.cpp.
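
As a rough illustration of what the Q8_0 format stores, the sketch below quantizes one block of 32 weights to 8-bit integers plus a single scale. It is a simplified approximation, not llama.cpp's implementation: it keeps the scale as a Python float rather than a 16-bit float and uses Python's `round`. All function names are hypothetical. Q4_K_M uses a more involved mixed scheme and is not covered.

```python
BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32

def quantize_q8_0_block(values):
    """Quantize one block to (scale, list of ints in [-127, 127])."""
    if len(values) != BLOCK_SIZE:
        raise ValueError("expected a block of 32 values")
    amax = max(abs(v) for v in values)
    scale = amax / 127.0
    if scale == 0.0:
        # An all-zero block needs no scaling.
        return 0.0, [0] * BLOCK_SIZE
    quants = [round(v / scale) for v in values]
    return scale, quants

def dequantize_q8_0_block(scale, quants):
    """Reconstruct approximate weights from a quantized block."""
    return [scale * q for q in quants]

def q8_0_bits_per_weight():
    """32 8-bit values plus one 16-bit scale per block."""
    return (BLOCK_SIZE * 8 + 16) / BLOCK_SIZE

weights = [(i - 16) / 10 for i in range(BLOCK_SIZE)]
scale, quants = quantize_q8_0_block(weights)
restored = dequantize_q8_0_block(scale, quants)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Under these assumptions each weight costs 8.5 bits, and the reconstruction error per weight is at most half the block scale.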

  • Format: GGUF
  • Model size: 3B params
  • Architecture: qwen2

Model tree for Huanvg02/lab22-dpo-vn

  • Base model: Qwen/Qwen2.5-3B
  • Quantized: this model