# GeneralChat-Llama3.2-3B-DPO
A DPO fine-tuned version of theprint/GeneralChat-Llama3.2-3B.
## Description

This model builds on GeneralChat-Llama3.2-3B, a general-purpose conversational fine-tune of Llama 3.2 3B.
This model was trained with Direct Preference Optimization (DPO) on the theprint/Tom-4.2k-alpaca dataset. Rejected responses were generated using a weak local model to create preference pairs, with chosen responses drawn from the original dataset.
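The pair-construction step described above can be sketched as follows. This is a minimal illustration, not the actual data pipeline: `weak_generate` is a hypothetical stand-in for the weak local model, and the `prompt`/`chosen`/`rejected` field names follow TRL's conventional preference-pair schema, which the card does not confirm.

```python
# Sketch of preference-pair construction for DPO (hypothetical helper names).
# `weak_generate` stands in for the weak local model used for rejections.

def weak_generate(prompt: str) -> str:
    # Placeholder: a real pipeline would sample from the weak model here.
    return "A low-quality placeholder answer to: " + prompt

def build_preference_pairs(dataset):
    """Turn (instruction, output) rows into DPO preference pairs."""
    pairs = []
    for row in dataset:
        pairs.append({
            "prompt": row["instruction"],
            "chosen": row["output"],  # reference answer from the original dataset
            "rejected": weak_generate(row["instruction"]),  # weak-model answer
        })
    return pairs

sample = [{"instruction": "What is DPO?",
           "output": "Direct Preference Optimization is a preference-tuning method."}]
pairs = build_preference_pairs(sample)
print(pairs[0]["prompt"])
```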
## Quick Start

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="theprint/GeneralChat-Llama3.2-3B-DPO",
    device="cuda",
)
output = generator(
    [{"role": "user", "content": "Your prompt here"}],
    max_new_tokens=256,
    return_full_text=False,
)[0]
print(output["generated_text"])
```
## Training Details
| Parameter | Value |
|---|---|
| Method | DPO |
| Base model | theprint/GeneralChat-Llama3.2-3B |
| Dataset | theprint/Tom-4.2k-alpaca |
| Beta | 0.125 |
| LoRA r / alpha | 16 / 32 |
| Learning rate | 1e-5 |
| Epochs | 2 |
| Run name | llama3.2-3b-datom-dpo-0310 |
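For reference, the beta in the table scales the DPO objective, which can be computed from per-sequence log-probabilities under the policy and the frozen reference model. The sketch below is a standalone illustration of the loss formula only, not the training code (the run itself used TRL):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.125):
    """DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is the total log-probability of a response sequence
    under the policy or the frozen reference model.
    """
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# At equal margins the loss is exactly log(2); it drops below log(2)
# once the policy prefers the chosen response more than the reference does.
print(dpo_loss(-10.0, -12.0, -11.0, -11.5))
```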
## Framework Versions
- TRL: 0.29.0
- Transformers: 5.3.0
- PyTorch: 2.12.0.dev20260310+cu128
- Datasets: 4.5.0
- PEFT: 0.15.2
