DPO LoRA Adapter for Instruction Alignment
This repository provides a LoRA adapter fine-tuned with Direct Preference Optimization (DPO) to improve instruction-following behavior and safety alignment.
Notes
- This repository contains only LoRA adapter weights.
- Load the tokenizer from the base model, since this repository does not include one.
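
The notes above can be sketched as a short loading routine. This is a minimal example, not an official usage snippet: it assumes the `transformers` and `peft` packages are installed and that the base model is a causal language model. The function name `load_model_with_adapter` and both of its arguments are hypothetical placeholders; substitute the actual base model ID and this repository's ID.

```python
def load_model_with_adapter(base_model_id, adapter_id):
    """Return (model, tokenizer) with the LoRA adapter applied to the base model.

    Both arguments are placeholders (assumptions, not values from this README):
    base_model_id: Hub ID or local path of the base model.
    adapter_id:    Hub ID or local path of this adapter repository.
    """
    # Imported inside the function so the heavy dependencies are only
    # required when the model is actually loaded.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # The tokenizer comes from the base model, not from the adapter repository.
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)

    # Load the base weights first (assumed here to be a causal LM).
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)

    # Attach the LoRA adapter weights on top of the base model.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer
```

Calling `load_model_with_adapter("<base-model-id>", "<adapter-repo-id>")` downloads the base weights and the adapter, so it needs network access (or local copies) and enough memory for the base model.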