DPO LoRA Adapter for Instruction Alignment

This repository provides a LoRA adapter fine-tuned with Direct Preference Optimization (DPO) to improve instruction-following behavior and safety alignment.

Notes

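This repository contains only the LoRA adapter weights; the base model weights are not included and must be loaded separately.
The tokenizer should also be loaded from the base model, not from this repository.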
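
The notes above can be sketched as a small loading helper. This is a minimal sketch, not the author's confirmed usage: it assumes the base model is a causal language model loadable through `transformers`, and that the `peft` library is installed. The function name `load_adapter_model` and both parameters are hypothetical, and neither the base model ID nor this repository's ID is stated here, so they are left as arguments.

```python
def load_adapter_model(base_model_id: str, adapter_id: str):
    """Load the base model, attach the LoRA adapter, return (model, tokenizer).

    Assumption: the base model is a causal LM. Both IDs are supplied by the
    caller because this document does not state them.
    """
    # Imported lazily so the heavy dependencies are only needed when called.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # The tokenizer comes from the base model, not from the adapter repository.
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)

    # Load the full base weights, which this repository does not contain.
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)

    # Wrap the base model with the LoRA adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return model, tokenizer
```

Calling `load_adapter_model(base_id, adapter_id)` with the real base model ID and this repository's ID would download both and return a model ready for inference. Keeping the two IDs separate mirrors the split described in the notes: weights and tokenizer from the base model, adapter deltas from this repository.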
