---
license: apache-2.0
language:
  - en
library_name: transformers
tags:
  - llama
  - dpo
  - preference-optimization
  - PEFT
  - instruction-tuning
pipeline_tag: text-generation
---

# DPO Fine-Tuned Adapter - PairRM Dataset

## 🧠 Model

- Base: `meta-llama/Llama-3.2-1B-Instruct`
- Fine-tuned using TRL's `DPOTrainer` with the PairRM preference dataset (500 pairs)

## ⚙️ Training Parameters

| Parameter              | Value        |
|------------------------|--------------|
| Learning Rate          | 3e-5         |
| Batch Size             | 4            |
| Epochs                 | 3            |
| Beta (DPO regularizer) | 0.1          |
| Max Input Length       | 1024 tokens  |
| Max Prompt Length      | 512 tokens   |
| Padding Token          | `eos_token`  |

## 📦 Dataset

- Source: `pairrm_preferences.csv`
- Size: 500 instructions with `prompt`, `chosen`, and `rejected` columns

## 📂 Output

- Adapter saved and uploaded as `Likhith003/dpo-pairrm-lora-adapter`
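## 📐 Objective (sketch)

The `Beta` value in the table above is the DPO regularizer. As a minimal sketch (not the TRL implementation itself), the standard sigmoid DPO loss for one preference pair looks like this; the function name and log-probability inputs are illustrative, with `beta=0.1` matching the table:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for a single (chosen, rejected) pair.

    Each argument is the summed log-probability of the chosen or rejected
    completion under the trained policy or the frozen reference model.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written in a numerically stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# When policy and reference agree exactly, the margin is 0 and the loss is log(2)
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # → 0.6931
```

The loss shrinks as the policy assigns relatively more probability to the chosen completion than the reference does, and a larger `beta` penalizes deviation from the reference more sharply.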
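## 🚀 Usage (sketch)

A minimal way to load the adapter for inference, assuming `transformers` and `peft` are installed; the repository ids are the ones listed above, and the prompt is only an example:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.2-1B-Instruct"
adapter_id = "Likhith003/dpo-pairrm-lora-adapter"

# Load the base model, then attach the DPO-trained LoRA adapter on top
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)

inputs = tokenizer("Explain DPO in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that downloading the base model requires accepting the Llama license on the Hugging Face Hub.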