	---
	license: apache-2.0
	language:
	- en
	library_name: transformers
	tags:
	- llama
	- dpo
	- preference-optimization
	- PEFT
	- instruction-tuning
	pipeline_tag: text-generation
	---

	# DPO Fine-Tuned Adapter - PairRM Dataset

	## 🧠 Model
	- Base: `meta-llama/Llama-3.2-1B-Instruct`
	- Fine-tuned using TRL's `DPOTrainer` with the PairRM preference dataset (500 pairs)

	## ⚙️ Training Parameters
	| Parameter              | Value       |
	|------------------------|-------------|
	| Learning Rate          | 3e-5        |
	| Batch Size             | 4           |
	| Epochs                 | 3           |
	| Beta (DPO regularizer) | 0.1         |
	| Max Input Length       | 1024 tokens |
	| Max Prompt Length      | 512 tokens  |
	| Padding Token          | `eos_token` |
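
To make the role of beta concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. The function name `dpo_loss` and the log-probability values in the usage note are illustrative assumptions, not numbers from this training run. It is meant to show the quantity that the default sigmoid loss in `DPOTrainer` optimizes, not to reproduce TRL's batched implementation.

```python
import math

BETA = 0.1  # value from the training parameters table


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=BETA):
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response
    under the policy (the adapter) or the frozen reference model.
    """
    # How much more the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin), computed stably.
    z = -margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss is `log(2)`, about 0.693. A call such as `dpo_loss(-8.0, -14.0, -10.0, -12.0)` (hypothetical values where the policy has shifted toward the chosen response) gives a smaller loss. A larger beta scales the margin more steeply, which penalizes drift from the reference model more strongly.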

	## 📦 Dataset
	- Source: `pairrm_preferences.csv`
	- Size: 500 preference pairs, one per row, with `prompt`, `chosen`, and `rejected` columns
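
The column layout above can be checked before training with a short loader. This is a sketch using only the standard library; the helper name `load_preference_rows` and the one-row sample are made up for illustration and are not part of the original training code.

```python
import csv
import io

REQUIRED_COLUMNS = ("prompt", "chosen", "rejected")


def load_preference_rows(csv_file):
    """Read preference pairs from an open CSV file and validate its columns."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Keep only the three fields that DPO training needs.
    return [{c: row[c] for c in REQUIRED_COLUMNS} for row in reader]


# Tiny in-memory stand-in for pairrm_preferences.csv (made-up content).
sample = io.StringIO(
    "prompt,chosen,rejected\n"
    "What is 2+2?,4,5\n"
)
rows = load_preference_rows(sample)
```

With the real file, the same function can be called as `load_preference_rows(open("pairrm_preferences.csv", newline="", encoding="utf-8"))`, and the resulting list of dicts can be passed to `datasets.Dataset.from_list` to build a training dataset.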

	## 📂 Output
	- Adapter saved and uploaded as `Likhith003/dpo-pairrm-lora-adapter`
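
A possible way to use the uploaded adapter is to load the base model and attach the adapter with PEFT. The sketch below assumes that `transformers` and `peft` are installed, that the repository contains a PEFT-format adapter, and that you have access to the base model on the Hub (it may require accepting its license). The function name `load_adapter_model` is hypothetical.

```python
BASE_MODEL = "meta-llama/Llama-3.2-1B-Instruct"
ADAPTER_REPO = "Likhith003/dpo-pairrm-lora-adapter"


def load_adapter_model(base_id=BASE_MODEL, adapter_id=ADAPTER_REPO):
    """Load the base model and attach the adapter weights on top of it."""
    # Imported lazily so the constants above can be reused without
    # pulling in the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    if tokenizer.pad_token is None:
        # Mirrors the "Padding Token = eos_token" training setting.
        tokenizer.pad_token = tokenizer.eos_token
    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    return model, tokenizer
```

Calling `model, tokenizer = load_adapter_model()` downloads both repositories on first use; the returned model can then be used with `model.generate(...)` like any other causal language model.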