# Adapter: DPO Fine-Tuned Model (PairRM Dataset)
## Base Model

`meta-llama/Llama-3.2-1B-Instruct`, fine-tuned using the `DPOTrainer` from TRL on the PairRM preference dataset, which consists of 500 preference pairs.
## Training Configuration
| Setting | Value |
|---|---|
| Learning Rate | 3e-5 |
| Batch Size | 4 |
| Training Epochs | 3 |
| DPO Beta | 0.1 |
| Max Sequence Length | 1024 tokens |
| Max Prompt Length | 512 tokens |
| Padding Token | EOS Token |
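The hyperparameters above map directly onto TRL's `DPOConfig`. Below is a minimal, illustrative training sketch, not the exact script used for this adapter; the output directory is an assumption, and on older TRL versions the tokenizer is passed as `tokenizer=` rather than `processing_class=`:

```python
# Hyperparameters from the table above, in DPOConfig argument names.
HPARAMS = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 4,
    "num_train_epochs": 3,
    "beta": 0.1,
    "max_length": 1024,
    "max_prompt_length": 512,
}

def train(dataset):
    """Sketch of the DPO fine-tuning run; requires transformers and trl."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    base = "meta-llama/Llama-3.2-1B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(base)
    tokenizer.pad_token = tokenizer.eos_token  # padding token = EOS, per the table
    model = AutoModelForCausalLM.from_pretrained(base)

    args = DPOConfig(output_dir="dpo-pairrm-lora-adapter", **HPARAMS)
    trainer = DPOTrainer(
        model=model,
        args=args,
        train_dataset=dataset,  # rows with "prompt", "chosen", "rejected" fields
        processing_class=tokenizer,
    )
    trainer.train()
    return trainer
```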
## Dataset Overview

- File: `pairrm_preferences.csv`
- Contains 500 entries with the fields `prompt`, `chosen`, and `rejected`.
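A CSV with those three columns can be parsed with the standard library alone. The helper below is a small sketch (the function name and the sample row are illustrative, not from the training script):

```python
import csv
import io

REQUIRED = ("prompt", "chosen", "rejected")

def load_preferences(fileobj):
    """Parse a pairrm_preferences.csv-style file into a list of dicts."""
    reader = csv.DictReader(fileobj)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [{c: row[c] for c in REQUIRED} for row in reader]

# Hypothetical one-row sample in the card's format.
sample = io.StringIO(
    "prompt,chosen,rejected\n"
    "What is DPO?,A preference-based fine-tuning method.,I don't know.\n"
)
pairs = load_preferences(sample)
print(pairs[0]["chosen"])
```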
## Output Location

- Adapter uploaded to the Hugging Face Hub at: `sahithimuppavaram/dpo-pairrm-lora-adapter`
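To use the adapter, load the base model and apply the LoRA weights on top with PEFT. A minimal loading sketch (requires `transformers` and `peft`, and downloads weights from the Hub on first call):

```python
ADAPTER_ID = "sahithimuppavaram/dpo-pairrm-lora-adapter"
BASE_MODEL = "meta-llama/Llama-3.2-1B-Instruct"

def load_adapter():
    """Load the base model and attach the DPO LoRA adapter from the Hub."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    tokenizer.pad_token = tokenizer.eos_token  # matches the training setup
    return model, tokenizer
```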