Adapter: DPO Fine-Tuned Model (PairRM Dataset)

🧠 Base Model

  • meta-llama/Llama-3.2-1B-Instruct
  • Fine-tuned with the DPOTrainer from TRL on a PairRM preference dataset of 500 preference pairs.

βš™οΈ Training Configuration

| Setting             | Value       |
| ------------------- | ----------- |
| Learning Rate       | 3e-5        |
| Batch Size          | 4           |
| Training Epochs     | 3           |
| DPO Beta            | 0.1         |
| Max Sequence Length | 1024 tokens |
| Max Prompt Length   | 512 tokens  |
| Padding Token       | EOS token   |
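The DPO beta above (0.1) scales the implicit reward margin inside the DPO loss. As a rough sketch of what the trainer optimizes per preference pair, in plain Python (the log-probability arguments are hypothetical inputs; in TRL they come from the policy and a frozen reference model):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    # Implicit reward of each response: beta * log(pi_theta / pi_ref)
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), computed stably as log1p(exp(-margin))
    return math.log1p(math.exp(-margin))
```

When the policy matches the reference model the margin is zero and the loss is log 2; as the policy puts relatively more probability on the chosen response than the rejected one, the loss falls toward zero. A smaller beta tolerates larger drift from the reference model before the loss saturates.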

📊 Dataset Overview

  • File: pairrm_preferences.csv
  • Contains 500 entries with fields: prompt, chosen, and rejected.
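A minimal sketch of reading such a CSV into the list-of-records shape a preference trainer consumes (the two sample rows are invented for illustration; only the column names match the card):

```python
import csv
import io

# Stand-in for pairrm_preferences.csv -- the rows below are made-up examples.
sample = """prompt,chosen,rejected
"What is 2+2?","2+2 equals 4.","Bananas."
"Name a primary color.","Red is a primary color.","Tuesday."
"""

def load_preferences(fileobj):
    """Parse a preference CSV into dicts with prompt/chosen/rejected keys."""
    return [
        {"prompt": row["prompt"],
         "chosen": row["chosen"],
         "rejected": row["rejected"]}
        for row in csv.DictReader(fileobj)
    ]

pairs = load_preferences(io.StringIO(sample))
```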

πŸ“ Output Location

  • Adapter uploaded to Hugging Face Hub at: sahithimuppavaram/dpo-pairrm-lora-adapter