# Adapter: DPO Fine-Tuned Model (PairRM Dataset)
## Base Model

`meta-llama/Llama-3.2-1B-Instruct`, fine-tuned using the `DPOTrainer` from TRL on the PairRM preference dataset, which consists of 500 preference pairs.
## Training Configuration
| Setting | Value |
|---|---|
| Learning Rate | 3e-5 |
| Batch Size | 4 |
| Training Epochs | 3 |
| DPO Beta | 0.1 |
| Max Sequence Length | 1024 tokens |
| Max Prompt Length | 512 tokens |
| Padding Token | EOS Token |
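The hyperparameters above map directly onto TRL's `DPOConfig`. Below is a minimal, illustrative training sketch, not the exact script used for this adapter; the output directory is an assumption, and on older TRL versions the tokenizer is passed as `tokenizer=` rather than `processing_class=`:

```python
# Hyperparameters from the table above, in DPOConfig argument names.
HPARAMS = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 4,
    "num_train_epochs": 3,
    "beta": 0.1,
    "max_length": 1024,
    "max_prompt_length": 512,
}

def train(dataset):
    """Sketch of the DPO fine-tuning run; requires transformers and trl."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    base = "meta-llama/Llama-3.2-1B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(base)
    tokenizer.pad_token = tokenizer.eos_token  # padding token = EOS, per the table
    model = AutoModelForCausalLM.from_pretrained(base)

    args = DPOConfig(output_dir="dpo-pairrm-lora-adapter", **HPARAMS)
    trainer = DPOTrainer(
        model=model,
        args=args,
        train_dataset=dataset,  # rows with "prompt", "chosen", "rejected" fields
        processing_class=tokenizer,
    )
    trainer.train()
    return trainer
```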
## Dataset Overview

- File: `pairrm_preferences.csv`
- Contains 500 entries with the fields `prompt`, `chosen`, and `rejected`.
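A CSV with those three columns can be parsed with the standard library alone. The helper below is a small sketch (the function name and the sample row are illustrative, not from the training script):

```python
import csv
import io

REQUIRED = ("prompt", "chosen", "rejected")

def load_preferences(fileobj):
    """Parse a pairrm_preferences.csv-style file into a list of dicts."""
    reader = csv.DictReader(fileobj)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [{c: row[c] for c in REQUIRED} for row in reader]

# Hypothetical one-row sample in the card's format.
sample = io.StringIO(
    "prompt,chosen,rejected\n"
    "What is DPO?,A preference-based fine-tuning method.,I don't know.\n"
)
pairs = load_preferences(sample)
print(pairs[0]["chosen"])
```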
## Output Location

- Adapter uploaded to the Hugging Face Hub at: `sahithimuppavaram/dpo-pairrm-lora-adapter`
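To use the adapter, load the base model and apply the LoRA weights on top with PEFT. A minimal loading sketch (requires `transformers` and `peft`, and downloads weights from the Hub on first call):

```python
ADAPTER_ID = "sahithimuppavaram/dpo-pairrm-lora-adapter"
BASE_MODEL = "meta-llama/Llama-3.2-1B-Instruct"

def load_adapter():
    """Load the base model and attach the DPO LoRA adapter from the Hub."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    tokenizer.pad_token = tokenizer.eos_token  # matches the training setup
    return model, tokenizer
```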