# TITLE
This model is a fine-tuned version of BASE_MODEL, trained with Direct Preference Optimization (DPO) using the Unsloth library.
This repository contains the fully merged 16-bit weights; no adapter loading is required.
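Because the weights are fully merged, the model loads like any standard checkpoint. A minimal sketch with the `transformers` library, assuming a hypothetical repository id `user/model-dpo` (substitute the actual repo name); note there is no PEFT/adapter step:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "user/model-dpo"  # hypothetical placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Merged 16-bit weights load directly; no separate adapter is applied.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```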
## Training Configuration
- Base model: BASE_MODEL
- Method: DPO (Direct Preference Optimization)
- Epochs: EPOCHS
- Learning rate: LR
- Beta: BETA
- Max sequence length: MAXLEN
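The beta hyperparameter above controls how strongly DPO penalizes the policy for drifting from the reference model. A minimal pure-Python sketch of the per-pair DPO loss (assuming summed log-probabilities of each full response are already available; in practice a trainer such as TRL's `DPOTrainer` computes these):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    under the trainable policy or the frozen reference model.
    """
    # Implicit rewards: how much more the policy favors each response
    # than the reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Negative log-sigmoid of the margin: the loss shrinks as the
    # policy widens the gap in favor of the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree exactly, the margin is zero and the loss is log 2; pushing probability mass toward the chosen response drives it lower.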
## Sources & License
- Training Data: DATASET
- Compliance: Users must follow the original base model's license terms.