TITLE

This model is a fine-tuned version of BASE_MODEL using Direct Preference Optimization (DPO) via the Unsloth library.

This repository contains the fully merged 16-bit weights, so no adapter loading is required.
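
Because the weights are already merged, the model should load like any ordinary causal language model. The snippet below is a minimal sketch, assuming the `transformers` and `torch` packages are installed; `load_merged_model` is a hypothetical helper name, and `repo_id` stands in for this repository's ID, which is not stated here.

```python
def load_merged_model(repo_id: str):
    """Load the merged checkpoint directly, with no PEFT/adapter step.

    `repo_id` is a placeholder for this repository's Hub ID (an assumption).
    """
    # Imported lazily so the helper can be defined without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # The card lists the tensor type as BF16, so request bfloat16 explicitly.
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
    return tokenizer, model
```

Calling `load_merged_model("<this-repo-id>")` would return a tokenizer and a model ready for `model.generate(...)`; no separate adapter weights need to be attached afterwards.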

Training Configuration

  • Base model: BASE_MODEL
  • Method: DPO (Direct Preference Optimization)
  • Epochs: EPOCHS
  • Learning rate: LR
  • Beta: BETA
  • Max sequence length: MAXLEN
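
To illustrate what the Beta setting controls, here is a small sketch of the standard per-example DPO loss in plain Python. It is not taken from the training script for this model; the function name `dpo_loss` and the example log-probabilities are assumptions chosen for illustration only.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float) -> float:
    """Standard DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected responses
    under the policy being trained and under the frozen reference model.
    """
    # How much more the policy prefers each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    # Beta scales the margin: larger values penalize drift from the reference more.
    x = beta * (chosen_gain - rejected_gain)
    # loss = -log(sigmoid(x)), computed in a numerically stable way.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Illustrative values only (not measurements from this model).
loss_no_preference = dpo_loss(-2.0, -2.0, -2.0, -2.0, beta=0.1)
loss_prefers_chosen = dpo_loss(-1.0, -3.0, -2.0, -2.0, beta=0.1)
```

When the policy and the reference agree, the loss equals log 2; it drops below that as the policy assigns relatively more probability to the chosen response than the reference does.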

Sources & License

  • Training Data: DATASET
  • Compliance: Users must follow the original base model's license terms.

Weights: Safetensors format, 4B parameters, BF16 tensor type.