exp020-simpo-merged

A merged model produced by SFT followed by CPO/SimPO preference tuning. The full 16-bit weights are shipped, so no adapter (LoRA) loading is required.

Training Pipeline

  1. SFT: tomofusa/exp015-blend-h-lora
  2. CPO/SimPO: u-10bei/dpo-dataset-qwen-cot (1 epoch, lr=5e-07, beta=2.5)
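The preference-tuning stage above can be sketched with trl's CPOTrainer. This is a minimal sketch, not the exact training script: the output directory, dataset split, and LoRA task type are assumptions, while the hyperparameters mirror the configuration listed on this card.

```python
# Sketch of the CPO/SimPO stage, assuming trl's CPOTrainer API.
# Paths/split/output_dir are placeholders; hyperparameters match this card.
from datasets import load_dataset
from peft import LoraConfig
from trl import CPOConfig, CPOTrainer

train_dataset = load_dataset("u-10bei/dpo-dataset-qwen-cot", split="train")

peft_config = LoraConfig(r=64, lora_alpha=128, task_type="CAUSAL_LM")

args = CPOConfig(
    output_dir="exp020-simpo",  # placeholder
    loss_type="simpo",          # reference-free SimPO objective
    beta=2.5,                   # SimPO scale (not DPO scale)
    simpo_gamma=1.375,          # target reward margin
    cpo_alpha=0.0,              # disables the CPO NLL term -> pure SimPO
    learning_rate=5e-7,
    num_train_epochs=1,
    max_length=1024,
)

trainer = CPOTrainer(
    model="tomofusa/exp015-blend-h-lora",  # SFT checkpoint from step 1
    args=args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```

Setting `cpo_alpha=0.0` with `loss_type="simpo"` removes the behavior-cloning NLL term, leaving the pure SimPO margin loss.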

CPO/SimPO Configuration

  • Trainer: CPOTrainer (reference-free)
  • Loss type: simpo
  • Learning rate: 5e-07
  • Beta: 2.5 (SimPO scale, NOT DPO scale)
  • SimPO gamma: 1.375
  • CPO alpha: 0.0
  • LoRA: r=64, alpha=128
  • Max length: 1024
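For reference, the objective these settings parameterize is the reference-free SimPO loss: a logistic margin loss over length-normalized policy log-probabilities. A minimal sketch (the function name and scalar inputs are illustrative, not part of the training code):

```python
import math

def simpo_loss(avg_logp_chosen: float, avg_logp_rejected: float,
               beta: float = 2.5, gamma: float = 1.375) -> float:
    """SimPO loss for one preference pair (reference-free, no ref model).

    avg_logp_* are length-normalized log-probabilities under the policy,
    i.e. (1/|y|) * sum_t log pi(y_t | x, y_<t).
    """
    margin = beta * (avg_logp_chosen - avg_logp_rejected) - gamma
    # -log(sigmoid(margin)), computed stably as softplus(-margin)
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Larger chosen-vs-rejected gap -> smaller loss
print(simpo_loss(-1.0, -2.0))  # margin = 2.5*1.0 - 1.375 = 1.125
```

Because of the length normalization, beta here is not comparable to a DPO beta, which is why the card flags the scale explicitly.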
Model Details

  • Model size: 4B parameters
  • Tensor type: F16 (safetensors)
