exp020-simpo-merged
SFT + CPO/SimPO merged model. Full 16-bit weights, no adapter loading required.
Training Pipeline
- SFT: tomofusa/exp015-blend-h-lora
- CPO/SimPO: u-10bei/dpo-dataset-qwen-cot (1 epoch, lr=5e-07, beta=2.5)
CPO/SimPO Configuration
- Trainer: CPOTrainer (reference-free)
- Loss type: simpo
- Learning rate: 5e-07
- Beta: 2.5 (SimPO scale, NOT DPO scale)
- SimPO gamma: 1.375
- CPO alpha: 0.0
- LoRA: r=64, alpha=128
- Max length: 1024
- Downloads last month
- 36
Model tree for tomofusa/exp020-simpo-merged
Base model
Qwen/Qwen3-4B-Instruct-2507