---
license: mit
datasets:
- trl-lib/ultrafeedback_binarized
base_model:
- alignment-handbook/zephyr-7b-sft-full
---

DPO model for Mistral-Base, fine-tuned on trl-lib/ultrafeedback_binarized with the noisy preference pairs excluded.
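
As a rough illustration of what excluding noisy preference pairs before DPO training could look like, here is a minimal Python sketch. This card does not state how noisy pairs were identified, so the `score_chosen` / `score_rejected` fields, the `drop_noisy_pairs` helper, and the margin rule are all assumptions made for illustration, not the procedure used for this model.

```python
from typing import Dict, List


def drop_noisy_pairs(pairs: List[Dict], min_margin: float = 1.0) -> List[Dict]:
    """Keep pairs whose chosen score exceeds the rejected score by at least min_margin.

    Hypothetical criterion: a pair is treated as noisy when the score gap is
    too small (or negative) to trust the preference label.
    """
    return [
        p for p in pairs
        if p["score_chosen"] - p["score_rejected"] >= min_margin
    ]


# Toy preference pairs (made-up data, not taken from the dataset).
pairs = [
    {"prompt": "Q1", "chosen": "A", "rejected": "B", "score_chosen": 9.0, "score_rejected": 4.0},
    {"prompt": "Q2", "chosen": "C", "rejected": "D", "score_chosen": 6.0, "score_rejected": 6.0},
    {"prompt": "Q3", "chosen": "E", "rejected": "F", "score_chosen": 5.0, "score_rejected": 7.5},
]

clean = drop_noisy_pairs(pairs)
print([p["prompt"] for p in clean])
```

Only the filtered list would then be passed on to DPO training; the threshold is a free parameter in this sketch and would need to be chosen for the actual data.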