| license: mit | |
| datasets: | |
| - trl-lib/ultrafeedback_binarized | |
| base_model: | |
| - alignment-handbook/zephyr-7b-sft-full | |
| DPO model for the Mistral-Base setting, fine-tuned on trl-lib/ultrafeedback_binarized with the noisy preference pairs excluded from training. |
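
The card does not say how the noisy preference pairs were identified, so the following is only a minimal sketch of the general idea: drop flagged pairs, then compute the standard per-pair DPO loss on what remains. The `is_noisy` predicate, the `score_chosen` / `score_rejected` field names, and the `beta=0.1` default are illustrative assumptions, not the procedure or settings used for this model.

```python
import math


def drop_noisy_pairs(pairs, is_noisy):
    """Return only the preference pairs that the predicate does not flag.

    `is_noisy` is a hypothetical, caller-supplied criterion; the model card
    does not specify which criterion was actually used.
    """
    return [pair for pair in pairs if not is_noisy(pair)]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin)).

    The margin is the policy-vs-reference log-ratio of the chosen response
    minus the same log-ratio of the rejected response.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = -beta * margin
    # Numerically stable softplus(x) = log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Toy data with assumed field names (illustration only).
pairs = [
    {"id": "a", "score_chosen": 9.0, "score_rejected": 3.0},
    {"id": "b", "score_chosen": 5.0, "score_rejected": 5.0},
]

# Example criterion (an assumption): treat tied scores as noisy.
clean = drop_noisy_pairs(
    pairs, lambda p: p["score_chosen"] <= p["score_rejected"]
)
```

With this toy criterion only pair `"a"` survives the filter. Any other noise criterion can be substituted by passing a different predicate.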