ComparisonPO/Mistral-Base-7B-DPO_clean




	---
	license: mit
	datasets:
	- trl-lib/ultrafeedback_binarized
	base_model:
	- alignment-handbook/zephyr-7b-sft-full
	---

	DPO model for Mistral-Base, fine-tuned on trl-lib/ultrafeedback_binarized with the noisy preference pairs excluded.
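
A minimal loading sketch, assuming the checkpoint is published on the Hugging Face Hub under the repository id shown in the page header and that it loads with the standard `transformers` causal-LM classes. The names `REPO_ID`, `load_model`, and `generate` are illustrative and not part of this repository; no prompt format is specified in this card, so the sketch passes the prompt through as plain text.

```python
# Assumption: the Hub repository id matches the page header of this card.
REPO_ID = "ComparisonPO/Mistral-Base-7B-DPO_clean"


def load_model(repo_id: str = REPO_ID):
    """Return (tokenizer, model) for the given Hub repository.

    Calling this downloads the model weights on first use.
    """
    # Imported lazily so that importing this file does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation for a plain-text prompt (hypothetical helper)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (and only) sequence in the batch.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Explain DPO in one sentence.")` would download the weights and return the decoded text; a 7B model generally needs a GPU or substantial RAM to run at a usable speed.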