Chosen DPO model, trained on 40k examples for 3 epochs with default parameters (see train_dpo.py). 9c36ff1 verified pleaky2410 committed on May 30, 2024
First model, trained on 40k examples for 3 epochs. bbf22f8 verified pleaky2410 committed on May 30, 2024