Trained on 1600 samples of LM-SYS conversational data that have undergone the REFLECT process.
Official model fine-tuned with REFLECT, produced to analyze the trend in KL divergence against win rate as training progresses.