Trained on 1600 samples of LM-SYS conversational data that have undergone the [REFLECT](https://arxiv.org/pdf/2601.18730) process.

This is the official model fine-tuned with REFLECT, produced for analyzing the trend in KL divergence against win rate as training progresses.
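
For readers who want to reproduce this kind of analysis, the following is a minimal sketch of how KL divergence and win rate could be tabulated per checkpoint. It is not taken from the REFLECT paper: the function names `kl_divergence` and `win_rate`, the KL direction (checkpoint relative to reference), the half-credit treatment of ties, and all numbers in `checkpoints` are assumptions made purely for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same support."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Terms with p_i == 0 contribute nothing; q_i must be > 0 wherever p_i > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def win_rate(outcomes):
    """Fraction of pairwise comparisons won; a tie counts as half a win (assumption)."""
    score = {"win": 1.0, "tie": 0.5, "loss": 0.0}
    return sum(score[o] for o in outcomes) / len(outcomes)


# Hypothetical data: a reference distribution, plus one toy distribution and
# one list of judged outcomes per training step. None of this is real output.
reference = [0.5, 0.3, 0.2]
checkpoints = {
    100: ([0.5, 0.3, 0.2], ["win", "loss", "loss", "tie"]),
    200: ([0.6, 0.3, 0.1], ["win", "win", "loss", "tie"]),
}

# One (step, KL, win rate) row per checkpoint, ordered by training step.
trend = [
    (step, kl_divergence(dist, reference), win_rate(outcomes))
    for step, (dist, outcomes) in sorted(checkpoints.items())
]

for step, kl, wr in trend:
    print(f"step={step} kl={kl:.4f} win_rate={wr:.3f}")
```

In practice the distributions would come from the model's token probabilities and the outcomes from a judge or human comparison; the sketch only shows how the two quantities can be paired by training step.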