Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models
Paper: https://arxiv.org/abs/2602.02244
This repository hosts the base model used in CurioSFT (built on Qwen2.5-Math-7B-Base), as described in Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models. Relative to the original Qwen2.5-Math-7B-Base, we modify the chat_template, rope_theta, and the context window to support long-context reasoning.
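
To make these configuration changes concrete, below is a minimal sketch of how to inspect them after loading the checkpoint with Hugging Face transformers. The repository id is a placeholder, and the assumption is that rope_theta and the context window live in the standard config fields (rope_theta, max_position_embeddings) while the chat template lives on the tokenizer; check the config.json and tokenizer_config.json shipped with this repository for the actual values.

```python
# Minimal sketch: inspect the modified settings after loading this checkpoint.
# The repo id below is a hypothetical placeholder, and the field names are
# assumptions based on standard Qwen2.5-style configs, not confirmed values.
from transformers import AutoConfig, AutoTokenizer

repo_id = "CurioSFT/Qwen2.5-Math-7B-Base"  # hypothetical repo id

config = AutoConfig.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# rope_theta and max_position_embeddings are the fields typically changed
# for long-context reasoning; the chat template is stored on the tokenizer.
print("rope_theta:", config.rope_theta)
print("context window:", config.max_position_embeddings)
print("chat template set:", tokenizer.chat_template is not None)
```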
If you find our work useful, please cite our paper:
@misc{curioSFT,
      title={Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models},
      author={Hao Wang and Hao Gu and Hongming Piao and Kaixiong Gong and Yuxiao Ye and Xiangyu Yue and Sirui Han and Yike Guo and Dapeng Wu},
      year={2026},
      eprint={2602.02244},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.02244},
}