Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models
Paper: https://arxiv.org/abs/2602.02244
This repository hosts the base model used in CurioSFT (built on Qwen2.5-Math-7B-Base), as described in Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models. Relative to the original Qwen2.5-Math-7B-Base, we modify the chat_template, rope_theta, and the context window to support long-context reasoning.
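
To make these configuration changes concrete, below is a minimal sketch of how to inspect them after loading the checkpoint with Hugging Face transformers. The repository id is a placeholder, and the assumption is that rope_theta and the context window live in the standard config fields (rope_theta, max_position_embeddings) while the chat template lives on the tokenizer; check the config.json and tokenizer_config.json shipped with this repository for the actual values.

```python
# Minimal sketch: inspect the modified settings after loading this checkpoint.
# The repo id below is a hypothetical placeholder, and the field names are
# assumptions based on standard Qwen2.5-style configs, not confirmed values.
from transformers import AutoConfig, AutoTokenizer

repo_id = "CurioSFT/Qwen2.5-Math-7B-Base"  # hypothetical repo id

config = AutoConfig.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# rope_theta and max_position_embeddings are the fields typically changed
# for long-context reasoning; the chat template is stored on the tokenizer.
print("rope_theta:", config.rope_theta)
print("context window:", config.max_position_embeddings)
print("chat template set:", tokenizer.chat_template is not None)
```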
If you find our work useful, please cite our paper:
@misc{curioSFT,
      title={Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models},
      author={Hao Wang and Hao Gu and Hongming Piao and Kaixiong Gong and Yuxiao Ye and Xiangyu Yue and Sirui Han and Yike Guo and Dapeng Wu},
      year={2026},
      eprint={2602.02244},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.02244},
}