This repository contains the base model used in CurioSFT, built on Qwen2.5-Math-7B-Base, as described in *Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models*. Relative to the original base model, we modify the chat_template, rope_theta, and context window to support long-context reasoning.
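
For concreteness, here is a minimal sketch of how these overrides map onto the standard Hugging Face Transformers API. The repo id comes from this page, but the specific `rope_theta` value and the 16k context length below are illustrative placeholders (the "16k" is inferred from the model name) and are not guaranteed to match the released `config.json`; check that file for the actual values.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "Hao0oWang/Qwen2.5-Math-7B-16k-think"

# The released checkpoint already ships with these overrides baked into its
# config; the explicit assignments below only illustrate which fields change.
config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 16384   # extended context window ("16k")
config.rope_theta = 1_000_000            # placeholder RoPE base frequency

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The added chat_template ships with the tokenizer and formats prompts:
messages = [{"role": "user", "content": "Solve 2x + 3 = 11 for x."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
```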

Citation

If you find our work useful, please cite our paper:

```bibtex
@misc{curioSFT,
      title={Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models},
      author={Hao Wang and Hao Gu and Hongming Piao and Kaixiong Gong and Yuxiao Ye and Xiangyu Yue and Sirui Han and Yike Guo and Dapeng Wu},
      year={2026},
      eprint={2602.02244},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.02244},
}
```