---
license: mit
library_name: transformers
pipeline_tag: text-generation
---

This is the base model used in CurioSFT (built on Qwen2.5-Math-7B-Base), described in *Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models*. We modified the chat_template, rope_theta, and the context window to support long-context reasoning.
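Below is a minimal loading sketch with Hugging Face `transformers`. The repository ID is a placeholder (this card does not state it), so replace it with this model's actual repo ID; generation settings are illustrative only.

```python
# Minimal usage sketch, assuming a standard transformers causal-LM setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Hao0oWang/CurioSFT-Base"  # placeholder -- replace with this repo's ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # requires `accelerate`
)

messages = [{"role": "user", "content": "Solve: what is 12 * 17?"}]
# The modified chat_template mentioned above is applied here.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```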

## Citation

If you find our work useful, please cite our paper:

```bibtex
@misc{curioSFT,
      title={Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models},
      author={Hao Wang and Hao Gu and Hongming Piao and Kaixiong Gong and Yuxiao Ye and Xiangyu Yue and Sirui Han and Yike Guo and Dapeng Wu},
      year={2026},
      eprint={2602.02244},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.02244},
}
```