We release TACLer-1.5B (🤗 HF Model), a hybrid reasoning model that supports both Thinking and NoThinking modes! We propose a model-tailored curriculum reinforcement learning framework that gradually increases data complexity based on the model's proficiency during multi-stage RL training.
Our experiments show that: (i) TACLer reduces computational cost, cutting training compute by over 50% compared to long-thinking models and reducing inference token usage by over 42% relative to the base model (DeepSeek-R1-Distill-Qwen-1.5B, R1-Qwen); and (ii) TACLer improves accuracy by over 9% over the base model, consistently outperforming state-of-the-art NoThinking and Thinking baselines across four math datasets (MATH500, AMC, AIME 2024, and AIME 2025).
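As an illustration of how the two modes might be toggled at inference time, here is a minimal sketch of prompt construction. Note the template markers and the empty-think-block prefill are assumptions based on common practice for R1-distilled models, not taken from the TACLer paper or repo; check the repository for the exact format.

```python
def build_prompt(question: str, thinking: bool = True) -> str:
    """Build a prompt for a hybrid Thinking/NoThinking model (hypothetical template).

    In Thinking mode the model generates its reasoning inside
    <think>...</think> before the final answer; in NoThinking mode we
    prefill an empty think block so generation starts at the answer.
    """
    prompt = f"<|User|>{question}<|Assistant|>"
    if not thinking:
        # Prefilled empty reasoning block (assumed NoThinking trigger).
        prompt += "<think>\n\n</think>\n"
    return prompt

# NoThinking mode: the model skips straight to the final answer.
print(build_prompt("What is 2 + 2?", thinking=False))
```

The completed prompt would then be passed to the model (e.g. via `transformers` generation) as usual; only the prefill differs between the two modes.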
Code: https://github.com/laihuiyuan/tacler
Paper: https://arxiv.org/pdf/2601.21711
Citation
```bibtex
@article{lai-etal-2026-tacler,
  title   = {TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning},
  author  = {Lai, Huiyuan and Nissim, Malvina},
  journal = {arXiv preprint arXiv:2601.21711},
  year    = {2026},
  url     = {https://arxiv.org/pdf/2601.21711}
}
```
Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B