Improve model card metadata and content
#1 by nielsr - opened

README.md CHANGED
---
license: apache-2.0
pipeline_tag: text-generation
---

# D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use

This repository contains the weights for **D-CORE** (**D**ecomposing tasks and **Co**mposing **Re**asoning processes), a two-stage training framework that enhances the task decomposition and reflective reasoning capabilities of Large Reasoning Models (LRMs) for complex tool use.

## Introduction

Effective tool use and reasoning are essential capabilities for large reasoning models (LRMs) to address complex real-world problems. Through empirical analysis, the authors identify that current LRMs lack sub-task decomposition capability in complex tool-use scenarios, leading to "Lazy Reasoning."

To address this, D-CORE proposes a two-stage training framework:

1. **Self-distillation**: incentivizes the LRM's task decomposition reasoning capability.
2. **Diversity-aware reinforcement learning (RL)**: restores the LRM's reflective reasoning capability.

D-CORE achieves robust tool-use improvements across diverse benchmarks and model scales. Notably, D-CORE-14B establishes a new state of the art on BFCLv3, outperforming 70B models despite being 5× smaller.
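
## Usage

As a minimal usage sketch with the `transformers` library (the Hub repository ID `alibaba/D-CORE-14B` below is a placeholder assumption, not stated on this card; substitute this repository's actual ID, and note that chat-template support depends on the checkpoint's tokenizer config):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_response(prompt: str, model_id: str = "alibaba/D-CORE-14B") -> str:
    """Run one chat turn with the checkpoint. `model_id` is hypothetical; replace it."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    # Format the user turn with the model's chat template, then generate.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=512)
    # Strip the prompt tokens and decode only the newly generated text.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```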

## Resources

- **Paper**: [D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use](https://huggingface.co/papers/2602.02160)
- **arXiv**: [2602.02160](https://arxiv.org/abs/2602.02160)
- **Code**: [EfficientAI (GitHub)](https://github.com/alibaba/EfficientAI)

## Authors

Bowen Xu, Shaoyu Wu, Hao Jiang, Kai Liu, Xin Chen, Lulu Hu, Bin Yang

## Citation

If you find this work useful, please cite:

```bibtex
@article{xu2026dcore,
  title={D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use},
  author={Xu, Bowen and Wu, Shaoyu and Jiang, Hao and Liu, Kai and Chen, Xin and Hu, Lulu and Yang, Bin},
  journal={arXiv preprint arXiv:2602.02160},
  year={2026}
}
```