Improve model card metadata and content
#1 by nielsr - opened

README.md CHANGED
---
license: apache-2.0
pipeline_tag: text-generation
---

# D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use

This repository contains the weights for **D-CORE** (**D**ecomposing tasks and **Co**mposing **Re**asoning processes), a two-stage training framework that enhances the task decomposition and reflective reasoning capabilities of Large Reasoning Models (LRMs) for complex tool use.

## Introduction

Effective tool use and reasoning are essential capabilities for large reasoning models (LRMs) to address complex real-world problems. Through empirical analysis, the authors identify that current LRMs lack sub-task decomposition capability in complex tool-use scenarios, leading to "Lazy Reasoning."

To address this, D-CORE proposes a two-stage training framework:

1. **Self-distillation**: incentivizes the LRM's task decomposition reasoning capability.
2. **Diversity-aware reinforcement learning (RL)**: restores the LRM's reflective reasoning capability.

D-CORE achieves robust tool-use improvements across diverse benchmarks and model scales. Notably, D-CORE-14B establishes a new state of the art on BFCLv3, outperforming 70B models despite being 5× smaller.
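
## Usage

As a minimal usage sketch with the `transformers` library (the Hub repository ID `alibaba/D-CORE-14B` below is a placeholder assumption, not stated on this card; substitute this repository's actual ID, and note that chat-template support depends on the checkpoint's tokenizer config):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_response(prompt: str, model_id: str = "alibaba/D-CORE-14B") -> str:
    """Run one chat turn with the checkpoint. `model_id` is hypothetical; replace it."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    # Format the user turn with the model's chat template, then generate.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=512)
    # Strip the prompt tokens and decode only the newly generated text.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```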

## Resources

- **Paper**: [D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use](https://huggingface.co/papers/2602.02160)
- **arXiv**: [2602.02160](https://arxiv.org/abs/2602.02160)
- **Code**: [EfficientAI (GitHub)](https://github.com/alibaba/EfficientAI)

## Authors

Bowen Xu, Shaoyu Wu, Hao Jiang, Kai Liu, Xin Chen, Lulu Hu, Bin Yang

## Citation

If you find this work useful, please cite:

```bibtex
@article{xu2026dcore,
  title={D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use},
  author={Xu, Bowen and Wu, Shaoyu and Jiang, Hao and Liu, Kai and Chen, Xin and Hu, Lulu and Yang, Bin},
  journal={arXiv preprint arXiv:2602.02160},
  year={2026}
}
```