---
license: apache-2.0
pipeline_tag: reinforcement-learning
library_name: transformers
tags:
- agent
- reward-model
- reasoning
- RL
---

# Agent Reasoning Reward Model (Agent-RRM)

This is the official repository for **Agent-RRM**, introduced in the paper [Exploring Reasoning Reward Model for Agents](https://arxiv.org/abs/2601.22154).

- **Paper:** [Exploring Reasoning Reward Model for Agents](https://arxiv.org/abs/2601.22154)
- **Code:** [GitHub - kxfan2002/Reagent](https://github.com/kxfan2002/Reagent)
## Introduction

Agentic Reinforcement Learning (Agentic RL) has achieved notable success in enabling agents to perform complex reasoning and tool use. However, most methods still rely on sparse, outcome-based rewards for training. Such feedback fails to differentiate the quality of intermediate reasoning, leading to suboptimal training results.

In this paper, we introduce the **Agent Reasoning Reward Model (Agent-RRM)**, a multi-faceted reward model that produces structured feedback for agentic trajectories, including:

1. **An explicit reasoning trace**: step-by-step reasoning analysis.
2. **A focused critique**: refinement guidance highlighting reasoning flaws.
3. **An overall score**: an evaluation of process performance.

Leveraging these signals, we systematically investigate three integration strategies: **Reagent-C** (text-augmented refinement), **Reagent-R** (reward-augmented guidance), and **Reagent-U** (unified feedback integration). Extensive evaluations across 12 diverse benchmarks show that Reagent-U yields substantial performance gains, achieving 43.7% on GAIA and 46.2% on WebWalkerQA.
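To make the three feedback fields concrete, here is a minimal sketch of how a downstream trainer might parse an Agent-RRM completion. The tag-delimited format (`<think>`, `<critique>`, `<score>`) and the function name are illustrative assumptions, not the released serialization; adapt the patterns to the actual model output.

```python
import re

def parse_rrm_feedback(text: str) -> dict:
    """Split a reward-model completion into the three feedback fields.

    Assumes a hypothetical tag-delimited format; the actual Agent-RRM
    output format may differ.
    """
    def grab(tag: str) -> str:
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return m.group(1).strip() if m else ""

    score_str = grab("score")
    return {
        "reasoning_trace": grab("think"),   # step-by-step reasoning analysis
        "critique": grab("critique"),       # refinement guidance
        "score": float(score_str) if score_str else None,  # process score
    }

example = (
    "<think>The agent queried the right tool but misread the result.</think>"
    "<critique>Step 3 uses the wrong field from the tool output.</critique>"
    "<score>0.4</score>"
)
feedback = parse_rrm_feedback(example)
print(feedback["score"])  # 0.4
```

The scalar `score` can drive reward-augmented guidance (Reagent-R), while the text fields feed text-augmented refinement (Reagent-C); Reagent-U consumes both.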
## Citation

If you find this work helpful, please consider citing:

```bibtex
@article{fan2025exploring,
  title={Exploring Reasoning Reward Model for Agents},
  author={Kaixuan Fan and Kaituo Feng and Manyuan Zhang and Tianshuo Peng and Zhixun Li and Yilei Jiang and Shuang Chen and Peng Pei and Xunliang Cai and Xiangyu Yue},
  journal={arXiv preprint arXiv:2601.22154},
  year={2025}
}
```