---
license: apache-2.0
pipeline_tag: reinforcement-learning
library_name: transformers
tags:
- agent
- reward-model
- reasoning
- RL
---
# Agent Reasoning Reward Model (Agent-RRM)
This is the official repository for **Agent-RRM**, introduced in the paper [Exploring Reasoning Reward Model for Agents](https://arxiv.org/abs/2601.22154).
- **Paper:** [Exploring Reasoning Reward Model for Agents](https://arxiv.org/abs/2601.22154)
- **Code:** [GitHub - kxfan2002/Reagent](https://github.com/kxfan2002/Reagent)
## Introduction
Agentic Reinforcement Learning (Agentic RL) has achieved notable success in enabling agents to perform complex reasoning and tool use. However, most methods still rely on sparse outcome-based rewards for training. Such feedback fails to differentiate intermediate reasoning quality, leading to suboptimal training results.
In this paper, we introduce **Agent Reasoning Reward Model (Agent-RRM)**, a multi-faceted reward model that produces structured feedback for agentic trajectories, including:
1. **An explicit reasoning trace**: Step-by-step reasoning analysis.
2. **A focused critique**: Refinement guidance highlighting reasoning flaws.
3. **An overall score**: Process performance evaluation.
Leveraging these signals, we systematically investigate three integration strategies: **Reagent-C** (text-augmented refinement), **Reagent-R** (reward-augmented guidance), and **Reagent-U** (unified feedback integration). Extensive evaluations across 12 diverse benchmarks show that Reagent-U delivers substantial performance gains, reaching 43.7% on GAIA and 46.2% on WebWalkerQA.
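To make the structured feedback concrete, here is a minimal sketch of how a trajectory-level reward signal could be extracted from a raw Agent-RRM response. The tag-based output format, the tag names, and the `parse_feedback` helper are illustrative assumptions, not the model's actual schema; see the GitHub repository for the real inference and parsing code.

```python
import re

# Assumed (hypothetical) feedback format: three tagged sections matching the
# components described above — reasoning trace, critique, and overall score.
FEEDBACK_PATTERN = re.compile(
    r"<reasoning>(?P<reasoning>.*?)</reasoning>\s*"
    r"<critique>(?P<critique>.*?)</critique>\s*"
    r"<score>(?P<score>[-+]?\d*\.?\d+)</score>",
    re.DOTALL,
)

def parse_feedback(response: str) -> dict:
    """Split a reward-model response into its three feedback components."""
    match = FEEDBACK_PATTERN.search(response)
    if match is None:
        raise ValueError("Response does not match the expected feedback format")
    return {
        "reasoning": match.group("reasoning").strip(),
        "critique": match.group("critique").strip(),
        "score": float(match.group("score")),
    }

# Example response (fabricated for illustration only).
example = (
    "<reasoning>Step 1: the agent queried the right tool...</reasoning>\n"
    "<critique>Step 3 reused stale search results.</critique>\n"
    "<score>0.72</score>"
)
feedback = parse_feedback(example)
```

In an RL training loop, `feedback["score"]` would serve as the dense process reward, while the critique text could be fed back to the policy for refinement (as in the Reagent-C and Reagent-U strategies).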
## Citation
If you find this work helpful, please consider citing:
```bibtex
@article{fan2025exploring,
  title={Exploring Reasoning Reward Model for Agents},
  author={Kaixuan Fan and Kaituo Feng and Manyuan Zhang and Tianshuo Peng and Zhixun Li and Yilei Jiang and Shuang Chen and Peng Pei and Xunliang Cai and Xiangyu Yue},
  journal={arXiv preprint arXiv:2601.22154},
  year={2025}
}
```