Add metadata and improve model card
#1
by nielsr (HF Staff) - opened
README.md
CHANGED
---
license: apache-2.0
pipeline_tag: reinforcement-learning
library_name: transformers
tags:
- agent
- reward-model
- reasoning
- RL
---

# Agent Reasoning Reward Model (Agent-RRM)

This is the official repository for **Agent-RRM**, introduced in the paper [Exploring Reasoning Reward Model for Agents](https://arxiv.org/abs/2601.22154).

- **Paper:** [Exploring Reasoning Reward Model for Agents](https://arxiv.org/abs/2601.22154)
- **Code:** [GitHub - kxfan2002/Reagent](https://github.com/kxfan2002/Reagent)

## Introduction

Agentic Reinforcement Learning (Agentic RL) has achieved notable success in enabling agents to perform complex reasoning and tool use. However, most methods still rely on sparse outcome-based rewards for training. Such feedback fails to differentiate intermediate reasoning quality, leading to suboptimal training results.

In this paper, we introduce the **Agent Reasoning Reward Model (Agent-RRM)**, a multi-faceted reward model that produces structured feedback for agentic trajectories, including:

1. **An explicit reasoning trace**: step-by-step reasoning analysis.
2. **A focused critique**: refinement guidance highlighting reasoning flaws.
3. **An overall score**: process performance evaluation.
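The three feedback components above can be consumed programmatically. As a minimal sketch (the tag names `<think>`, `<critique>`, and `<score>` are illustrative assumptions, not the model's documented output format), a reward-model completion might be parsed into its components like this:

```python
import re

def parse_rrm_feedback(completion: str) -> dict:
    """Split a reward-model completion into trace, critique, and score.

    Assumes the model emits XML-style tags; the tag names are
    hypothetical -- adapt the patterns to the actual output format
    of the released checkpoint.
    """
    def extract(tag: str) -> str:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", completion, re.DOTALL)
        return match.group(1).strip() if match else ""

    return {
        "trace": extract("think"),
        "critique": extract("critique"),
        "score": float(extract("score") or "nan"),
    }

feedback = parse_rrm_feedback(
    "<think>Step 1 uses the wrong tool.</think>"
    "<critique>Call the search tool before answering.</critique>"
    "<score>0.25</score>"
)
```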

Leveraging these signals, we systematically investigate three integration strategies: **Reagent-C** (text-augmented refinement), **Reagent-R** (reward-augmented guidance), and **Reagent-U** (unified feedback integration). Extensive evaluations across 12 diverse benchmarks demonstrate that Reagent-U yields substantial performance gains, achieving 43.7% on GAIA and 46.2% on WebWalkerQA, validating the effectiveness of our reasoning reward model and training schemes.
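To give intuition for reward-augmented guidance, the sketch below blends a sparse outcome reward with a process-level score so that a failed trajectory with sound intermediate reasoning still receives a training signal. The linear blend and the `alpha` weight are hypothetical illustrations, not the paper's exact Reagent-R formulation:

```python
def shaped_reward(outcome: float, process_score: float, alpha: float = 0.7) -> float:
    """Blend a sparse outcome reward with a process score.

    Both `alpha` and the linear combination are illustrative
    assumptions; see the paper and code for the actual Reagent-R rule.
    """
    return alpha * outcome + (1.0 - alpha) * process_score

# A failed trajectory (outcome 0.0) with good intermediate reasoning
# (process score 0.8) still gets a nonzero reward under this blend.
r = shaped_reward(outcome=0.0, process_score=0.8)
```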

## Citation

If you find this work helpful, please consider citing:

```bibtex
@article{fan2025exploring,
  title={Exploring Reasoning Reward Model for Agents},
  author={Kaixuan Fan and Kaituo Feng and Manyuan Zhang and Tianshuo Peng and Zhixun Li and Yilei Jiang and Shuang Chen and Peng Pei and Xunliang Cai and Xiangyu Yue},
  journal={arXiv preprint arXiv:2601.22154},
  year={2025}
}
```