Add metadata and improve model card
Hi! I'm Niels from the Hugging Face community science team.
This PR improves your model card by adding metadata to the YAML header (pipeline tag, library name, and license). It also links the model to its associated paper and GitHub repository, making it easier for users to find resources and understand how to use the model.
README.md CHANGED

````diff
@@ -1,22 +1,33 @@
-
-
-
-#
-Agentic Reinforcement Learning (Agentic RL) has achieved notable success in enabling agents to perform complex reasoning and tool use.
-However, most methods still rely on sparse outcome-based rewards for training.
-Such feedback fails to differentiate intermediate reasoning quality, leading to suboptimal training results.
-In this paper, we introduce \textbf{Agent Reasoning Reward Model (Agent-RRM)}, a multi-faceted reward model that produces structured feedback for agentic trajectories, including (1) an explicit reasoning trace, (2) a focused critique that provides refinement guidance by highlighting reasoning flaws, and (3) an overall score that evaluates process performance.
-Leveraging these signals, we systematically investigate three integration strategies: \textbf{Reagent-C} (text-augmented refinement), \textbf{Reagent-R} (reward-augmented guidance), and \textbf{Reagent-U} (unified feedback integration).
-Extensive evaluations across 12 diverse benchmarks demonstrate that Reagent-U yields substantial performance leaps, achieving 43.7\% on GAIA and 46.2\% on WebWalkerQA, validating the effectiveness of our reasoning reward model and training schemes.
-
-
-The official codebase, including training and evaluation scripts for Reagent, can be found on the project's GitHub repository: https://github.com/kxfan2002/Reagent
-
-##
-
-
-
-
-
-
-
+---
+license: apache-2.0
+library_name: transformers
+pipeline_tag: reinforcement-learning
+---
+
+# Agent Reasoning Reward Model (Agent-RRM)
+
+This repository contains the weights for **Agent-RRM**, introduced in the paper [Exploring Reasoning Reward Model for Agents](https://huggingface.co/papers/2601.22154).
+
+## Introduction
+Agent Reasoning Reward Model (Agent-RRM) is a multi-faceted reward model designed for Agentic Reinforcement Learning. Unlike traditional sparse outcome-based rewards, Agent-RRM provides structured feedback for agentic trajectories, including:
+1. **Explicit reasoning trace**: Step-by-step reasoning analysis.
+2. **Focused critique**: Refinement guidance highlighting reasoning flaws.
+3. **Overall score**: Evaluation of process performance.
+
+These signals enable training strategies like **Reagent-U**, which has demonstrated significant performance leaps on benchmarks such as GAIA and WebWalkerQA.
+
+## Resources
+- **Paper:** [Exploring Reasoning Reward Model for Agents](https://huggingface.co/papers/2601.22154)
+- **GitHub Repository:** [kxfan2002/Reagent](https://github.com/kxfan2002/Reagent)
+
+## Citation
+If you find this work helpful, please consider citing:
+
+```bibtex
+@article{fan2025exploring,
+  title={Exploring Reasoning Reward Model for Agents},
+  author={Kaixuan Fan and Kaituo Feng and Manyuan Zhang and Tianshuo Peng and Zhixun Li and Yilei Jiang and Shuang Chen and Peng Pei and Xunliang Cai and Xiangyu Yue},
+  journal={arXiv preprint arXiv:2601.22154},
+  year={2025}
+}
+```
````
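For reviewers who want to sanity-check the YAML front matter this PR adds, here is a minimal stdlib-only sketch. The `parse_front_matter` helper and the inlined `README` snippet are illustrative, not part of this PR or of any Hugging Face library:

```python
# Illustrative sketch: extract the key: value pairs from a model card's
# YAML front matter (the block between the first pair of '---' fences).
README = """\
---
license: apache-2.0
library_name: transformers
pipeline_tag: reinforcement-learning
---

# Agent Reasoning Reward Model (Agent-RRM)
"""

def parse_front_matter(text):
    """Return a dict of the simple key: value pairs in the front matter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter present
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing fence ends the metadata block
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta

print(parse_front_matter(README)["pipeline_tag"])  # reinforcement-learning
```

The three keys shown (`license`, `library_name`, `pipeline_tag`) are exactly the metadata fields this PR introduces; the Hub reads them from this block to populate the model page.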