Add metadata and improve model card
#1
by nielsr (HF Staff) - opened
README.md
CHANGED
---
license: apache-2.0
pipeline_tag: reinforcement-learning
library_name: transformers
tags:
- agent
- reward-model
- reasoning
- RL
---

# Agent Reasoning Reward Model (Agent-RRM)

This is the official repository for **Agent-RRM**, introduced in the paper [Exploring Reasoning Reward Model for Agents](https://arxiv.org/abs/2601.22154).

- **Paper:** [Exploring Reasoning Reward Model for Agents](https://arxiv.org/abs/2601.22154)
- **Code:** [GitHub - kxfan2002/Reagent](https://github.com/kxfan2002/Reagent)

## Introduction

Agentic Reinforcement Learning (Agentic RL) has achieved notable success in enabling agents to perform complex reasoning and tool use. However, most methods still rely on sparse outcome-based rewards for training. Such feedback fails to differentiate intermediate reasoning quality, leading to suboptimal training results.

In this paper, we introduce the **Agent Reasoning Reward Model (Agent-RRM)**, a multi-faceted reward model that produces structured feedback for agentic trajectories, including:

1. **An explicit reasoning trace**: step-by-step reasoning analysis.
2. **A focused critique**: refinement guidance highlighting reasoning flaws.
3. **An overall score**: process performance evaluation.
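The three feedback components above can be consumed programmatically. As a minimal sketch (the tag names `<think>`, `<critique>`, and `<score>` are illustrative assumptions, not the model's documented output format), a reward-model completion might be parsed into its components like this:

```python
import re

def parse_rrm_feedback(completion: str) -> dict:
    """Split a reward-model completion into trace, critique, and score.

    Assumes the model emits XML-style tags; the tag names are
    hypothetical -- adapt the patterns to the actual output format
    of the released checkpoint.
    """
    def extract(tag: str) -> str:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", completion, re.DOTALL)
        return match.group(1).strip() if match else ""

    return {
        "trace": extract("think"),
        "critique": extract("critique"),
        "score": float(extract("score") or "nan"),
    }

feedback = parse_rrm_feedback(
    "<think>Step 1 uses the wrong tool.</think>"
    "<critique>Call the search tool before answering.</critique>"
    "<score>0.25</score>"
)
```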

Leveraging these signals, we systematically investigate three integration strategies: **Reagent-C** (text-augmented refinement), **Reagent-R** (reward-augmented guidance), and **Reagent-U** (unified feedback integration). Extensive evaluations across 12 diverse benchmarks demonstrate that Reagent-U yields substantial performance gains, achieving 43.7% on GAIA and 46.2% on WebWalkerQA, validating the effectiveness of our reasoning reward model and training schemes.
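To give intuition for reward-augmented guidance, the sketch below blends a sparse outcome reward with a process-level score so that a failed trajectory with sound intermediate reasoning still receives a training signal. The linear blend and the `alpha` weight are hypothetical illustrations, not the paper's exact Reagent-R formulation:

```python
def shaped_reward(outcome: float, process_score: float, alpha: float = 0.7) -> float:
    """Blend a sparse outcome reward with a process score.

    Both `alpha` and the linear combination are illustrative
    assumptions; see the paper and code for the actual Reagent-R rule.
    """
    return alpha * outcome + (1.0 - alpha) * process_score

# A failed trajectory (outcome 0.0) with good intermediate reasoning
# (process score 0.8) still gets a nonzero reward under this blend.
r = shaped_reward(outcome=0.0, process_score=0.8)
```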

## Citation

If you find this work helpful, please consider citing:

```bibtex
@article{fan2025exploring,
  title={Exploring Reasoning Reward Model for Agents},
  author={Kaixuan Fan and Kaituo Feng and Manyuan Zhang and Tianshuo Peng and Zhixun Li and Yilei Jiang and Shuang Chen and Peng Pei and Xunliang Cai and Xiangyu Yue},
  journal={arXiv preprint arXiv:2601.22154},
  year={2025}
}
```