---
license: apache-2.0
pipeline_tag: reinforcement-learning
library_name: transformers
tags:
  - agent
  - reward-model
  - reasoning
  - RL
---

Agent Reasoning Reward Model (Agent-RRM)

This is the official repository for Agent-RRM, introduced in the paper Exploring Reasoning Reward Model for Agents.

Introduction

Agentic Reinforcement Learning (Agentic RL) has achieved notable success in enabling agents to perform complex reasoning and tool use. However, most methods still rely on sparse outcome-based rewards for training. Such feedback fails to differentiate intermediate reasoning quality, leading to suboptimal training results.

In this paper, we introduce Agent Reasoning Reward Model (Agent-RRM), a multi-faceted reward model that produces structured feedback for agentic trajectories, including:

  1. An explicit reasoning trace: Step-by-step reasoning analysis.
  2. A focused critique: Refinement guidance highlighting reasoning flaws.
  3. An overall score: Process performance evaluation.
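The three feedback components can be pictured as one structured record. The sketch below is illustrative only: the tag names `<think>`, `<critique>`, and `<score>`, the `AgentRRMFeedback` class, and the `parse_feedback` helper are assumptions made for this example and are not confirmed by the model card as Agent-RRM's actual output format.

```python
import re
from dataclasses import dataclass


@dataclass
class AgentRRMFeedback:
    """Hypothetical container for the three feedback components."""
    reasoning_trace: str  # step-by-step analysis of the trajectory
    critique: str         # refinement guidance on reasoning flaws
    score: float          # overall process score


def _extract(tag: str, text: str) -> str:
    """Return the stripped body of the first <tag>...</tag> pair, or raise."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"missing <{tag}> section")
    return match.group(1).strip()


def parse_feedback(raw: str) -> AgentRRMFeedback:
    """Parse reward-model text into a record (the tag layout is assumed)."""
    return AgentRRMFeedback(
        reasoning_trace=_extract("think", raw),
        critique=_extract("critique", raw),
        score=float(_extract("score", raw)),
    )
```

Keeping the critique and the score in separate fields lets a training loop use the text and the numeric signal independently or together.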

Leveraging these signals, we systematically investigate three integration strategies: Reagent-C (text-augmented refinement), Reagent-R (reward-augmented guidance), and Reagent-U (unified feedback integration). Evaluations across 12 diverse benchmarks show that Reagent-U substantially improves performance, reaching 43.7% on GAIA and 46.2% on WebWalkerQA.
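One way to read the three strategies is as two ways of consuming the feedback: the critique as text and the score as a reward. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation. The prompt template, the additive blend, the `weight` parameter, and both function names are hypothetical.

```python
def refinement_prompt(question: str, first_attempt: str, critique: str) -> str:
    """Text-augmented refinement: feed the critique back for a second attempt.

    The template wording is an assumption made for this example.
    """
    return (
        f"Question: {question}\n"
        f"Previous attempt: {first_attempt}\n"
        f"Critique: {critique}\n"
        "Revise the attempt to address the critique."
    )


def blended_reward(outcome_reward: float, rrm_score: float, weight: float = 0.5) -> float:
    """Reward-augmented guidance: add a weighted process score to the outcome reward.

    The additive form and the default weight are assumptions, not values from the paper.
    """
    return outcome_reward + weight * rrm_score
```

Under this reading, a unified variant would use both the refinement prompt and the blended reward in the same training loop.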

Citation

If you find this work helpful, please consider citing:

@article{fan2025exploring,
  title={Exploring Reasoning Reward Model for Agents},
  author={Kaixuan Fan and Kaituo Feng and Manyuan Zhang and Tianshuo Peng and Zhixun Li and Yilei Jiang and Shuang Chen and Peng Pei and Xunliang Cai and Xiangyu Yue},
  journal={arXiv preprint arXiv:2601.22154},
  year={2025}
}