---
license: apache-2.0
pipeline_tag: reinforcement-learning
library_name: transformers
tags:
- agent
- reward-model
- reasoning
- RL
---

# Agent Reasoning Reward Model (Agent-RRM)

This is the official repository for **Agent-RRM**, introduced in the paper [Exploring Reasoning Reward Model for Agents](https://arxiv.org/abs/2601.22154).

- **Paper:** [Exploring Reasoning Reward Model for Agents](https://arxiv.org/abs/2601.22154)
- **Code:** [GitHub - kxfan2002/Reagent](https://github.com/kxfan2002/Reagent)
## Introduction

Agentic Reinforcement Learning (Agentic RL) has achieved notable success in enabling agents to perform complex reasoning and tool use. However, most methods still rely on sparse, outcome-based rewards for training. Such feedback fails to differentiate the quality of intermediate reasoning, leading to suboptimal training results.

In this paper, we introduce the **Agent Reasoning Reward Model (Agent-RRM)**, a multi-faceted reward model that produces structured feedback for agentic trajectories, including:

1. **An explicit reasoning trace**: step-by-step reasoning analysis.
2. **A focused critique**: refinement guidance highlighting reasoning flaws.
3. **An overall score**: an evaluation of process performance.

Leveraging these signals, we systematically investigate three integration strategies: **Reagent-C** (text-augmented refinement), **Reagent-R** (reward-augmented guidance), and **Reagent-U** (unified feedback integration). Extensive evaluations across 12 diverse benchmarks show that Reagent-U yields substantial performance gains, achieving 43.7% on GAIA and 46.2% on WebWalkerQA.
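To make the three feedback fields concrete, here is a minimal sketch of how a downstream trainer might parse an Agent-RRM completion. The tag-delimited format (`<think>`, `<critique>`, `<score>`) and the function name are illustrative assumptions, not the released serialization; adapt the patterns to the actual model output.

```python
import re

def parse_rrm_feedback(text: str) -> dict:
    """Split a reward-model completion into the three feedback fields.

    Assumes a hypothetical tag-delimited format; the actual Agent-RRM
    output format may differ.
    """
    def grab(tag: str) -> str:
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return m.group(1).strip() if m else ""

    score_str = grab("score")
    return {
        "reasoning_trace": grab("think"),   # step-by-step reasoning analysis
        "critique": grab("critique"),       # refinement guidance
        "score": float(score_str) if score_str else None,  # process score
    }

example = (
    "<think>The agent queried the right tool but misread the result.</think>"
    "<critique>Step 3 uses the wrong field from the tool output.</critique>"
    "<score>0.4</score>"
)
feedback = parse_rrm_feedback(example)
print(feedback["score"])  # 0.4
```

The scalar `score` can drive reward-augmented guidance (Reagent-R), while the text fields feed text-augmented refinement (Reagent-C); Reagent-U consumes both.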
## Citation

If you find this work helpful, please consider citing:

```bibtex
@article{fan2025exploring,
  title={Exploring Reasoning Reward Model for Agents},
  author={Kaixuan Fan and Kaituo Feng and Manyuan Zhang and Tianshuo Peng and Zhixun Li and Yilei Jiang and Shuang Chen and Peng Pei and Xunliang Cai and Xiangyu Yue},
  journal={arXiv preprint arXiv:2601.22154},
  year={2025}
}
```