# RM-NLHF
This is the official model repository for the paper "Reward Modeling from Natural Language Human Feedback".
We release multiple model variants. All checkpoints are available in our collection:
📦 Collection: Tongyi-ConvAI/rm-nlhf
## RM-NLHF Reward Models

Ready-to-use generative reward models trained with the full RM-NLHF pipeline.
| Model | Size | Link |
|---|---|---|
| RM-NLHF-Qwen | 7B | 🤗 Tongyi-ConvAI/RM-NLHF-Qwen-7B |
| RM-NLHF-Qwen | 32B | 🤗 Tongyi-ConvAI/RM-NLHF-Qwen-32B |
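The checkpoints above are standard Hugging Face repositories, so a generative reward model can be loaded with `transformers`. The sketch below is a hypothetical usage example: the repo id comes from the table, but the pairwise prompt template (`format_pairwise_prompt`) and the assumption that the model emits a natural-language critique followed by a verdict are illustrative guesses, not the official interface documented in the paper.

```python
# Minimal sketch for querying RM-NLHF-Qwen-7B as a generative reward model.
# NOTE: the prompt template here is a hypothetical illustration, not the
# official RM-NLHF format.
def format_pairwise_prompt(question: str, answer_a: str, answer_b: str) -> str:
    """Build a pairwise-comparison prompt asking for a critique and a verdict."""
    return (
        "You are a reward model. Compare the two answers and explain "
        "which one is better.\n\n"
        f"Question: {question}\n\n"
        f"Answer A: {answer_a}\n\n"
        f"Answer B: {answer_b}\n\n"
        "Critique and verdict:"
    )


if __name__ == "__main__":
    prompt = format_pairwise_prompt("What is 2 + 2?", "4", "5")
    print(prompt)

    # Loading and generation (requires `transformers`, `torch`, and the model
    # weights; commented out so the sketch stays runnable offline):
    # from transformers import AutoModelForCausalLM, AutoTokenizer
    # tok = AutoTokenizer.from_pretrained("Tongyi-ConvAI/RM-NLHF-Qwen-7B")
    # model = AutoModelForCausalLM.from_pretrained(
    #     "Tongyi-ConvAI/RM-NLHF-Qwen-7B", torch_dtype="auto", device_map="auto"
    # )
    # inputs = tok(prompt, return_tensors="pt").to(model.device)
    # out = model.generate(**inputs, max_new_tokens=512)
    # critique = tok.decode(
    #     out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    # )
    # print(critique)
```

The 32B variant is a drop-in replacement: swap the repo id and, if needed, shard it across GPUs via `device_map="auto"`.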
## Cold-Start MetaRM

The cold-start MetaRM checkpoints described in the paper, used to initialize MetaRM co-training.
| Model | Size | Link |
|---|---|---|
| Cold-Start-MetaRM | 7B | 🤗 Tongyi-ConvAI/Cold-Start-MetaRM-RM-NLHF-Qwen-7B |
| Cold-Start-MetaRM | 32B | 🤗 Tongyi-ConvAI/Cold-Start-MetaRM-RM-NLHF-Qwen-32B |
## Final MetaRM

Final-step MetaRM checkpoints co-trained alongside the generative reward model.
| Model | Size | Link |
|---|---|---|
| Final-MetaRM | 7B | 🤗 Tongyi-ConvAI/Final-MetaRM-RM-NLHF-Qwen-7B |
| Final-MetaRM | 32B | 🤗 Tongyi-ConvAI/Final-MetaRM-RM-NLHF-Qwen-32B |
## Outcome-Reward Baseline

A baseline trained solely on outcome labels, without natural-language critiques.
| Model | Size | Link |
|---|---|---|
| Baseline-Outcome-Reward | 7B | 🤗 Tongyi-ConvAI/Baseline-Outcome-Reward-Qwen-7B |
## Citation

```bibtex
@misc{wang2026rewardmodelingnaturallanguage,
      title={Reward Modeling from Natural Language Human Feedback},
      author={Zongqi Wang and Rui Wang and Yuchuan Wu and Yiyao Yu and Pinyi Zhang and Shaoning Sun and Yujiu Yang and Yongbin Li},
      year={2026},
      eprint={2601.07349},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.07349},
}
```