💡 Reward Modeling from Natural Language Human Feedback

This is the official model repository for the paper "Reward Modeling from Natural Language Human Feedback".

🔑 Key Features

🥇 The first generative reward modeling approach that uses natural language human critiques as a training signal.
🤝 Hybrid training strategy: uses human-written critiques where they exist and a specially trained MetaRM for samples without human annotations, achieving state-of-the-art generative reward modeling performance.

💾 Checkpoints

We release multiple model variants. All checkpoints are available in our collection:

📦 Collection: Tongyi-ConvAI/rm-nlhf

🟢 Final GRM — Generative Reward Models

Ready-to-use generative reward models trained with the full RM-NLHF pipeline.

Model	Size	Link
RM-NLHF-Qwen	7B	🤗 Tongyi-ConvAI/RM-NLHF-Qwen-7B
RM-NLHF-Qwen	32B	🤗 Tongyi-ConvAI/RM-NLHF-Qwen-32B
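
The following is a minimal sketch of how one of the GRM checkpoints above might be loaded with the Hugging Face transformers library. It assumes the checkpoints follow the standard causal-LM layout that `AutoModelForCausalLM` can read, which this README does not state explicitly; the helper names `grm_repo_id` and `load_grm` are hypothetical and are not part of the release. The repo IDs are copied from the table above.

```python
# Repo IDs copied from the GRM table in this README.
GRM_CHECKPOINTS = {
    "7B": "Tongyi-ConvAI/RM-NLHF-Qwen-7B",
    "32B": "Tongyi-ConvAI/RM-NLHF-Qwen-32B",
}


def grm_repo_id(size: str) -> str:
    """Return the Hub repo ID for a GRM size listed in the table (hypothetical helper)."""
    try:
        return GRM_CHECKPOINTS[size]
    except KeyError:
        raise ValueError(f"no GRM checkpoint listed for size {size!r}") from None


def load_grm(size: str = "7B"):
    """Download and load a GRM checkpoint.

    Assumption: the checkpoint is a standard causal LM loadable through
    AutoModelForCausalLM. device_map="auto" additionally requires the
    accelerate package to be installed.
    """
    # Imported lazily so grm_repo_id() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = grm_repo_id(size)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype="auto", device_map="auto"
    )
    return tokenizer, model
```

Calling `tokenizer, model = load_grm("7B")` would then return objects usable with `model.generate`. The judging prompt format is not specified in this README, so consult the paper for the expected input template before scoring responses.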

🔵 Cold-Start MetaRM

The cold-start MetaRM described in the paper, which provides the initial weights for MetaRM training.

Model	Size	Link
Cold-Start-MetaRM	7B	🤗 Tongyi-ConvAI/Cold-Start-MetaRM-RM-NLHF-Qwen-7B
Cold-Start-MetaRM	32B	🤗 Tongyi-ConvAI/Cold-Start-MetaRM-RM-NLHF-Qwen-32B

🟡 Final MetaRM

Final-step MetaRM checkpoints co-trained alongside the generative reward model.

Model	Size	Link
Final-MetaRM	7B	🤗 Tongyi-ConvAI/Final-MetaRM-RM-NLHF-Qwen-7B
Final-MetaRM	32B	🤗 Tongyi-ConvAI/Final-MetaRM-RM-NLHF-Qwen-32B

⚪ Baseline — Outcome Reward Model

A baseline trained solely on outcome labels, without natural language critique.

Model	Size	Link
Baseline-Outcome-Reward	7B	🤗 Tongyi-ConvAI/Baseline-Outcome-Reward-Qwen-7B

🧷 Citation

@misc{wang2026rewardmodelingnaturallanguage,
  title={Reward Modeling from Natural Language Human Feedback}, 
  author={Zongqi Wang and Rui Wang and Yuchuan Wu and Yiyao Yu and Pinyi Zhang and Shaoning Sun and Yujiu Yang and Yongbin Li},
  year={2026},
  eprint={2601.07349},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2601.07349}, 
}