Reward model from the paper BiasGRPO: https://arxiv.org/abs/2606.04807

We encourage you to "heart" this reward model and use it in your multi-objective RLHF pipelines!

We release a custom bias reward model that is compute-efficient (only 0.1B parameters) and avoids knowledge degradation. It is a plug-and-play resource that can be integrated into complex, multi-objective RLHF pipelines without conflicting with other objectives and with little added compute overhead. This lowers the barrier to entry and lets more researchers add robust bias mitigation to their RLHF pipelines without a meaningful compute or capability trade-off.
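As a rough illustration of the plug-and-play use described above, the sketch below loads the checkpoint with Hugging Face `transformers` and mixes its score with other reward signals. Several things here are assumptions rather than documented behavior: that the checkpoint loads as a sequence-classification model, that the last logit can serve as the reward (and that higher means less biased), and that a weighted sum is how you want to combine objectives. The helper names `load_bias_scorer` and `combine_rewards`, the `helpfulness` objective, and the weights are hypothetical.

```python
from typing import Callable, Dict, List


def load_bias_scorer(
    model_id: str = "SaketR1/bias-reward-model",
) -> Callable[[List[str]], List[float]]:
    """Return a function that maps texts to scalar scores.

    torch and transformers are imported lazily so that the pure-Python
    helper below can be used without them installed.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # ASSUMPTION: the checkpoint has a sequence-classification head.
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    def score(texts: List[str]) -> List[float]:
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            logits = model(**batch).logits  # shape: (batch, num_labels)
        # ASSUMPTION: the last logit is the reward. Inspect
        # model.config.id2label to confirm the index and the sign.
        return logits[:, -1].tolist()

    return score


def combine_rewards(
    rewards: Dict[str, List[float]], weights: Dict[str, float]
) -> List[float]:
    """Weighted sum of per-objective rewards, one value per sample."""
    if not rewards:
        raise ValueError("at least one reward signal is required")
    names = list(rewards)
    n = len(rewards[names[0]])
    if any(len(rewards[name]) != n for name in names):
        raise ValueError("all reward lists must have the same length")
    return [
        sum(weights[name] * rewards[name][i] for name in names)
        for i in range(n)
    ]


# Example usage (downloads the checkpoint, so it is left commented out):
# score_bias = load_bias_scorer()
# responses = ["First sampled response.", "Second sampled response."]
# total = combine_rewards(
#     {"bias": score_bias(responses), "helpfulness": [0.5, 2.0]},
#     {"bias": 0.5, "helpfulness": 1.0},
# )
```

A weighted sum is only one way to combine objectives. If the reward scales differ a lot, normalizing each signal before mixing may work better.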

Model size: 0.1B params (Safetensors)

Tensor type: F32

Model tree for SaketR1/bias-reward-model

Base model: FacebookAI/roberta-base

Finetuned: this model

Paper for SaketR1/bias-reward-model

BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization (arXiv 2606.04807, published Jun 3)