Reward model from the paper BiasGRPO: https://arxiv.org/abs/2606.04807

We encourage you to use this reward model in your multi-objective RLHF pipelines!

We release a custom bias reward model that is compute-efficient (only 0.1B parameters) and avoids knowledge degradation. It is a plug-and-play resource that can be integrated into complex, multi-objective RLHF pipelines without conflicting with other objectives or adding meaningful compute overhead. This lowers the barrier to entry, so more researchers can add robust bias mitigation to their RLHF pipelines without compute or capability trade-offs.
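As a rough illustration of what "plugging in" a bias reward alongside other objectives could look like, here is a minimal sketch of a weighted multi-objective reward. Everything in it is an assumption made for illustration: the `combine_rewards` helper, the weighted-average scheme, and both toy scorers are hypothetical and are not taken from the paper or from this model's code. In a real pipeline, `placeholder_bias_scorer` would be replaced by a function that runs the released reward model; the loading and scoring interface is not described in this card, so it is left out here.

```python
from typing import Callable, Dict

# A scorer maps (prompt, response) to a scalar reward.
Scorer = Callable[[str, str], float]


def combine_rewards(
    prompt: str,
    response: str,
    scorers: Dict[str, Scorer],
    weights: Dict[str, float],
) -> float:
    """Weighted average of several reward signals (illustrative scheme only)."""
    total_weight = sum(weights[name] for name in scorers)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(
        weights[name] * scorers[name](prompt, response) for name in scorers
    )
    return weighted_sum / total_weight


def toy_helpfulness_scorer(prompt: str, response: str) -> float:
    # Hypothetical stand-in for another objective: longer answers score
    # higher, capped at 1.0 once the response reaches ten words.
    return min(len(response.split()) / 10.0, 1.0)


def placeholder_bias_scorer(prompt: str, response: str) -> float:
    # Placeholder only: swap in a call to the released bias reward model.
    # This toy rule just penalizes one overgeneralizing word.
    return 0.0 if "always" in response.lower() else 1.0


scorers = {"helpfulness": toy_helpfulness_scorer, "bias": placeholder_bias_scorer}
weights = {"helpfulness": 0.5, "bias": 0.5}

reward = combine_rewards(
    "Describe a typical nurse.",
    "Nurses can be of any gender.",
    scorers,
    weights,
)
print(f"combined reward: {reward:.3f}")
```

Keeping each objective behind the same `(prompt, response) -> float` interface means the bias scorer can be added or reweighted without touching the other reward terms.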

Safetensors · Model size: 0.1B params · Tensor type: F32

Model repository: SaketR1/bias-reward-model