BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization
Paper: arXiv 2606.04807
How to use SaketR1/bias-reward-model with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="SaketR1/bias-reward-model")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("SaketR1/bias-reward-model")
model = AutoModelForSequenceClassification.from_pretrained("SaketR1/bias-reward-model")
```

Reward model from the paper BiasGRPO: https://arxiv.org/abs/2606.04807
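An RLHF loop usually needs one scalar per response, but `AutoModelForSequenceClassification` returns a vector of logits. The sketch below converts logits into a reward in [0, 1] with a numerically stable softmax. It is a minimal illustration under assumptions. The helper name `bias_reward_from_logits` is hypothetical. This card also does not say which class index means "unbiased", so check `model.config.id2label` before choosing `unbiased_index`.

```python
import math
from typing import Sequence


def bias_reward_from_logits(logits: Sequence[float], unbiased_index: int) -> float:
    """Turn raw classifier logits into a scalar reward in [0, 1].

    Hypothetical helper: assumes the class at `unbiased_index` is the
    "unbiased" label. Verify against model.config.id2label.
    """
    if len(logits) == 0:
        raise ValueError("logits must be non-empty")
    if not 0 <= unbiased_index < len(logits):
        raise IndexError("unbiased_index out of range")
    # Subtract the max logit before exponentiating to avoid overflow.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    # Softmax probability of the assumed "unbiased" class.
    return exps[unbiased_index] / sum(exps)


if __name__ == "__main__":
    # Made-up logits for a hypothetical 2-class head.
    print(round(bias_reward_from_logits([0.0, 2.0], unbiased_index=1), 4))  # prints 0.8808
```

With the model loaded as above, you could obtain the logits for a single response with `model(**tokenizer(text, return_tensors="pt")).logits[0].tolist()`, ideally inside `torch.no_grad()`, and pass them to the helper.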
We encourage you to use this reward model in your multi-objective RLHF pipelines!
We release a custom bias reward model that is highly compute-efficient (only about 0.1B parameters) and avoids knowledge degradation. It is a plug-and-play resource that can be integrated into complex, multi-objective RLHF pipelines without conflicting with other objectives or adding meaningful compute overhead. This lowers the barrier to entry and lets more researchers build robust bias mitigation into their RLHF pipelines without compute or capability trade-offs.
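The sketch below shows one way to plug this reward into a multi-objective, GRPO-style pipeline. It first collapses several per-objective rewards into one score with a weighted sum. It then computes standard group-relative advantages, as in the original GRPO formulation: each reward minus its group mean, divided by the group standard deviation. This is not necessarily the exact BiasGRPO procedure from the paper. The objective names, weights, scores, and helper names are all hypothetical. Using the population standard deviation is also an assumption, since some implementations use the sample standard deviation instead.

```python
import statistics
from typing import Dict, List


def combine_rewards(rewards: Dict[str, float], weights: Dict[str, float]) -> float:
    """Hypothetical helper: weighted sum of per-objective rewards."""
    missing = set(weights) - set(rewards)
    if missing:
        raise KeyError(f"missing rewards for objectives: {sorted(missing)}")
    return sum(weights[name] * rewards[name] for name in weights)


def group_relative_advantages(group_rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize rewards within one prompt's group.

    Uses the population standard deviation (an assumption); eps avoids
    division by zero when every completion gets the same reward.
    """
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]


if __name__ == "__main__":
    # Hypothetical weights and made-up per-objective scores for 4 completions
    # sampled from the same prompt.
    weights = {"bias": 0.5, "helpfulness": 0.5}
    group = [
        {"bias": 0.9, "helpfulness": 0.7},
        {"bias": 0.2, "helpfulness": 0.8},
        {"bias": 0.6, "helpfulness": 0.6},
        {"bias": 0.5, "helpfulness": 0.3},
    ]
    scalar = [combine_rewards(r, weights) for r in group]
    for r, a in zip(scalar, group_relative_advantages(scalar)):
        print(f"reward={r:.3f} advantage={a:+.3f}")
```

Because normalization happens within each group, a group where every completion scores identically gets zero advantages. The advantages are also unaffected by a constant shift of that group's rewards and, up to `eps`, by positive rescaling.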
Base model: FacebookAI/roberta-base