Update README.md
Generative reward models (also known as LLMs-as-judges), which use large language models (LLMs) to evaluate answer quality, are increasingly adopted in reinforcement learning with verifiable rewards (RLVR). They are often preferred over rigid rule-based metrics, especially for complex reasoning tasks involving free-form outputs. Despite the seeming simplicity of this comparison task, existing generative reward models exhibit surprising vulnerabilities to superficial manipulations: non-word symbols (e.g., ":" or ".") or reasoning openers like "Thought process:" and "Let's solve this problem step by step." can often lead to false positive rewards.
We find that this weakness is widespread across LLMs, datasets, and prompt formats, posing a serious threat to core algorithmic paradigms that rely on generative reward models, such as rejection sampling, preference optimization, and RLVR.
To mitigate this issue, we train a robust general-domain generative reward model by leveraging a simple yet effective data augmentation strategy. Our reward model demonstrates substantially improved robustness over the most advanced commercial models (e.g., GPT-4o, GPT-o1, Claude-4) and specialized generative verifiers (e.g., Omni-Judge, Generative-Verifier).
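The augmentation strategy itself is described in the paper rather than in this README. Purely as an illustration, the sketch below assumes one plausible form of the idea: pairing the superficial manipulations listed above with negative labels. The names `SUPERFICIAL_RESPONSES` and `augment` are ours, and the released training recipe may differ.

```python
# Hypothetical sketch (not the confirmed recipe): turn the superficial
# patterns that trigger false positive rewards into adversarial
# negative training examples for the reward model.
SUPERFICIAL_RESPONSES = [
    ":",                                       # non-word symbol
    ".",                                       # non-word symbol
    "Thought process:",                        # reasoning opener
    "Let's solve this problem step by step.",  # reasoning opener
]

def augment(question: str, reference: str) -> list[dict]:
    """Emit judge-training examples whose responses contain only
    superficial text and must therefore be labeled incorrect."""
    return [
        {
            "question": question,
            "reference": reference,
            "response": r,
            "label": "incorrect",
        }
        for r in SUPERFICIAL_RESPONSES
    ]

# Example: four adversarial negatives for one (question, reference) pair.
negatives = augment("What is 2 + 3?", "5")
```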
## How to use
Given the question, its ground-truth reference answer, and the response to be evaluated, the model judges whether the response is correct.
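A minimal sketch of this flow, assuming the model is released as a Hugging Face causal LM; the checkpoint ID and the judging prompt template below are placeholders, not the official ones:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint ID; substitute the released model's name.
MODEL_NAME = "your-org/robust-generative-reward-model"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

# Hypothetical judging prompt; the exact template the model expects
# may differ, so check the repository's official format.
prompt = (
    "Question: What is 2 + 3?\n"
    "Reference Answer: 5\n"
    "Response: 2 + 3 = 5\n"
    "Is the response correct? Answer Yes or No."
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
judgment = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(judgment)  # expected to contain "Yes" for a correct response
```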
## **Quick start**