- RewardBert specifically targets free-form GRPO training, where answers cannot be evaluated by simple correctness checks.
- We use [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert) as the base model, finetuned on [MOCHA](https://arxiv.org/abs/2010.03636), [Prometheus-preference](https://huggingface.co/datasets/prometheus-eval/Preference-Collection), and [Pedants](https://arxiv.org/abs/2402.11161), to evaluate free-form text generations. We use RewardBert as the reward in GRPO finetuning.

### Installation

```
## For more evaluation metrics, refer to https://github.com/zli12321/qa_metrics
pip install qa-metrics
```

#### Method: `compute_score`

**Parameters**

- `reference_answer` (list of str): A list of gold (correct) answers to the question
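To illustrate how a `compute_score`-style reward plugs into GRPO, here is a minimal sketch. Only the `reference_answer` parameter appears in this excerpt; the `candidate_answer` parameter, the `reward_fn` wrapper, and the token-overlap F1 scoring are placeholder assumptions standing in for the actual ModernBERT-based RewardBert model.

```python
# Hypothetical stand-in for RewardBert's `compute_score` interface.
# The token-overlap F1 here is a placeholder, not the ModernBERT scorer.

def compute_score(reference_answer, candidate_answer):
    """Return the best score of the candidate against any gold answer."""
    def f1(ref, cand):
        ref_toks, cand_toks = ref.lower().split(), cand.lower().split()
        pool, common = list(ref_toks), 0
        for tok in cand_toks:
            if tok in pool:
                pool.remove(tok)
                common += 1
        if common == 0:
            return 0.0
        precision = common / len(cand_toks)
        recall = common / len(ref_toks)
        return 2 * precision * recall / (precision + recall)

    # Free-form answers may match any of several gold references,
    # so take the best score across the list.
    return max(f1(ref, candidate_answer) for ref in reference_answer)


def reward_fn(completions, reference_answers):
    """GRPO-style reward: one float per sampled completion."""
    return [
        compute_score(refs, completion)
        for completion, refs in zip(completions, reference_answers)
    ]
```

In GRPO finetuning, `reward_fn` would be called on each group of sampled completions, with the per-completion scores normalized within the group to form advantages.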