Update README.md
README.md
CHANGED
@@ -18,8 +18,7 @@ library_name: transformers
 
 # 🚀 Can we cast reward modeling as a reasoning task?
 
-**RM-R1** is a training framework for *Reasoning Reward Model* (ReasRM) that judges two candidate answers by first **thinking out loud**—generating structured rubrics or reasoning traces—then emitting its preference.
-Compared to traditional scalar or generative reward models, RM-R1 delivers **state-of-the-art performance** on public RM benchmarks while offering fully interpretable justifications.
+**RM-R1** is a training framework for *Reasoning Reward Model* (ReasRM) that judges two candidate answers by first **thinking out loud**—generating structured rubrics or reasoning traces—then emitting its preference. Compared to traditional scalar or generative reward models, RM-R1 delivers **state-of-the-art performance** on public RM benchmarks while offering fully interpretable justifications.
 
 
 ## 🧠 TL;DR
@@ -37,7 +36,8 @@ Compared to traditional scalar or generative reward models, RM-R1 delivers **sta
 ## 🔍 Demo Code
 
 Try the model with this example. Full demo notebook available at:
-
+
+📎 [Official Demo Link](https://github.com/RM-R1-UIUC/RM-R1/blob/main/demo/demo.ipynb)
 
 ### 🧾 Prompt Template
 
@@ -149,4 +149,15 @@ completion = tokenizer.decode(
 )
 
 print(completion)
+```
+
+## Citations
+
+```bibtex
+@article{chen2025rm,
+title={RM-R1: Reward Modeling as Reasoning},
+author={Chen, Xiusi and Li, Gaotang and Wang, Ziqi and Jin, Bowen and Qian, Cheng and Wang, Yu and Wang, Hongru and Zhang, Yu and Zhang, Denghui and Zhang, Tong and others},
+journal={arXiv preprint arXiv:2505.02387},
+year={2025}
+}
 ```