Update README.md
## 🔍 Demo Code

Try the model with this example. The full demo notebook is available at:

📎 [Official Demo Link](https://github.com/RM-R1-UIUC/RM-R1/blob/main/demo/demo.ipynb)

### 🧾 Prompt Template
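As a taste of what a query to a reasoning reward model looks like, here is a minimal, stdlib-only sketch. It assumes an OpenAI-compatible `/chat/completions` endpoint (e.g. one served by vLLM) at `localhost:8000` with model name `RM-R1`; the server URL, model name, function names, and the pairwise prompt layout are all illustrative assumptions, not the official demo — use the linked notebook and the prompt template below for the real thing.

```python
import json
import urllib.request


def build_judge_messages(question, answer_a, answer_b):
    """Pack a pairwise comparison into chat messages.

    Illustrative layout only -- substitute the official prompt
    template from the 'Prompt Template' section below.
    """
    user = (
        f"Question: {question}\n\n"
        f"Answer A: {answer_a}\n\n"
        f"Answer B: {answer_b}\n\n"
        "Reason step by step, then conclude with 'A' or 'B'."
    )
    return [{"role": "user", "content": user}]


def query_reward_model(messages, base_url="http://localhost:8000/v1", model="RM-R1"):
    """POST to a hypothetical OpenAI-compatible endpoint (e.g. vLLM)."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


messages = build_judge_messages("What is 2 + 2?", "4", "5")
print(messages[0]["content"])  # inspect the prompt before sending
# completion = query_reward_model(messages)  # requires a running server
# print(completion)
```

The network call is left commented out so the snippet runs standalone; with a live server, the returned completion would contain the model's reasoning trace followed by its verdict.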
## Citations

```bibtex
@article{chen2025rm,
  title={RM-R1: Reward Modeling as Reasoning},
  author={Chen, Xiusi and Li, Gaotang and Wang, Ziqi and Jin, Bowen and Qian, Cheng and Wang, Yu and Wang, Hongru and Zhang, Yu and Zhang, Denghui and Zhang, Tong and others},
  journal={arXiv preprint arXiv:2505.02387},
  year={2025}
}
```