Update README.md
README.md
CHANGED
@@ -24,6 +24,10 @@ In addition to the mathematical Outcome Reward Model (ORM) Qwen2.5-Math-RM-72B,
 
 
 
+## Model Details
+
+For more details, please refer to our [paper](https://arxiv.org/pdf/2501.07301).
+
 
 ## Requirements
 * `transformers>=4.40.0` for Qwen2.5-Math models. The latest version is recommended.
@@ -122,10 +126,12 @@ print(step_reward) # [[1.0, 0.1904296875, 0.9765625, 1.0]]
 If you find our work helpful, feel free to give us a citation.
 
 ```
-@article{
-title={
-author={
-
-
+@article{prmlessons,
+title={The Lessons of Developing Process Reward Models in Mathematical Reasoning},
+author={
+Zhenru Zhang and Chujie Zheng and Yangzhen Wu and Beichen Zhang and Runji Lin and Bowen Yu and Dayiheng Liu and Jingren Zhou and Junyang Lin
+},
+journal={arXiv preprint arXiv:2501.07301},
+year={2025}
 }
 ```