---
library_name: transformers
tags:
- rm
- latent
datasets:
- openai/gsm8k
base_model:
- openai-community/gpt2
pipeline_tag: token-classification
---

# latent-tts-rm

The Latent Reward Model (LatentRM) is a learned scorer designed for latent reasoning models that reason in continuous hidden space. LatentRM provides the missing aggregation signal for parallel test-time scaling in latent models, enabling techniques such as best-of-N selection and beam search without explicit token-level probabilities.
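As a minimal sketch of how such a scorer plugs into best-of-N selection: each candidate rollout gets a sequence of per-step scores from the reward model, those are aggregated into one scalar, and the highest-scoring candidate wins. The `aggregate` function and the min-score aggregation below are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical best-of-N selection driven by a latent reward model.
# `step_scores[i]` stands in for LatentRM's per-latent-step scores
# for candidate i; min-aggregation is one common choice for process
# rewards and is used here purely for illustration.

def aggregate(step_scores):
    """Collapse per-step scores into one scalar per candidate."""
    return min(step_scores)

def best_of_n(candidates, step_scores):
    """Return the candidate with the highest aggregated score."""
    best_idx = max(range(len(candidates)), key=lambda i: aggregate(step_scores[i]))
    return candidates[best_idx]

answers = ["42", "41", "40"]
scores = [[0.9, 0.8, 0.7], [0.5, 0.95, 0.6], [0.3, 0.2, 0.4]]
print(best_of_n(answers, scores))  # -> 42
```

Beam search would reuse the same per-step scores, pruning partial latent trajectories at each step instead of ranking only finished rollouts.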

<p align="center">
<a href="https://arxiv.org/pdf/2510.07745"><b>Paper Link</b>👁️</a>
</p>

<p align="center">
<a href="https://github.com/ModalityDance/LatentTTS"><b>GitHub Repo</b>🐙</a>
</p>

## Citation

```bibtex
@misc{you2025paralleltesttimescalinglatent,
  title={Parallel Test-Time Scaling for Latent Reasoning Models},
  author={Runyang You and Yongqi Li and Meng Liu and Wenjie Wang and Liqiang Nie and Wenjie Li},
  year={2025},
  eprint={2510.07745},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2510.07745},
}
```