---
library_name: transformers
tags:
- rm
- latent
datasets:
- openai/gsm8k
base_model:
- openai-community/gpt2
pipeline_tag: token-classification
---

# LatentRM

The Latent Reward Model (LatentRM) is a learned scorer designed for latent reasoning models that reason in continuous hidden space.
LatentRM provides the missing aggregation signal for parallel test-time scaling in latent models, enabling techniques such as best-of-N and beam search without explicit token-level probabilities.
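
As a rough illustration of how a learned scorer can drive best-of-N selection, the sketch below picks the highest-scoring candidate among several sampled latent trajectories. The names `best_of_n` and `toy_score`, and the list-of-vectors trajectory format, are hypothetical stand-ins for illustration and are not the LatentRM API; in practice the score would come from the trained reward model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a trajectory is a list of latent vectors.
Trajectory = List[List[float]]


def best_of_n(
    candidates: Sequence[Trajectory],
    score_fn: Callable[[Trajectory], float],
) -> Tuple[int, float]:
    """Return the index and score of the highest-scoring candidate."""
    if not candidates:
        raise ValueError("candidates must be non-empty")
    scores = [score_fn(c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def toy_score(trajectory: Trajectory) -> float:
    """Assumed stand-in for a reward model: mean of all latent coordinates."""
    values = [x for step in trajectory for x in step]
    return sum(values) / len(values)


# Three made-up candidate trajectories, each with two 2-dimensional latent steps.
candidates = [
    [[0.1, 0.2], [0.0, 0.1]],
    [[0.9, 0.8], [0.7, 0.6]],
    [[0.4, 0.5], [0.3, 0.2]],
]

best_index, best_score = best_of_n(candidates, toy_score)
print(f"selected candidate {best_index}")
```

Because selection depends only on a scalar score per candidate, the same pattern applies when no token-level probabilities are available, which is the setting described above.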

<p align="center">
<a href="https://arxiv.org/pdf/2510.07745"><b>Paper Link</b></a>
</p>

<p align="center">
<a href="https://github.com/YRYangang/LatentTTS"><b>GitHub Repo</b></a>
</p>

## Citation

```
@misc{you2025paralleltesttimescalinglatent,
      title={Parallel Test-Time Scaling for Latent Reasoning Models},
      author={Runyang You and Yongqi Li and Meng Liu and Wenjie Wang and Liqiang Nie and Wenjie Li},
      year={2025},
      eprint={2510.07745},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.07745},
}
```