| base_model: | |
| - openai-community/gpt2 | |
| datasets: | |
| - openai/gsm8k | |
| library_name: transformers | |
| pipeline_tag: feature-extraction | |
| tags: | |
| - rm | |
| - latent | |
| license: apache-2.0 | |
| # LatentRM | |
| The Latent Reward Model (LatentRM) is a learned scorer for latent reasoning models, that is, models that carry out their reasoning in continuous hidden space. | |
| LatentRM provides the missing aggregation signal for parallel test-time scaling in latent models, enabling techniques such as best-of-N and beam search without explicit token-level probabilities. | |
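As a rough illustration of why a scalar scorer is enough for best-of-N, the sketch below selects the highest-scoring candidate from a set of sampled reasoning trajectories. It does not use the LatentRM API: `best_of_n` and `toy_score` are hypothetical names, and the toy scorer (the mean of all latent values) is only a stand-in for a learned reward model. See the GitHub repo for the actual usage.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def best_of_n(candidates: Sequence[T], score_fn: Callable[[T], float]) -> T:
    """Return the candidate with the highest scalar score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score_fn)


def toy_score(trajectory):
    """Hypothetical stand-in scorer: mean of all values in the trajectory."""
    values = [x for step in trajectory for x in step]
    return sum(values) / len(values)


# Each toy "trajectory" is a list of latent vectors (plain lists of floats here).
trajectories = [
    [[0.1, 0.2], [0.0, 0.3]],
    [[0.9, 0.8], [0.7, 0.6]],
    [[0.4, 0.4], [0.5, 0.5]],
]

best = best_of_n(trajectories, toy_score)
best_index = trajectories.index(best)
print(best_index)
```

The selection step only needs one number per trajectory, so it never touches token-level probabilities. Swapping `toy_score` for a learned scorer leaves the selection logic unchanged.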
| <p align="center"> | |
| <a href="https://arxiv.org/pdf/2510.07745"><b>Paper Link</b></a> | |
| </p> | |
| <p align="center"> | |
| <a href="https://github.com/YRYangang/LatentTTS"><b>GitHub Repo</b></a> | |
| </p> | |
| ## Citation | |
| ``` | |
| @misc{you2025paralleltesttimescalinglatent, | |
| title={Parallel Test-Time Scaling for Latent Reasoning Models}, | |
| author={Runyang You and Yongqi Li and Meng Liu and Wenjie Wang and Liqiang Nie and Wenjie Li}, | |
| year={2025}, | |
| eprint={2510.07745}, | |
| archivePrefix={arXiv}, | |
| primaryClass={cs.CL}, | |
| url={https://arxiv.org/abs/2510.07745}, | |
| } | |
| ``` |