Improve model card: Add pipeline tag, paper link, code link, description, and usage
This PR significantly enhances the model card by:
* Linking to the paper: [Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning](https://huggingface.co/papers/2511.21581).
* Adding a link to the official GitHub repository: https://github.com/apning/adaptive-latent-reasoning.
* Including a concise description of the model based on its abstract.
* Adding the `pipeline_tag: text-generation` to improve discoverability on the Hugging Face Hub.
* Adding relevant additional tags: `reinforcement-learning`, `latent-reasoning`, and `math`.
* Providing a sample Python code snippet from the GitHub README for quick model loading and usage.
* Adding a citation section.
Please review and merge if these improvements are satisfactory!
@@ -1,10 +1,50 @@
 ---
-library_name: transformers
-license: llama3.2
 base_model: meta-llama/Llama-3.2-1B-Instruct
 datasets:
 - whynlp/gsm8k-aug
-
+library_name: transformers
+license: llama3.2
+tags:
+- reinforcement-learning
+- latent-reasoning
+- math
+pipeline_tag: text-generation
 ---
 
-
+# Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning
+
+This repository contains model weights for "Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning", presented in the paper available at [https://huggingface.co/papers/2511.21581](https://huggingface.co/papers/2511.21581).
+
+Built with Llama 3.2, this model implements adaptive-length latent reasoning via a post-SFT reinforcement-learning methodology: the RL objective minimizes latent reasoning length while maintaining accuracy, which further reduces compute usage and pushes the compressive capabilities of latent reasoning models. Experiments on the Llama 3.2 1B model and the GSM8K-Aug dataset showed a 52% drop in total reasoning length with no penalty to accuracy.
+
+For more detailed information, including training scripts and replication instructions, please refer to the [official GitHub repository](https://github.com/apning/adaptive-latent-reasoning).
+
+## Usage
+
+You can load these models using the `automodelforcausallm_from_pretrained_latent` function from `src.model_creation`, as demonstrated in the official GitHub repository:
+
+```python
+from transformers import AutoTokenizer
+from src.model_creation import automodelforcausallm_from_pretrained_latent
+
+repo_id = "Lapisbird/Llama-adaLR-model-latent-6" # Example model from the paper's main results table
+
+model = automodelforcausallm_from_pretrained_latent(repo_id)
+tokenizer = AutoTokenizer.from_pretrained(repo_id)
+```
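+
+A minimal generation sketch (assuming the returned object follows the standard `transformers` `generate`/`decode` API; the prompt below is illustrative and may not match the paper's evaluation setup):
+
+```python
+# Hypothetical usage sketch: treat the loaded model as a standard causal LM.
+prompt = "A train travels 60 miles per hour for 3 hours. How far does it go?"
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=256)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```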
+
+## Citation
+
+If you use this model or the associated research, please consider citing the paper:
+
+```bibtex
+@article{luo2025learning,
+  title={Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning},
+  author={Junyu Luo and Xiao Luo and Xiusi Chen and Zhiping Xiao and Wei Ju and Ming Zhang},
+  year={2025},
+  eprint={2511.21581},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2511.21581},
+}
+```