nielsr (HF Staff) committed

Commit 47cf1ac · verified · 1 Parent(s): 5adc3f4

Improve model card: Add pipeline tag, paper link, code link, description, and usage


This PR significantly enhances the model card by:

* Linking to the paper: [Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning](https://huggingface.co/papers/2511.21581).
* Adding a link to the official GitHub repository: https://github.com/apning/adaptive-latent-reasoning.
* Including a concise description of the model based on its abstract.
* Adding the `pipeline_tag: text-generation` to improve discoverability on the Hugging Face Hub.
* Adding relevant additional tags: `reinforcement-learning`, `latent-reasoning`, and `math`.
* Providing a sample Python code snippet from the GitHub README for quick model loading and usage.
* Adding a citation section.

Please review and merge if these improvements are satisfactory!
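The metadata bullets above all land in the card's YAML front matter (the block between the two `---` markers), which is what the Hub reads for discoverability. A minimal, stdlib-only sketch of how those keys sit in that block — note the Hub itself uses a full YAML parser, and this naive line splitter only handles the flat scalar keys shown:

```python
# Sketch: locating flat metadata keys in a model card's YAML front matter.
# The front matter is the block delimited by the first two `---` lines.
readme = """\
---
base_model: meta-llama/Llama-3.2-1B-Instruct
datasets:
- whynlp/gsm8k-aug
library_name: transformers
license: llama3.2
tags:
- reinforcement-learning
- latent-reasoning
- math
pipeline_tag: text-generation
---

# Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning
"""

# Split off the front-matter block (text between the first two `---` lines).
_, front_matter, _body = readme.split("---\n", 2)

# Naive scalar lookup: good enough for flat `key: value` lines;
# list-valued keys like `tags:` would need a real YAML parser.
scalars = dict(
    line.split(": ", 1)
    for line in front_matter.splitlines()
    if ": " in line
)

print(scalars["pipeline_tag"])  # -> text-generation
print(scalars["license"])       # -> llama3.2
```

This is only an illustration of where the new keys live; validating a real card should go through a YAML library rather than string splitting.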

Files changed (1):

  1. README.md (+44, −4)
README.md CHANGED

````diff
@@ -1,10 +1,50 @@
 ---
-library_name: transformers
-license: llama3.2
 base_model: meta-llama/Llama-3.2-1B-Instruct
 datasets:
 - whynlp/gsm8k-aug
-tags: []
+library_name: transformers
+license: llama3.2
+tags:
+- reinforcement-learning
+- latent-reasoning
+- math
+pipeline_tag: text-generation
 ---
 
-Built with Llama
+# Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning
+
+This repository contains model weights for "Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning", presented in the paper available at [https://huggingface.co/papers/2511.21581](https://huggingface.co/papers/2511.21581).
+
+Built with Llama 3.2, this model introduces an adaptive-length latent reasoning approach trained with a post-SFT reinforcement-learning stage. The RL stage minimizes latent reasoning length while maintaining accuracy, which further reduces compute usage and raises the bar on the compressive capabilities of latent reasoning models. Experiments on the Llama 3.2 1B model and the GSM8K-Aug dataset showed a 52% drop in total reasoning length with no penalty to accuracy.
+
+For more detailed information, including training scripts and replication instructions, please refer to the [official GitHub repository](https://github.com/apning/adaptive-latent-reasoning).
+
+## Usage
+
+You can load these models using the `automodelforcausallm_from_pretrained_latent` function from `src.model_creation`, as demonstrated in the official GitHub repository:
+
+```python
+from transformers import AutoTokenizer
+from src.model_creation import automodelforcausallm_from_pretrained_latent
+
+repo_id = "Lapisbird/Llama-adaLR-model-latent-6"  # Example model from the paper's main results table
+
+model = automodelforcausallm_from_pretrained_latent(repo_id)
+tokenizer = AutoTokenizer.from_pretrained(repo_id)
+```
+
+## Citation
+
+If you use this model or the associated research, please consider citing the paper:
+
+```bibtex
+@article{luo2025learning,
+  title={Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning},
+  author={Junyu Luo and Xiao Luo and Xiusi Chen and Zhiping Xiao and Wei Ju and Ming Zhang},
+  year={2025},
+  eprint={2511.21581},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2511.21581},
+}
+```
````