---
base_model: meta-llama/Llama-3.2-1B-Instruct
datasets:
- whynlp/gsm8k-aug
library_name: transformers
license: llama3.2
pipeline_tag: text-generation
tags: []
---
# Llama-adaLR-model-latent-6: Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning
This repository contains the Llama-adaLR-model-latent-6 model, presented in the paper "Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning".
This model implements adaptive-length latent reasoning, a novel approach to optimizing the reasoning length of Transformer language models. Using a post-SFT reinforcement-learning stage, the model is trained to minimize reasoning length while preserving accuracy. Experiments on Llama 3.2 1B with the GSM8K-Aug dataset showed a 52% reduction in total reasoning length without sacrificing accuracy.
For more details, including additional model weights and ongoing developments, please refer to the official GitHub repository.
## Sample Usage
You can load this model with the `transformers` library using the `automodelforcausallm_from_pretrained_latent` function from `src.model_creation` (provided in the GitHub repository), as shown in the following example:
```python
from transformers import AutoTokenizer

# Custom loader defined in the project's GitHub repository
from src.model_creation import automodelforcausallm_from_pretrained_latent

repo_id = "Lapisbird/Llama-adaLR-model-latent-6"

# Load the latent-reasoning model and its matching tokenizer from the Hub
model = automodelforcausallm_from_pretrained_latent(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
```