---
base_model: meta-llama/Llama-3.2-1B-Instruct
datasets:
- whynlp/gsm8k-aug
library_name: transformers
license: llama3.2
pipeline_tag: text-generation
tags: []
---
# Llama-adaLR-model-latent-6: Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning
This repository contains the Llama-adaLR-model-latent-6 model, presented in the paper "Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning".
This model implements adaptive-length latent reasoning, a novel approach to optimizing the reasoning length of Transformer language models. Using a post-SFT reinforcement-learning stage, the model is trained to minimize reasoning length while preserving accuracy. Experiments on Llama 3.2 1B with the GSM8K-Aug dataset showed a 52% reduction in total reasoning length without sacrificing accuracy.
For more details, including additional model weights and ongoing developments, please refer to the official GitHub repository.
## Sample Usage
You can load this model with the `transformers` library using the `automodelforcausallm_from_pretrained_latent` function from `src.model_creation` (provided in the GitHub repository), as shown in the following example:
```python
from transformers import AutoTokenizer

# Custom loader defined in the project's GitHub repository
from src.model_creation import automodelforcausallm_from_pretrained_latent

repo_id = "Lapisbird/Llama-adaLR-model-latent-6"

# Load the latent-reasoning model and its matching tokenizer from the Hub
model = automodelforcausallm_from_pretrained_latent(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
```