---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-32B-Instruct
---
|
|
# LIMO: Less Is More for Reasoning 🚀 |
|
|
|
|
|
This is the **updated version (v2)** of the LIMO model, corresponding to the latest version of the paper as of July 30, 2025.
|
|
|
|
|
## Model Information |
|
|
|
|
|
| Model | Backbone | Size |
|-------|----------|------|
| LIMO-v2 | [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | 32B |
|
|
|
|
|
## Previous Version |
|
|
|
|
|
If you need the original LIMO model (corresponding to the initial paper version), you can access it at: |
|
|
- **LIMO v1**: [`GAIR/LIMO`](https://huggingface.co/GAIR/LIMO) |
|
|
|
|
|
## Quick Start |
|
|
|
|
|
Our model is fine-tuned from [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) and is compatible with most mainstream inference frameworks, such as [HF Transformers](https://github.com/huggingface/transformers), [vLLM](https://github.com/vllm-project/vllm), and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM).
|
|
|
|
|
### Using HF Transformers |
|
|
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Initialize model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "GAIR/LIMO-v2",
    torch_dtype="auto",
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("GAIR/LIMO-v2", trust_remote_code=True)

# Prepare input messages
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "What is the result of 1+1?"}
]

# Format input using chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize input
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(
    **inputs,
    max_new_tokens=32768,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)

# Decode and print response (only the newly generated tokens)
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)
```
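
Since the system prompt asks the model to put its final answer within `\boxed{}`, you may want to pull that answer out of the (potentially long) reasoning trace programmatically. Below is a minimal sketch; the helper `extract_boxed_answer` is our own illustration, not part of the LIMO release, and uses a brace-matching scan so that nested answers like `\boxed{\frac{1}{2}}` are captured in full.

```python
# Hypothetical helper (not part of the LIMO codebase) for extracting the
# final \boxed{...} answer from a generated reasoning trace.
def extract_boxed_answer(text: str) -> str | None:
    """Return the contents of the last \\boxed{...} in `text`, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    # Walk forward from the opening brace, tracking nesting depth so that
    # nested LaTeX braces are handled correctly.
    depth = 1
    i = start + len(marker)
    begin = i
    while i < len(text) and depth > 0:
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
        i += 1
    return text[begin:i - 1] if depth == 0 else None

# With the prompt above, the trace should end in something like \boxed{2}.
print(extract_boxed_answer(r"... so the answer is \boxed{2}."))  # -> "2"
```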
|
|
|
|
|
### Using vLLM
|
|
|
|
|
```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

# Initialize the model
llm = LLM(
    model="GAIR/LIMO-v2",
    tensor_parallel_size=4,  # adjust based on available GPUs
    trust_remote_code=True,
    swap_space=60,
    gpu_memory_utilization=0.96,
)

# Prepare input messages
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "What is the result of 1+1?"}
]

# Setup tokenizer and format the input with the chat template
tokenizer = AutoTokenizer.from_pretrained("GAIR/LIMO-v2", trust_remote_code=True)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Configure generation parameters
sampling_params = SamplingParams(
    temperature=0.7,
    max_tokens=32768,
    top_p=0.95,
)

# Generate response
output = llm.generate(text, sampling_params)
print(output[0].outputs[0].text)
```
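
For deployment, vLLM can also expose an OpenAI-compatible HTTP server that you can query with the standard `openai` client. A minimal sketch, assuming a server started with `vllm serve GAIR/LIMO-v2 --tensor-parallel-size 4` on the default port 8000:

```python
# Minimal sketch: query a vLLM OpenAI-compatible server running LIMO-v2.
# Assumes the server was started with, e.g.:
#   vllm serve GAIR/LIMO-v2 --tensor-parallel-size 4
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="GAIR/LIMO-v2",
    messages=[
        {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
        {"role": "user", "content": "What is the result of 1+1?"},
    ],
    temperature=0.7,
    top_p=0.95,
    max_tokens=32768,
)
print(response.choices[0].message.content)
```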
|
|
|
|
|
|
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex
@misc{ye2025limoreasoning,
      title={LIMO: Less is More for Reasoning},
      author={Yixin Ye and Zhen Huang and Yang Xiao and Ethan Chern and Shijie Xia and Pengfei Liu},
      year={2025},
      eprint={2502.03387},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.03387},
}
```
|
|
|
|
|
For more details and training code, please visit our [GitHub repository](https://github.com/GAIR-NLP/LIMO). |