---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-32B-Instruct
---
|
|
# LIMO: Less Is More for Reasoning 🚀 |
|
|
|
|
|
This is the **updated version (v2)** of the LIMO model, corresponding to the latest version of the paper as of July 30, 2025.
|
|
|
|
|
## Model Information |
|
|
|
|
|
| Model | Backbone | Size |
|-------|----------|------|
| LIMO-v2 | [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | 32B |
|
|
|
|
|
## Previous Version |
|
|
|
|
|
If you need the original LIMO model (corresponding to the initial paper version), you can access it at: |
|
|
- **LIMO v1**: [`GAIR/LIMO`](https://huggingface.co/GAIR/LIMO) |
|
|
|
|
|
## Quick Start |
|
|
|
|
|
Our model is fine-tuned from [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) and is compatible with most mainstream inference frameworks, such as [HF Transformers](https://github.com/huggingface/transformers), [vLLM](https://github.com/vllm-project/vllm), and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM).
|
|
|
|
|
### Using HF Transformers |
|
|
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Initialize model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "GAIR/LIMO-v2",
    torch_dtype="auto",
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("GAIR/LIMO-v2", trust_remote_code=True)

# Prepare input messages
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "What is the result of 1+1?"}
]

# Format input using chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize input
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(
    **inputs,
    max_new_tokens=32768,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)

# Decode and print response (only the newly generated tokens)
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)
```
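
Since the system prompt asks the model to put its final answer within `\boxed{}`, you may want to pull that answer out of the (potentially long) reasoning trace programmatically. Below is a minimal sketch; the helper `extract_boxed_answer` is our own illustration, not part of the LIMO release, and uses a brace-matching scan so that nested answers like `\boxed{\frac{1}{2}}` are captured in full.

```python
# Hypothetical helper (not part of the LIMO codebase) for extracting the
# final \boxed{...} answer from a generated reasoning trace.
def extract_boxed_answer(text: str) -> str | None:
    """Return the contents of the last \\boxed{...} in `text`, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    # Walk forward from the opening brace, tracking nesting depth so that
    # nested LaTeX braces are handled correctly.
    depth = 1
    i = start + len(marker)
    begin = i
    while i < len(text) and depth > 0:
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
        i += 1
    return text[begin:i - 1] if depth == 0 else None

# With the prompt above, the trace should end in something like \boxed{2}.
print(extract_boxed_answer(r"... so the answer is \boxed{2}."))  # -> "2"
```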
|
|
|
|
|
### Using vLLM
|
|
|
|
|
```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

# Initialize the model
llm = LLM(
    model="GAIR/LIMO-v2",
    tensor_parallel_size=4,  # adjust based on available GPUs
    trust_remote_code=True,
    swap_space=60,
    gpu_memory_utilization=0.96,
)

# Prepare input messages
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "What is the result of 1+1?"}
]

# Setup tokenizer and format the input with the chat template
tokenizer = AutoTokenizer.from_pretrained("GAIR/LIMO-v2", trust_remote_code=True)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Configure generation parameters
sampling_params = SamplingParams(
    temperature=0.7,
    max_tokens=32768,
    top_p=0.95,
)

# Generate response
output = llm.generate(text, sampling_params)
print(output[0].outputs[0].text)
```
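
For deployment, vLLM can also expose an OpenAI-compatible HTTP server that you can query with the standard `openai` client. A minimal sketch, assuming a server started with `vllm serve GAIR/LIMO-v2 --tensor-parallel-size 4` on the default port 8000:

```python
# Minimal sketch: query a vLLM OpenAI-compatible server running LIMO-v2.
# Assumes the server was started with, e.g.:
#   vllm serve GAIR/LIMO-v2 --tensor-parallel-size 4
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="GAIR/LIMO-v2",
    messages=[
        {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
        {"role": "user", "content": "What is the result of 1+1?"},
    ],
    temperature=0.7,
    top_p=0.95,
    max_tokens=32768,
)
print(response.choices[0].message.content)
```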
|
|
|
|
|
|
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex
@misc{ye2025limoreasoning,
      title={LIMO: Less is More for Reasoning},
      author={Yixin Ye and Zhen Huang and Yang Xiao and Ethan Chern and Shijie Xia and Pengfei Liu},
      year={2025},
      eprint={2502.03387},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.03387},
}
```
|
|
|
|
|
For more details and training code, please visit our [GitHub repository](https://github.com/GAIR-NLP/LIMO). |