GAIR/LIMO-v2

Benjamin02 committed on Jul 30, 2025

Commit b2f0ec6 (verified) · 1 Parent(s): 4b22d80

Update README.md

README.md +124 -3

@@ -1,3 +1,124 @@
----
-license: apache-2.0
----

+# LIMO: Less Is More for Reasoning 🚀
+This is the **updated version (v2)** of the LIMO model, corresponding to the latest paper version as of July 30, 2025.
+## Model Information
+| Model | Backbone | Size |
+|-------|----------|------|
+| LIMO-v2 | [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | 32B |
+## Previous Version
+If you need the original LIMO model (corresponding to the initial paper version), you can access it at:
+- **LIMO v1**: [`GAIR/LIMO`](https://huggingface.co/GAIR/LIMO)
+## Quick Start
+Our model is fine-tuned from [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) and is compatible with most mainstream frameworks, such as [HF Transformers](https://github.com/huggingface/transformers), [vLLM](https://github.com/vllm-project/vllm), and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM).
+### Using HF Transformers
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+# Initialize model and tokenizer
+model = AutoModelForCausalLM.from_pretrained(
+    "GAIR/LIMO-v2",
+    torch_dtype="auto",
+    trust_remote_code=True,
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained("GAIR/LIMO-v2", trust_remote_code=True)
+# Prepare input messages
+messages = [
+    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
+    {"role": "user", "content": "What is the result of 1+1?"}
+]
+# Format input using chat template
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+# Tokenize input
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+# Generate response
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=32768,
+    temperature=0.7,
+    top_p=0.95,
+    do_sample=True
+)
+# Decode and print response
+response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
+print(response)
+```
+### Using vLLM
+```python
+from vllm import LLM, SamplingParams
+from transformers import AutoTokenizer
+# Initialize the model
+llm = LLM(
+    model="GAIR/LIMO-v2",
+    tensor_parallel_size=4,  # adjust based on available GPUs
+    trust_remote_code=True,
+    swap_space=60,
+    gpu_memory_utilization=0.96,
+)
+# Prepare input messages
+messages = [
+    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
+    {"role": "user", "content": "What is the result of 1+1?"}
+]
+# Setup tokenizer
+tokenizer = AutoTokenizer.from_pretrained("GAIR/LIMO-v2", trust_remote_code=True)
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+# Configure generation parameters
+sampling_params = SamplingParams(
+    temperature=0.7,
+    max_tokens=32768,
+    top_p=0.95,
+)
+# Generate response
+output = llm.generate(text, sampling_params)
+print(output[0].outputs[0].text)
+```
+## License
+This project is licensed under the MIT License.
+## Citation
+```bibtex
+@misc{ye2025limoreasoning,
+      title={LIMO: Less is More for Reasoning},
+      author={Yixin Ye and Zhen Huang and Yang Xiao and Ethan Chern and Shijie Xia and Pengfei Liu},
+      year={2025},
+      eprint={2502.03387},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2502.03387},
+}
+```
+For more details and training code, please visit our [GitHub repository](https://github.com/GAIR-NLP/LIMO).
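
Both Quick Start snippets use a system prompt that asks the model to put its final answer within `\boxed{}`. The README does not show how to read that answer back out of the generated text, so the following is only a minimal sketch of one way to do it. The helper name `extract_boxed_answer` is hypothetical and not part of the LIMO repository; it assumes the decoded response is a plain string and that the last `\boxed{...}` holds the final answer.

```python
from typing import Optional


def extract_boxed_answer(response: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `response`, or None.

    Hypothetical helper (not from the LIMO codebase). Braces are matched by
    depth so nested LaTeX such as \\boxed{\\frac{1}{2}} is returned whole.
    """
    marker = "\\boxed{"
    start = response.rfind(marker)
    if start == -1:
        return None  # the model produced no boxed answer

    depth = 1  # we are just inside the opening brace of \boxed{
    chars = []
    for ch in response[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed, e.g. the generation was truncated


if __name__ == "__main__":
    # Stand-in for the `response` string produced by either snippet above.
    sample = "Adding the two numbers gives 2. The answer is \\boxed{2}."
    print(extract_boxed_answer(sample))
```

Returning `None` for a missing or unclosed box lets the caller distinguish a truncated generation (for example, one that hit the `max_new_tokens` limit) from a completed answer.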