---
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- text-generation
- agent
- tool-use
- long-context
library_name: transformers
---

<div style="display: flex; justify-content: center; align-items: center; gap: 20px;">
  <img src="assets/sii.jpg" alt="SII" width="100px">
  <img src="assets/asi.png" alt="ASI" width="100px">

</div>
<div align="center">
  

<a href="https://github.com/GAIR-NLP/LIMI" target="_blank" style="margin: 2px;">
    <img alt="Chat" src="assets/teaser.jpg" style="display: inline-block; vertical-align: middle;"/>
</a>

</div>

# LIMI: Less is More for Agency

[![arXiv](https://img.shields.io/badge/arXiv-2509.17567-b31b1b.svg)](https://arxiv.org/pdf/2509.17567)
[![GitHub](https://img.shields.io/badge/GitHub-Repository-green)](https://github.com/GAIR-NLP/LIMI)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-blue)](https://huggingface.co/datasets/GAIR/LIMI)

---
To learn more about LIMI, feel free to explore our documentation and resources. Our release consists of the following sections:

- **Model Zoo & Quick Start**: Basic usage and demonstrations with Transformers, vLLM, and SGLang for LIMI and LIMI-Air;
- **Evaluation**: Comprehensive evaluation suite with metrics for agentic capabilities assessment;
- **Prompting**: Usage of LIMI with frameworks for agentic applications, tool use, and reasoning tasks.

## Overview

LIMI is an agentic model fine‑tuned from [GLM‑4.5](https://huggingface.co/zai-org/GLM-4.5) using compact, high‑quality data to emphasize:

- Targeted capabilities: tool use, multi‑turn correction, spec compliance
- Long‑context trajectories built from tokenizer‑filtered samples
- OpenAI‑style `messages` with optional function/tool calls

## Model Details

- Base model: `zai-org/GLM-4.5`
- Training framework: slime
- Training data: curated conversations from [GAIR/LIMI](https://huggingface.co/datasets/GAIR/LIMI)

## Performance

### SFT with LIMI Dataset on Dense Models

Our LIMI dataset significantly enhances dense models (Qwen3 series) on both in-domain and out-of-domain benchmarks:

<p align="center">
  <img src="./assets/generalize_improvement.png" style="width: 85%;" alt="Performance Improvements on AgencyBench and Out-of-Domain Benchmarks">
</p>

The figure above demonstrates the effectiveness of our training approach:
- **Left (AgencyBench)**: Substantial improvements on in-domain agentic tasks, with Qwen3-4B (4.6% → 8.6%), Qwen3-8B (7.3% → 10.6%), and Qwen3-32B (8.4% → 20.5%).
- **Right (Out-of-Domain)**: Strong generalization to unseen benchmarks while maintaining performance, with Qwen3-4B (28.3% → 28.9%), Qwen3-8B (31.2% → 32.0%), and Qwen3-32B (35.2% → 37.1%).

### LIMI Models on AgencyBench

Our models achieve state-of-the-art performance across multiple agentic evaluation tasks:

| Model | FTFC (↑) | RC@3 (↑) | SR@3 (↑) | Avg. |
|-------|----------|----------|----------|-----------------|
| GLM-4.5-Air | 15.0 | 16.1 | 20.0 | 17.0 |
| GLM-4.5 | 37.8 | 50.0 | 47.4 | 45.1 |
| GLM-4.5-Code | 48.0 | 48.0 | 47.5 | 47.8 |
| **LIMI-Air** | **35.4** | **34.3** | **33.1** | **34.3** |
| **LIMI** | **71.7** | **74.2** | **74.6** | **73.5** |

For detailed benchmark results, experimental setup, and comprehensive comparisons, please refer to our [paper](https://arxiv.org/pdf/2509.17567). 

## Model Zoo

Our LIMI models are available on Hugging Face 🤗:

| Model | Backbone | Size | Link |
|---|---|---|---|
| LIMI | [GLM‑4.5](https://huggingface.co/zai-org/GLM-4.5) | 353B | https://huggingface.co/GAIR/LIMI |
| LIMI‑Air | [GLM‑4.5‑Air](https://huggingface.co/zai-org/GLM-4.5-Air) | 107B | https://huggingface.co/GAIR/LIMI-Air |


## Datasets

We release our datasets through Hugging Face 🤗:
- Name: `GAIR/LIMI`
- Summary: curated agentic SFT data (OpenAI `messages`, optional `tools`, normalized tool‑call arguments); current release contains ~78 high‑quality samples.
- Link: https://huggingface.co/datasets/GAIR/LIMI

## Quick Start

<details>
<summary>Start with HF Transformers</summary>

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "GAIR/LIMI", torch_dtype="auto", device_map="auto", trust_remote_code=True
)
tok = AutoTokenizer.from_pretrained("GAIR/LIMI", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a helpful assistant tasked with discovering mathematical function structures for scientific systems."},
    {"role": "user", "content": "Modify the equation.py function, considering the physical meaning and relationships of the inputs."}
]

text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=0.6,
    top_p=0.95,
    do_sample=True,
)
print(tok.decode(out[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True))
```

</details>

<details>
<summary>Start with vLLM</summary>

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

llm = LLM(model="GAIR/LIMI", trust_remote_code=True)
tok = AutoTokenizer.from_pretrained("GAIR/LIMI", trust_remote_code=True)
messages = [
    {"role": "system", "content": "You are a helpful assistant tasked with discovering mathematical function structures for scientific systems."},
    {"role": "user", "content": "Modify the equation.py function, considering the physical meaning and relationships of the inputs."}
]

text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
out = llm.generate(text, SamplingParams(temperature=0.6, max_tokens=4096, top_p=0.95))
print(out[0].outputs[0].text)
```

</details>

## Prompting

- Messages follow OpenAI chat format; include a grounding system message when helpful.
- Example:

```json
[
  {"role": "system", "content": "You are a helpful assistant tasked with discovering mathematical function structures for scientific systems."},
  {"role": "user", "content": "Modify the equation.py function, considering the physical meaning and relationships of the inputs."}
]
```
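For tool use, a `tools` list in OpenAI function-calling format can be passed alongside the messages; recent Transformers chat templates accept a `tools` argument in `apply_chat_template`. The schema below is a minimal hypothetical sketch (the tool name `run_python` is illustrative, not a LIMI built-in):

```python
# Hypothetical tool schema in OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "run_python",  # illustrative name, not a LIMI built-in
        "description": "Execute a Python snippet and return its stdout.",
        "parameters": {
            "type": "object",
            "properties": {"code": {"type": "string", "description": "Python source to run."}},
            "required": ["code"],
        },
    },
}]

# With a tokenizer loaded as in Quick Start, tools ride along with the messages:
# text = tok.apply_chat_template(messages, tools=tools, tokenize=False, add_generation_prompt=True)
```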

## Evaluation

- We report FTFC (First‑Turn Functional Completeness), SR@R (Success Rate at R), and RC@R (Remaining Chances at R) with R=3.
- See the paper for experimental protocol and scores.
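The Avg. column in the AgencyBench table above is consistent with the simple arithmetic mean of the three metrics, rounded to one decimal:

```python
# Mean of the three AgencyBench metrics, rounded to one decimal
# (matches the Avg. column in the table above).
def agency_avg(ftfc: float, rc3: float, sr3: float) -> float:
    return round((ftfc + rc3 + sr3) / 3, 1)

print(agency_avg(71.7, 74.2, 74.6))  # LIMI row -> 73.5
print(agency_avg(37.8, 50.0, 47.4))  # GLM-4.5 row -> 45.1
```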

## Limitations

- May produce incorrect tool arguments or overfit to frequent schemas
- Not safety‑filtered for sensitive domains; use with guardrails and oversight

## License

- Inherits base model (GLM‑4.5) terms; verify upstream license before deployment

## Citation

```bibtex
@misc{xiao2025limiagency,
      title={LIMI: Less is More for Agency}, 
      author={Yang Xiao and Mohan Jiang and Jie Sun and Keyu Li and Jifan Lin and Yumin Zhuang and Ji Zeng and Shijie Xia and Qishuo Hua and Xuefeng Li and Xiaojie Cai and Tongyu Wang and Yue Zhang and Liming Liu and Xia Wu and Jinlong Hou and Yuan Cheng and Wenjie Li and Xiang Wang and Dequan Wang and Pengfei Liu},
      year={2025},
      eprint={2509.17567},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2509.17567}, 
}
```