Improve model card: Add license, pipeline tag, library name, paper, project, and code links

#2 by nielsr (HF Staff)

Files changed: README.md (+236 −3)

- ---
- license: apache-2.0
- ---
---
license: cc-by-nc-4.0
pipeline_tag: image-to-image
library_name: transformers
---

<p align="center">
<img src="https://github.com/xiaomi-research/lego-edit/raw/main/resources/lego_pic.png" alt="Lego-Edit" width="240"/>
</p>

<p align="center">
<a href="https://xiaomi-research.github.io/lego-edit/">
<img src="https://img.shields.io/static/v1?label=Project%20Page&message=Web&color=green" alt="Lego-Edit Website" />
</a>
<a href="https://arxiv.org/abs/2509.12883">
<img src="https://img.shields.io/static/v1?label=Tech%20Report&message=Arxiv&color=red" alt="Lego-Edit Paper on arXiv" />
</a>
<a href="https://huggingface.co/xiaomi-research/lego-edit">
<img src="https://img.shields.io/static/v1?label=Model&message=HuggingFace&color=yellow" alt="Lego-Edit Model" />
</a>
<a href="https://editdemo.ai.xiaomi.net/">
<img src="https://img.shields.io/badge/Demo-Live-orange" alt="Lego-Edit Demo" />
</a>
</p>

# Lego-Edit: A General Image Editing Framework with Model-Level Bricks and MLLM Builder

This repository contains the model weights for **Lego-Edit**, a general image editing framework presented in the paper [Lego-Edit: A General Image Editing Framework with Model-Level Bricks and MLLM Builder](https://huggingface.co/papers/2509.12883).

- **Project Page**: [https://xiaomi-research.github.io/lego-edit/](https://xiaomi-research.github.io/lego-edit/)
- **GitHub Repository**: [https://github.com/xiaomi-research/lego-edit](https://github.com/xiaomi-research/lego-edit)
- **Live Demo**: [https://editdemo.ai.xiaomi.net/](https://editdemo.ai.xiaomi.net/)

## Abstract

Instruction-based image editing has garnered significant attention due to its direct interaction with users. However, real-world user instructions are immensely diverse, and existing methods often fail to generalize effectively to instructions outside their training domain, limiting their practical application. To address this, we propose Lego-Edit, which leverages the generalization capability of a Multi-modal Large Language Model (MLLM) to organize a suite of model-level editing tools. Lego-Edit incorporates two key designs: (1) a model-level toolkit comprising diverse models efficiently trained on limited data and several image manipulation functions, enabling fine-grained composition of editing actions by the MLLM; and (2) a three-stage progressive reinforcement learning approach that uses feedback on unannotated, open-domain instructions to train the MLLM, equipping it with generalized reasoning capabilities for handling real-world instructions. Experiments demonstrate that Lego-Edit achieves state-of-the-art performance on GEdit-Bench and ImgBench. It exhibits robust reasoning capabilities for open-domain instructions and can utilize newly introduced editing tools without additional fine-tuning. The figure below showcases Lego-Edit's qualitative performance.

<p align="center"><img src="https://github.com/xiaomi-research/lego-edit/raw/main/resources/case_pic.png" width="95%"></p>

## ✨ Features

Lego-Edit supports local editing, global editing, and multi-step editing, as demonstrated by the results shown above. Its feedback responsiveness and tool-extension capabilities are discussed in the paper.

Additionally, Lego-Edit accepts mask inputs for precise control over the editing region. Example applications are shown below:

<p align="center"><img src="https://github.com/xiaomi-research/lego-edit/raw/main/resources/maskcase1.png" width="95%"></p>

<p align="center"><img src="https://github.com/xiaomi-research/lego-edit/raw/main/resources/maskcase2.png" width="95%"></p>

Try it out to discover more uses of the framework.

## πŸ“’ News

- **Sep 17, 2025:** We released the [demo](https://editdemo.ai.xiaomi.net/), [model](https://huggingface.co/xiaomi-research/lego-edit), and [report](https://arxiv.org/abs/2509.12883) for Lego-Edit.

## πŸ”₯ Quick Start

1️⃣ Set up the environment
```bash
conda create -n legoedit python=3.11
conda activate legoedit
pip install -r ./requirements.txt
```

Then install flash-attention; you can pick a prebuilt wheel matching your CUDA and PyTorch versions from https://github.com/Dao-AILab/flash-attention/releases.

Finally, modify `~/yourconda/envs/legoedit/lib/python3.11/site-packages/transformers/modeling_utils.py`: at line 5105, change `map_location="meta"` to `map_location="cpu"`.

2️⃣ Download the pretrained checkpoints and custom nodes

Custom nodes:
```bash
cd custom_nodes
git clone https://github.com/chflame163/ComfyUI_LayerStyle.git
git clone https://github.com/Fannovel16/comfyui_controlnet_aux.git
```

Base model:

1. Download [FLUX.1-Fill-dev](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev/blob/main/flux1-fill-dev.safetensors) and [FLUX.1-Canny-dev](https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev/blob/main/flux1-canny-dev.safetensors) and copy them to `./models/unet/`.
2. Download the [VAE](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev/blob/main/ae.safetensors) and copy it to `./models/vae/`.
3. Download [clip_l](https://huggingface.co/comfyanonymous/flux_text_encoders/blob/main/clip_l.safetensors) and [t5xxl](https://huggingface.co/comfyanonymous/flux_text_encoders/blob/main/t5xxl_fp8_e4m3fn.safetensors) and copy them to `./models/clip/`.
4. Download [LaMa](https://drive.google.com/file/d/11RbsVSav3O-fReBsPHBE1nn8kcFIMnKp/view?usp=drive_link), unzip it, and copy it to `./lama`.

Our model:

1. Download all the models (Builder, mimo_lora, CVSOS, CVRES, loras) from [lego-edit](https://huggingface.co/xiaomi-research/lego-edit/).

Your directory structure should match the following:
```text
β”œβ”€β”€ README.md
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ legodemo.py
β”œβ”€β”€ Builder/
β”œβ”€β”€ mimo_lora/
β”œβ”€β”€ models/
β”‚   β”œβ”€β”€ unet/
β”‚   β”œβ”€β”€ vae/
β”‚   β”œβ”€β”€ clip/
β”‚   └── loras/
β”œβ”€β”€ CVSOS/
β”œβ”€β”€ CVRES/
β”œβ”€β”€ lama/
β”‚   └── big-lama/
```
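Before launching the demo, you may want to sanity-check that everything landed in the right place. The following is a minimal sketch of such a check (our own convenience helper, not part of the official repo; `check_layout` and the path list are assumptions drawn from the tree above):

```python
from pathlib import Path

# Paths expected by the layout above (hypothetical helper, not repo code).
REQUIRED_PATHS = [
    "requirements.txt",
    "legodemo.py",
    "Builder",
    "mimo_lora",
    "models/unet",
    "models/vae",
    "models/clip",
    "models/loras",
    "CVSOS",
    "CVRES",
    "lama/big-lama",
]

def check_layout(root="."):
    """Return the list of required paths missing under `root`."""
    root = Path(root)
    return [p for p in REQUIRED_PATHS if not (root / p).exists()]

missing = check_layout(".")
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("Layout looks complete.")
```

Run it from the repository root; an empty "Missing" list means all checkpoints and folders are where the demo expects them.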

3️⃣ Launch the Gradio WebUI and start playing with Lego-Edit!
```bash
python legodemo.py
```

## πŸ’Ό New Tools Integration

Lego-Edit supports the integration of new tools. Follow the steps below to add custom tools, and the Builder will be able to use them during image editing.

1️⃣ Add the custom tool to system_prompt.txt

system_prompt.txt lists the available tools, such as FASTINPAINT, FLUX-FILL, and more. You can add new tools to perform your desired editing tasks. For example, after the FLUX-POSE tool you could define a new FLUX-BRIGHT tool that adjusts image brightness. For each tool, you only need to add its description, inputs, outputs, and constraints, as shown below:
```text
...
10. FLUX-POSE (Change the object's posture, expression, etc.)
Input: {Image[image], Str[prompt]}
Output: {Image[image]}
Constraint: The input prompt must provide a detailed description of the external characteristics of the modification target (such as gender, clothing, accessories, etc.), and must not use any PREDICT model in advance.
11. FLUX-BRIGHT (Input an image and a ratio; adjust image brightness according to the ratio)
Input: {Image[image], Float[ratio]}
Output: {Image[image]}
Constraint: The input ratio ranges from 0 to 1, where 0 is darkest, 0.5 leaves the brightness unchanged, and values above 0.5 brighten the image.
**Actual example1:**
...
```
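Each tool entry follows a simple name/Input/Output/Constraint pattern. As a rough illustration of that structure (our own sketch, not code from the repo; `parse_tools` and the regex are assumptions), entries like these could be extracted as follows:

```python
import re

# Sample text in the shape of the entries above (Constraint bodies elided).
SPEC = """\
10. FLUX-POSE (Change the object's posture, expression, etc.)
Input: {Image[image], Str[prompt]}
Output: {Image[image]}
Constraint: ...
11. FLUX-BRIGHT (Input an image and a ratio; adjust image brightness according to the ratio)
Input: {Image[image], Float[ratio]}
Output: {Image[image]}
Constraint: ...
"""

def parse_tools(text):
    """Parse numbered 'NAME (description)' headers plus their key: value lines."""
    tools = {}
    current = None
    for line in text.splitlines():
        m = re.match(r"^\d+\.\s*([A-Z-]+)\s*\((.*)\)\s*$", line)
        if m:
            current = m.group(1)
            tools[current] = {"description": m.group(2)}
        elif current and ":" in line:
            key, _, value = line.partition(":")
            tools[current][key.strip().lower()] = value.strip()
    return tools

print(sorted(parse_tools(SPEC)))  # ['FLUX-BRIGHT', 'FLUX-POSE']
```

The point is only that the Builder reasons over these structured specs; keeping your new entry's format consistent with the existing ones is what matters.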

2️⃣ Add the tool function in legodemo.py

In the initialize_model_mapping function within legodemo.py, register the function that implements your new tool.
```python
def initialize_model_mapping(self) -> Dict[str, Any]:
    return {
        "CMI-PRED": self.dummy_captionmask_pred,
        "RES": self.dummy_res,
        "MASK-SEG": self.dummy_mask_seg,
        "FASTINPAINT": self.dummy_fastinpaint,
        "FLUX-FILL": self.dummy_flux_fill,
        "FLUX-INPAINT": self.dummy_flux_inpaint,
        "INVERSE": self.dummy_inverse,
        "COMPOSE": self.dummy_compose,
        "RESIZE": self.dummy_resize,
        "BBOX": self.dummy_bbox,
        "SOS": self.dummy_sos,
        "FLUX-CBG": self.dummy_flux_cbg,
        "ADD-PRED": self.dummy_add_pred,
        "FLUX-STYLE": self.dummy_flux_style,
        "FLUX-RCM": self.dummy_flux_rcm,
        "FLUX-ENV": self.dummy_flux_env,
        "FLUX-POSE": self.dummy_flux_pose,
        "FLUX-BRIGHT": self.dummy_flux_bright,
    }
```
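For context, here is one way such a mapping might be consumed at run time: the Builder emits a tool name, which is looked up and invoked with a dict of inputs. `ToolRunner` below is a hypothetical illustration of that dispatch, not the repo's actual execution code:

```python
from typing import Any, Callable, Dict

# Hypothetical dispatcher: maps a tool name chosen by the Builder to the
# registered function and calls it with the assembled inputs dict.
class ToolRunner:
    def __init__(self, mapping: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]]):
        self.mapping = mapping

    def run(self, tool: str, inputs: Dict[str, Any]) -> Dict[str, Any]:
        if tool not in self.mapping:
            raise KeyError(f"Unknown tool {tool!r}; did you register it in initialize_model_mapping?")
        return self.mapping[tool](inputs)

# Toy example with a stand-in identity tool.
runner = ToolRunner({"FLUX-BRIGHT": lambda inp: {"image": inp["image"]}})
out = runner.run("FLUX-BRIGHT", {"image": [1, 2, 3], "ratio": 0.5})
print(out)  # {'image': [1, 2, 3]}
```

An unregistered name fails loudly, which is the behavior you want while wiring up a new tool.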

Then implement your function (dummy_flux_bright) in legodemo.py.
```python
def dummy_flux_bright(self, inputs: Dict[str, DataObject]) -> Dict[str, DataObject]:
    image_ori = inputs['image'].copy()
    ratio = inputs['ratio']
    # Clamp the ratio to [0, 1] and map it to PIL's brightness factor in [0, 2],
    # so 0.5 leaves the image unchanged.
    input_value = max(0.0, min(1.0, ratio))
    ratio = 2 * input_value
    # Convert BGR (OpenCV order) to RGB for PIL, adjust brightness, convert back.
    image_pil = Image.fromarray(image_ori[:, :, ::-1])
    enhancer = ImageEnhance.Brightness(image_pil)
    image_new = enhancer.enhance(ratio)
    image_new = np.array(image_new)[:, :, ::-1]
    return {
        "image": image_new
    }
```
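If you want to check the ratio-to-brightness mapping in isolation, a self-contained version of the same logic (stripped of the class and DataObject types, which come from legodemo.py; assumes BGR uint8 arrays) can be run directly:

```python
import numpy as np
from PIL import Image, ImageEnhance

def flux_bright(image: np.ndarray, ratio: float) -> np.ndarray:
    """Standalone sketch of the brightness tool above (BGR uint8 in/out)."""
    factor = 2 * max(0.0, min(1.0, ratio))   # 0.5 -> factor 1.0 (unchanged)
    pil = Image.fromarray(image[:, :, ::-1])  # BGR -> RGB for PIL
    out = ImageEnhance.Brightness(pil).enhance(factor)
    return np.array(out)[:, :, ::-1]          # RGB -> BGR

img = np.full((4, 4, 3), 100, dtype=np.uint8)
print(flux_bright(img, 0.5).mean())  # 100.0 (unchanged)
print(flux_bright(img, 0.0).mean())  # 0.0 (black)
```

This makes the constraint above concrete: 0.5 is the identity point, and the usable range is 0 to 1.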

3️⃣ Restart the Gradio WebUI

<p align="center"><img src="https://github.com/xiaomi-research/lego-edit/raw/main/resources/bright.png" width="95%"></p>

## πŸ“ More Usages

Some editing models are trained at a resolution of 768 via the ICEdit method, prioritizing higher output quality over the standard 512 resolution. We provide the corresponding trained [Single-Task LoRAs](https://huggingface.co/xiaomi-research/lego-edit/tree/main/loras). Based on our testing, these models deliver superior performance within their specific functional domains.

<p align="center"><img src="https://github.com/xiaomi-research/lego-edit/raw/main/resources/lora_effect.png" width="95%"></p>

You can refer to the usage instructions at [ICEdit](https://github.com/River-Zhang/ICEdit) to use these LoRAs independently.

## πŸ“„ Disclaimer

We open-source this project for academic research. The vast majority of images used in this project are either generated or licensed. If you have any concerns, please contact us, and we will promptly remove any inappropriate content. Our code is released under the Apache 2.0 License, while our models are under the CC BY-NC 4.0 License. Any models related to the <a href="https://huggingface.co/black-forest-labs/FLUX.1-dev" target="_blank">FLUX.1-dev</a> base model must adhere to the original licensing terms.

This research aims to advance the field of generative AI. Users are free to create images using this tool, provided they comply with local laws and exercise responsible usage. The developers are not liable for any misuse of the tool by users.

## ✍️ Citation

If this repo is helpful, please give it a ⭐.

If you find this project useful for your research, please consider citing our paper:

```bibtex
@article{jia2025legoedit,
  title   = {Lego-Edit: A General Image Editing Framework with Model-Level Bricks and MLLM Builder},
  author  = {Qifei Jia and Yu Liu and Yajie Chai and Xintong Yao and Qiming Lu and Yasen Zhang and Runyu Shi and Ying Huang and Guoquan Zhang},
  journal = {arXiv preprint arXiv:2509.12883},
  year    = {2025},
  url     = {https://arxiv.org/abs/2509.12883}
}
```

## πŸ™ Acknowledgments

- Built on [MiMo-VL](https://github.com/XiaomiMiMo/MiMo-VL), [ComfyUI](https://github.com/comfyanonymous/ComfyUI), [FLUX](https://github.com/black-forest-labs/flux), [ICEdit](https://github.com/River-Zhang/ICEdit), [EVF-SAM](https://github.com/hustvl/EVF-SAM), and [LaMa](https://github.com/advimman/lama).