Improve model card for Lego-Edit: Add metadata, links, abstract, and structure
#4
by nielsr HF Staff - opened

README.md CHANGED
@@ -1,3 +1,79 @@
- ---
- license:
- ---
---
license: cc-by-nc-4.0
pipeline_tag: image-to-image
library_name: transformers
---

# Lego-Edit: A General Image Editing Framework with Model-Level Bricks and MLLM Builder

<p align="center">
<img src="https://github.com/xiaomi-research/lego-edit/raw/main/resources/lego_pic.png" alt="Lego-Edit" width="240"/>
</p>

Lego-Edit is an instruction-based image editing framework introduced in the paper [Lego-Edit: A General Image Editing Framework with Model-Level Bricks and MLLM Builder](https://huggingface.co/papers/2509.12883).

- 📄 **Paper**: [Lego-Edit: A General Image Editing Framework with Model-Level Bricks and MLLM Builder](https://huggingface.co/papers/2509.12883)
- 🌐 **Project Page**: https://xiaomi-research.github.io/lego-edit/
- 💻 **Code / GitHub Repository**: https://github.com/xiaomi-research/lego-edit
- 🚀 **Live Demo**: https://editdemo.ai.xiaomi.net/

## Abstract

Instruction-based image editing has garnered significant attention due to its direct interaction with users. However, real-world user instructions are immensely diverse, and existing methods often fail to generalize effectively to instructions outside their training domain, limiting their practical application. To address this, we propose Lego-Edit, which leverages the generalization capability of a Multi-modal Large Language Model (MLLM) to organize a suite of model-level editing tools to tackle this challenge. Lego-Edit incorporates two key designs: (1) a model-level toolkit comprising diverse models efficiently trained on limited data and several image manipulation functions, enabling fine-grained composition of editing actions by the MLLM; and (2) a three-stage progressive reinforcement learning approach that uses feedback on unannotated, open-domain instructions to train the MLLM, equipping it with generalized reasoning capabilities for handling real-world instructions. Experiments demonstrate that Lego-Edit achieves state-of-the-art performance on GEdit-Bench and ImgBench. It exhibits robust reasoning capabilities for open-domain instructions and can utilize newly introduced editing tools without additional fine-tuning. The figure below showcases Lego-Edit's qualitative performance.

<p align="center"><img src="https://github.com/xiaomi-research/lego-edit/raw/main/resources/case_pic.png" width="95%"></p>

## ✨ Features

Lego-Edit supports local editing, global editing, and multi-step editing, as demonstrated by the results shown above. Its feedback responsiveness and tool-extension capabilities are discussed in the paper.

Additionally, Lego-Edit accepts mask inputs for precise control over the editing region. Example applications are shown below:

<p align="center"><img src="https://github.com/xiaomi-research/lego-edit/raw/main/resources/maskcase1.png" width="95%"></p>

<p align="center"><img src="https://github.com/xiaomi-research/lego-edit/raw/main/resources/maskcase2.png" width="95%"></p>

Try the demo to explore further uses of the framework.

## 🔥 Quick Start

For detailed instructions on setting up the environment, downloading checkpoints, and running the Gradio WebUI, please refer to the [Quick Start section in the GitHub repository](https://github.com/xiaomi-research/lego-edit#--quick-start).

## 🖼️ New Tools Integration

Lego-Edit supports the integration of new tools. For guidance on adding custom tools and making them available to the MLLM Builder, please refer to the [New Tools Integration section in the GitHub repository](https://github.com/xiaomi-research/lego-edit#--new-tools-integration).

## 📌 More Usages

Some editing models are trained at a resolution of 768 via the ICEdit method. The corresponding trained [Single-Task LoRAs](https://huggingface.co/xiaomi-research/lego-edit/tree/main/loras) are provided. To use these LoRAs on their own, refer to the usage instructions at [ICEdit](https://github.com/River-Zhang/ICEdit).

<p align="center"><img src="https://github.com/xiaomi-research/lego-edit/raw/main/resources/lora_effect.png" width="95%"></p>

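As a hypothetical sketch of locating one of these LoRA files on the Hub (the helper name, the task name, and the `loras/<task>.safetensors` filename pattern are assumptions for illustration — browse the `loras` folder linked above for the actual file names, and follow the ICEdit instructions to load a LoRA into a FLUX pipeline):

```python
# Hypothetical helper: compose the Hub repo id and file path for a
# Single-Task LoRA. The "loras/{task}.safetensors" layout is an assumption;
# check the repo's `loras` folder for the real file names.
def lora_location(task: str) -> tuple[str, str]:
    repo_id = "xiaomi-research/lego-edit"
    return repo_id, f"loras/{task}.safetensors"

repo_id, filename = lora_location("remove")  # "remove" is a made-up task name

# To actually fetch the file (requires `huggingface_hub`):
# from huggingface_hub import hf_hub_download
# local_path = hf_hub_download(repo_id, filename)
```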
## 📜 Disclaimer

We open-source this project for academic research. The vast majority of images used in this project are either generated or licensed. If you have any concerns, please contact us, and we will promptly remove any inappropriate content.

Our code is released under the Apache 2.0 License, while our models are under the CC BY-NC 4.0 License. Any models related to the <a href="https://huggingface.co/black-forest-labs/FLUX.1-dev" target="_blank">FLUX.1-dev</a> base model must adhere to the original licensing terms.

This research aims to advance the field of generative AI. Users are free to create images using this tool, provided they comply with local laws and exercise responsible usage. The developers are not liable for any misuse of the tool by users.

## ✍️ Citation

If you find this project useful for your research, please consider citing our paper:

```bibtex
@article{jia2025legoedit,
  title   = {Lego-Edit: A General Image Editing Framework with Model-Level Bricks and MLLM Builder},
  author  = {Qifei Jia and Yu Liu and Yajie Chai and Xintong Yao and Qiming Lu and Yasen Zhang and Runyu Shi and Ying Huang and Guoquan Zhang},
  journal = {arXiv preprint arXiv:2509.12883},
  year    = {2025},
  url     = {https://arxiv.org/abs/2509.12883}
}
```

## 🙏 Acknowledgments

- Built on [MiMo-VL](https://github.com/XiaomiMiMo/MiMo-VL), [ComfyUI](https://github.com/comfyanonymous/ComfyUI), [FLUX](https://github.com/black-forest-labs/flux), [ICEdit](https://github.com/River-Zhang/ICEdit), [EVF-SAM](https://github.com/hustvl/EVF-SAM), and [LaMa](https://github.com/advimman/lama).