|
|
--- |
|
|
license: cc-by-nc-sa-4.0 |
|
|
pipeline_tag: image-to-image |
|
|
--- |
|
|
|
|
|
# 🪶 MagicQuill V2: Precise and Interactive Image Editing with Layered Visual Cues |
|
|
|
|
|
- **Paper:** [MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues](https://huggingface.co/papers/2512.03046) |
|
|
- **Project Page:** https://magicquill.art/v2/ |
|
|
- **Code Repository:** https://github.com/zliucz/MagicQuillV2 |
|
|
- **Hugging Face Spaces Demo:** https://huggingface.co/spaces/AI4Editing/MagicQuillV2 |
|
|
|
|
|
<br> |
|
|
|
|
|
<div align="center"> |
|
|
<video src="https://github.com/user-attachments/assets/58079152-7729-48ed-9bb4-0ddfd1873dd0" width="100%" controls autoplay muted loop></video> |
|
|
</div> |
|
|
|
|
|
<br> |
|
|
|
|
|
**TLDR:** MagicQuill V2 introduces a layered composition paradigm for generative image editing, disentangling creative intent into four controllable visual cues (Content, Spatial, Structural, Color) for precise and intuitive control. |
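Conceptually, the four cue types behave like optional layers stacked into a single edit request. A minimal Python sketch of that idea (class and field names are illustrative, not the project's actual API):

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative only: the field names mirror the four cue types from the
# paper, not MagicQuill V2's real code. Each cue is independent and optional.
@dataclass
class LayeredEdit:
    content: Optional[str] = None      # foreground prop / reference (WHAT to generate)
    spatial: Optional[list] = None     # mask region (WHERE the edit applies)
    structural: Optional[list] = None  # sketched edges constraining shape
    color: Optional[list] = None       # color strokes constraining appearance

    def active_cues(self):
        # Report which layers the user has actually provided.
        return [name for name, value in vars(self).items() if value is not None]

edit = LayeredEdit(content="props/hat.png", spatial=[(10, 10), (120, 80)])
print(edit.active_cues())  # -> ['content', 'spatial']
```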
|
|
|
|
|
## Hardware Requirements |
|
|
|
|
|
Our model is based on Flux Kontext, which is large and computationally intensive. |
|
|
- **VRAM**: Approximately **40GB** of VRAM is required for inference. |
|
|
- **Speed**: It takes about **30 seconds** to generate a single image. |
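If you are unsure whether your GPU meets the 40GB requirement, a small check before launching can save a failed run. The sketch below shells out to `nvidia-smi` (the query flags are standard); the helper names are illustrative, not part of this project:

```python
import shutil
import subprocess

REQUIRED_GB = 40  # the README's stated VRAM requirement


def parse_mib(line: str) -> float:
    # `nvidia-smi --query-gpu=memory.total --format=csv,noheader`
    # prints lines like "49140 MiB"; convert MiB -> GiB.
    return float(line.strip().split()[0]) / 1024


def total_vram_gb():
    """Return total VRAM of GPU 0 in GiB, or None if it cannot be queried."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader"],
            text=True,
        )
    except subprocess.CalledProcessError:
        return None
    return parse_mib(out.splitlines()[0])


if __name__ == "__main__":
    gb = total_vram_gb()
    if gb is None:
        print("Could not query the GPU; is nvidia-smi installed?")
    elif gb < REQUIRED_GB:
        print(f"Only {gb:.1f} GiB VRAM; consider MagicQuill V1 or the Spaces demo.")
    else:
        print(f"{gb:.1f} GiB VRAM available.")
```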
|
|
|
|
|
> **Important**: This is a research project focused on pushing the boundaries of interactive image editing. If you do not have sufficient GPU memory, we recommend checking out our [**MagicQuill V1**](https://github.com/ant-research/MagicQuill) or trying the online demo on [**Hugging Face Spaces**](https://huggingface.co/spaces/AI4Editing/MagicQuillV2). |
|
|
|
|
|
## Setup |
|
|
|
|
|
1. **Clone the repository** |
|
|
```bash |
|
|
git clone https://github.com/magic-quill/MagicQuillV2.git |
|
|
cd MagicQuillV2 |
|
|
``` |
|
|
|
|
|
2. **Create environment** |
|
|
```bash |
|
|
conda create -n MagicQuillV2 python=3.10 -y |
|
|
conda activate MagicQuillV2 |
|
|
``` |
|
|
|
|
|
3. **Install dependencies** |
|
|
```bash |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
4. **Download models** |
|
|
Download the models from [Hugging Face](https://huggingface.co/LiuZichen/MagicQuillV2-models) and place them in the `models/` directory. |
|
|
|
|
|
```bash |
|
|
huggingface-cli download LiuZichen/MagicQuillV2-models --local-dir models |
|
|
``` |
|
|
|
|
|
5. **Run the demo** |
|
|
```bash |
|
|
python app.py |
|
|
``` |
|
|
|
|
|
## System Overview |
|
|
|
|
|
The MagicQuill V2 interactive system brings the layered composition framework into a single editing interface. |
|
|
|
|
|
<div align="center"> |
|
|
<img src="https://github.com/zliucz/MagicQuillV2/raw/main/assets/V2_UI.png" alt="MagicQuill V2 UI" width="100%"> |
|
|
</div> |
|
|
|
|
|
### Key Upgrades from V1 |
|
|
|
|
|
1. **Toolbar (A)**: Features a new **Local Edit Brush** for defining the target editing area, along with tools for sketching edges and applying color. |
|
|
2. **Visual Cue Manager (B)**: Holds all content layer visual cues (**foreground props**) that users can drag onto the canvas to define what to generate. |
|
|
3. **Image Segmentation Panel (C)**: Accessed via the segment icon, this panel allows precise object extraction using SAM (Segment Anything Model) with positive/negative dots or bounding boxes. |
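For reference, the point-prompt flow behind panel (C) can be sketched with Meta's `segment_anything` package. The helper names below are hypothetical, and a downloaded SAM checkpoint is assumed; only the dot-to-prompt conversion is specific to this sketch:

```python
import numpy as np


def dots_to_prompt(positive, negative):
    """Convert clicked dots into SAM's (point_coords, point_labels) arrays.

    Label 1 marks a positive click (keep this region), label 0 a negative
    click (exclude this region).
    """
    coords = np.array(list(positive) + list(negative), dtype=np.float32)
    labels = np.array([1] * len(positive) + [0] * len(negative), dtype=np.int64)
    return coords, labels


def extract_object(image_rgb, positive, negative, checkpoint="sam_vit_h_4b8939.pth"):
    # Not executed here: requires `pip install segment-anything` and the
    # ViT-H SAM checkpoint file named above.
    from segment_anything import SamPredictor, sam_model_registry

    predictor = SamPredictor(sam_model_registry["vit_h"](checkpoint=checkpoint))
    predictor.set_image(image_rgb)  # HxWx3 uint8 RGB array
    coords, labels = dots_to_prompt(positive, negative)
    masks, scores, _ = predictor.predict(
        point_coords=coords, point_labels=labels, multimask_output=True
    )
    return masks[int(scores.argmax())]  # boolean HxW mask of the best candidate
```

A bounding-box prompt follows the same pattern via the predictor's `box` argument.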
|
|
|
|
|
## Citation |
|
|
|
|
|
If you find MagicQuill V2 useful for your research, please cite our paper: |
|
|
|
|
|
```bibtex |
|
|
@article{liu2025magicquillv2, |
|
|
title={MagicQuill V2: Precise and Interactive Image Editing with Layered Visual Cues}, |
|
|
author={Zichen Liu and Yue Yu and Hao Ouyang and Qiuyu Wang and Shuailei Ma and Ka Leong Cheng and Wen Wang and Qingyan Bai and Yuxuan Zhang and Yanhong Zeng and Yixuan Li and Xing Zhu and Yujun Shen and Qifeng Chen}, |
|
|
journal={arXiv preprint arXiv:2512.03046}, |
|
|
year={2025} |
|
|
} |
|
|
``` |