---
license: cc-by-nc-sa-4.0
pipeline_tag: image-to-image
---

# 🪶 MagicQuill V2: Precise and Interactive Image Editing with Layered Visual Cues

- **Paper:** [MagicQuill V2: Precise and Interactive Image Editing with Layered Visual Cues](https://huggingface.co/papers/2512.03046)
- **Project Page:** https://magicquill.art/v2/
- **Code Repository:** https://github.com/zliucz/MagicQuillV2
- **Hugging Face Spaces Demo:** https://huggingface.co/spaces/AI4Editing/MagicQuillV2

<br>

<div align="center">
  <video src="https://github.com/user-attachments/assets/58079152-7729-48ed-9bb4-0ddfd1873dd0" width="100%" controls autoplay muted loop></video>
</div>

<br>

**TLDR:** MagicQuill V2 introduces a layered composition paradigm to generative image editing, disentangling creative intent into controllable visual cues (Content, Spatial, Structural, Color) for precise and intuitive control.
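
To make the decomposition concrete, the sketch below models a single edit request as four optional cue layers. The class and field names are hypothetical, invented here for illustration; only the four cue types come from the paper.

```python
from dataclasses import dataclass
from typing import Optional
from PIL import Image

# Hypothetical structure for illustration only; not MagicQuill V2's actual API.
@dataclass
class LayeredEdit:
    """One edit request, decomposed into the paper's four cue types."""
    content: Optional[Image.Image] = None     # what to generate (e.g., a dragged-in foreground prop)
    spatial: Optional[Image.Image] = None     # where to edit (e.g., a Local Edit Brush mask)
    structural: Optional[Image.Image] = None  # shape guidance (e.g., sketched edges)
    color: Optional[Image.Image] = None       # palette guidance (e.g., color strokes)
```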

## Hardware Requirements

Our model is based on Flux Kontext, which is large and computationally intensive.
- **VRAM**: Approximately **40GB** of VRAM is required for inference.
- **Speed**: It takes about **30 seconds** to generate a single image.

> **Important**: This is a research project focused on pushing the boundaries of interactive image editing. If you do not have sufficient GPU memory, we recommend checking out our [**MagicQuill V1**](https://github.com/ant-research/MagicQuill) or trying the online demo on [**Hugging Face Spaces**](https://huggingface.co/spaces/AI4Editing/MagicQuillV2).
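
If you are unsure whether your GPU qualifies, a quick PyTorch check such as the following (a convenience snippet, not part of the repository) reports total VRAM:

```python
import torch

# Rough pre-flight check: inference needs about 40 GB of VRAM.
if torch.cuda.is_available():
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU 0: {total_gb:.1f} GB VRAM")
    if total_gb < 40:
        print("Likely insufficient; consider MagicQuill V1 or the Spaces demo.")
else:
    print("No CUDA device detected.")
```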

## Setup

1.  **Clone the repository**
    ```bash
    git clone https://github.com/magic-quill/MagicQuillV2.git
    cd MagicQuillV2
    ```

2.  **Create environment**
    ```bash
    conda create -n MagicQuillV2 python=3.10 -y
    conda activate MagicQuillV2
    ```

3.  **Install dependencies**
    ```bash
    pip install -r requirements.txt
    ```

4.  **Download models**
    Download the models from [Hugging Face](https://huggingface.co/LiuZichen/MagicQuillV2-models) and place them in the `models/` directory.

    ```bash
    huggingface-cli download LiuZichen/MagicQuillV2-models --local-dir models
    ```
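
    If you prefer to download from Python, `huggingface_hub` offers an equivalent call:

    ```python
    from huggingface_hub import snapshot_download

    # Same effect as the CLI command above: fetch the repo into ./models
    snapshot_download(repo_id="LiuZichen/MagicQuillV2-models", local_dir="models")
    ```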

5.  **Run the demo**
    ```bash
    python app.py
    ```

## System Overview

The MagicQuill V2 interactive system unifies our layered composition framework in a single interface.

<div align="center">
  <img src="https://github.com/zliucz/MagicQuillV2/raw/main/assets/V2_UI.png" alt="MagicQuill V2 UI" width="100%">
</div>

### Key Upgrades from V1

1.  **Toolbar (A)**: Features a new **Local Edit Brush** for defining the target editing area, along with tools for sketching edges and applying color.
2.  **Visual Cue Manager (B)**: Holds the content-layer visual cues (**foreground props**), which users can drag onto the canvas to define what to generate.
3.  **Image Segmentation Panel (C)**: Accessed via the segment icon, this panel allows precise object extraction using SAM (Segment Anything Model) with positive/negative dots or bounding boxes (see the sketch after this list).
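
For readers unfamiliar with SAM's prompt interface, here is a minimal sketch of point-prompted object extraction with the standard `segment-anything` package. The checkpoint path and input image are placeholders, and this illustrates the general mechanism rather than MagicQuill V2's internal integration:

```python
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

# Placeholder checkpoint path and variant; adjust to your local files.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image_rgb = np.array(Image.open("input.png").convert("RGB"))
predictor.set_image(image_rgb)

# One positive click (label 1) and one negative click (label 0), in pixel coordinates.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240], [100, 80]]),
    point_labels=np.array([1, 0]),
    multimask_output=False,
)
mask = masks[0]  # boolean H x W mask of the extracted object
```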

## Citation

If you find MagicQuill V2 useful for your research, please cite our paper:

```bibtex
@article{liu2025magicquillv2,
  title={MagicQuill V2: Precise and Interactive Image Editing with Layered Visual Cues},
  author={Liu, Zichen and Yu, Yue and Ouyang, Hao and Wang, Qiuyu and Ma, Shuailei and Cheng, Ka Leong and Wang, Wen and Bai, Qingyan and Zhang, Yuxuan and Zeng, Yanhong and Li, Yixuan and Zhu, Xing and Shen, Yujun and Chen, Qifeng},
  journal={arXiv preprint arXiv:2512.03046},
  year={2025}
}
```