| | --- |
| | license: apache-2.0 |
| | language: |
| | - en |
| | - zh |
| | base_model: |
| | - Qwen/Qwen-Image |
| | pipeline_tag: image-text-to-image |
| | library_name: diffusers |
| | --- |
| | <p align="center"> |
| | <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/layered/qwen-image-layered-logo.png" width="800"/> |
| | <p> |
| | <p align="center">  🤗 <a href="https://huggingface.co/Qwen/Qwen-Image-Layered">HuggingFace</a>   |   🤖 <a href="https://modelscope.cn/models/Qwen/Qwen-Image-Layered">ModelScope</a>   |    📑 <a href="https://arxiv.org/abs/2512.15603">Research Paper</a>    |    📑 <a href="https://qwenlm.github.io/blog/qwen-image-layered/">Blog</a>    |    🤗 <a href="https://huggingface.co/spaces/Qwen/Qwen-Image-Layered">Demo</a>    |
| | </p> |
| | |
| | <p align="center"> |
| | <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/layered/layered.JPG" width="1024"/> |
| | <p> |
| | |
| | ## Introduction |
| | We are excited to introduce **Qwen-Image-Layered**, a model capable of decomposing an image into multiple RGBA layers. This layered representation unlocks **inherent editability**: each layer can be independently manipulated without affecting other content. Meanwhile, such a layered representation naturally supports **high-fidelity elementary operations**-such as resizing, reposition, and recoloring. By physically isolating semantic or structural components into distinct layers, our approach enables high-fidelity and consistent editing. |
| |
|
| | ## Quick Start |
| |
|
| | 1. Make sure your transformers>=4.51.3 (Supporting Qwen2.5-VL) |
| |
|
| | 2. Install the latest version of diffusers |
| | ``` |
| | pip install git+https://github.com/huggingface/diffusers |
| | pip install python-pptx |
| | ``` |
| |
|
| |
|
| | ```python |
| | from diffusers import QwenImageLayeredPipeline |
| | import torch |
| | from PIL import Image |
| | |
| | pipeline = QwenImageLayeredPipeline.from_pretrained("Qwen/Qwen-Image-Layered") |
| | pipeline = pipeline.to("cuda", torch.bfloat16) |
| | pipeline.set_progress_bar_config(disable=None) |
| | |
| | image = Image.open("asserts/test_images/1.png").convert("RGBA") |
| | inputs = { |
| | "image": image, |
| | "generator": torch.Generator(device='cuda').manual_seed(777), |
| | "true_cfg_scale": 4.0, |
| | "negative_prompt": " ", |
| | "num_inference_steps": 50, |
| | "num_images_per_prompt": 1, |
| | "layers": 4, |
| | "resolution": 640, # Using different bucket (640, 1024) to determine the resolution. For this version, 640 is recommended |
| | "cfg_normalize": True, # Whether enable cfg normalization. |
| | "use_en_prompt": True, # Automatic caption language if user does not provide caption |
| | } |
| | |
| | with torch.inference_mode(): |
| | output = pipeline(**inputs) |
| | output_image = output.images[0] |
| | |
| | for i, image in enumerate(output_image): |
| | image.save(f"{i}.png") |
| | ``` |
| |
|
| |
|
| | ## Showcase |
| | ### Layered Decomposition in Application |
| | Given an image, Qwen-Image-Layered can decompose it into several RGBA layers: |
| |  |
| |
|
| | After decomposition, edits are applied exclusively to the target layer, physically isolating it from the rest of the content, and thereby fundamentally ensuring consistency across edits. |
| |
|
| | For example, we can recolor the first layer and keep all other content untouched: |
| |  |
| |
|
| | We can also replace the second layer from a girl to a boy (The target layer is edited using Qwen-Image-Edit): |
| |  |
| |
|
| | Here, we revise the text to "Qwen-Image" (The target layer is edited using Qwen-Image-Edit): |
| |  |
| |
|
| | Furthermore, the layered structure naturally supports elemetary operations. For example, we can delete unwanted objects cleanly: |
| |  |
| |
|
| | We can also resize an object without distortion: |
| |  |
| |
|
| | After layer decomposition, we can move objects freely within the canvas: |
| |  |
| |
|
| | ### Flexible and Iterative Decomposition |
| | Qwen-Image-Layered is not limited to a fixed number of layers. The model supports variable-layer decomposition. For example, we can decompose an image into either 3 or 8 layers as needed: |
| |
|
| |  |
| |
|
| | Moreover, decomposition can be applied recursively: any layer can itself be further decomposed, enabling infinite decomposition. |
| |
|
| |  |
| |
|
| |
|
| | ## License Agreement |
| |
|
| | Qwen-Image-Layered is licensed under Apache 2.0. |
| |
|
| | ## Citation |
| |
|
| | We kindly encourage citation of our work if you find it useful. |
| |
|
| | ```bibtex |
| | @misc{yin2025qwenimagelayered, |
| | title={Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition}, |
| | author={Shengming Yin, Zekai Zhang, Zecheng Tang, Kaiyuan Gao, Xiao Xu, Kun Yan, Jiahao Li, Yilei Chen, Yuxiang Chen, Heung-Yeung Shum, Lionel M. Ni, Jingren Zhou, Junyang Lin, Chenfei Wu}, |
| | year={2025}, |
| | eprint={2512.15603}, |
| | archivePrefix={arXiv}, |
| | primaryClass={cs.CV}, |
| | url={https://arxiv.org/abs/2512.15603}, |
| | } |
| | ``` |