Instructions to use terryrdt/Qwen-Image-Layered with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Diffusers
How to use terryrdt/Qwen-Image-Layered with Diffusers:
pip install -U diffusers transformers accelerate
import torch from diffusers import DiffusionPipeline # switch to "mps" for apple devices pipe = DiffusionPipeline.from_pretrained("terryrdt/Qwen-Image-Layered", dtype=torch.bfloat16, device_map="cuda") prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" image = pipe(prompt).images[0] - Notebooks
- Google Colab
- Kaggle
| license: apache-2.0 | |
| language: | |
| - en | |
| - zh | |
| base_model: | |
| - Qwen/Qwen-Image | |
| pipeline_tag: image-text-to-image | |
| library_name: diffusers | |
| <p align="center"> | |
| <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/layered/qwen-image-layered-logo.png" width="800"/> | |
| <p> | |
| <p align="center">  🤗 <a href="https://huggingface.co/Qwen/Qwen-Image-Layered">HuggingFace</a>   |   🤖 <a href="https://modelscope.cn/models/Qwen/Qwen-Image-Layered">ModelScope</a>   |    📑 <a href="https://arxiv.org/abs/2512.15603">Research Paper</a>    |    📑 <a href="https://qwenlm.github.io/blog/qwen-image-layered/">Blog</a>    |    🤗 <a href="https://huggingface.co/spaces/Qwen/Qwen-Image-Layered">Demo</a>    | |
| </p> | |
| <p align="center"> | |
| <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/layered/layered.JPG" width="1024"/> | |
| <p> | |
| ## Introduction | |
| We are excited to introduce **Qwen-Image-Layered**, a model capable of decomposing an image into multiple RGBA layers. This layered representation unlocks **inherent editability**: each layer can be independently manipulated without affecting other content. Meanwhile, such a layered representation naturally supports **high-fidelity elementary operations**-such as resizing, reposition, and recoloring. By physically isolating semantic or structural components into distinct layers, our approach enables high-fidelity and consistent editing. | |
| ## Quick Start | |
| 1. Make sure your transformers>=4.51.3 (Supporting Qwen2.5-VL) | |
| 2. Install the latest version of diffusers | |
| ``` | |
| pip install git+https://github.com/huggingface/diffusers | |
| pip install python-pptx | |
| ``` | |
| ```python | |
| from diffusers import QwenImageLayeredPipeline | |
| import torch | |
| from PIL import Image | |
| pipeline = QwenImageLayeredPipeline.from_pretrained("Qwen/Qwen-Image-Layered") | |
| pipeline = pipeline.to("cuda", torch.bfloat16) | |
| pipeline.set_progress_bar_config(disable=None) | |
| image = Image.open("asserts/test_images/1.png").convert("RGBA") | |
| inputs = { | |
| "image": image, | |
| "generator": torch.Generator(device='cuda').manual_seed(777), | |
| "true_cfg_scale": 4.0, | |
| "negative_prompt": " ", | |
| "num_inference_steps": 50, | |
| "num_images_per_prompt": 1, | |
| "layers": 4, | |
| "resolution": 640, # Using different bucket (640, 1024) to determine the resolution. For this version, 640 is recommended | |
| "cfg_normalize": True, # Whether enable cfg normalization. | |
| "use_en_prompt": True, # Automatic caption language if user does not provide caption | |
| } | |
| with torch.inference_mode(): | |
| output = pipeline(**inputs) | |
| output_image = output.images[0] | |
| for i, image in enumerate(output_image): | |
| image.save(f"{i}.png") | |
| ``` | |
| ## Showcase | |
| ### Layered Decomposition in Application | |
| Given an image, Qwen-Image-Layered can decompose it into several RGBA layers: | |
|  | |
| After decomposition, edits are applied exclusively to the target layer, physically isolating it from the rest of the content, and thereby fundamentally ensuring consistency across edits. | |
| For example, we can recolor the first layer and keep all other content untouched: | |
|  | |
| We can also replace the second layer from a girl to a boy (The target layer is edited using Qwen-Image-Edit): | |
|  | |
| Here, we revise the text to "Qwen-Image" (The target layer is edited using Qwen-Image-Edit): | |
|  | |
| Furthermore, the layered structure naturally supports elemetary operations. For example, we can delete unwanted objects cleanly: | |
|  | |
| We can also resize an object without distortion: | |
|  | |
| After layer decomposition, we can move objects freely within the canvas: | |
|  | |
| ### Flexible and Iterative Decomposition | |
| Qwen-Image-Layered is not limited to a fixed number of layers. The model supports variable-layer decomposition. For example, we can decompose an image into either 3 or 8 layers as needed: | |
|  | |
| Moreover, decomposition can be applied recursively: any layer can itself be further decomposed, enabling infinite decomposition. | |
|  | |
| ## License Agreement | |
| Qwen-Image-Layered is licensed under Apache 2.0. | |
| ## Citation | |
| We kindly encourage citation of our work if you find it useful. | |
| ```bibtex | |
| @misc{yin2025qwenimagelayered, | |
| title={Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition}, | |
| author={Shengming Yin, Zekai Zhang, Zecheng Tang, Kaiyuan Gao, Xiao Xu, Kun Yan, Jiahao Li, Yilei Chen, Yuxiang Chen, Heung-Yeung Shum, Lionel M. Ni, Jingren Zhou, Junyang Lin, Chenfei Wu}, | |
| year={2025}, | |
| eprint={2512.15603}, | |
| archivePrefix={arXiv}, | |
| primaryClass={cs.CV}, | |
| url={https://arxiv.org/abs/2512.15603}, | |
| } | |
| ``` |