---
license: apache-2.0
---

# Qwen-Image-Layered

## Model Introduction

This model is fine-tuned from [Qwen/Qwen-Image-Layered](https://modelscope.cn/models/Qwen/Qwen-Image-Layered) on the [artplus/PrismLayersPro](https://modelscope.cn/datasets/artplus/PrismLayersPro) dataset. It extracts individual layers from an image under text control. For details on the training strategy and implementation, see our [technical blog](https://modelscope.cn/learn/4938).

## Usage Tips

* Unlike the base model, which outputs multiple layers at once, this model outputs a single image: the layer that matches the text description.
* The model was trained only on English prompts, but it retains the Chinese language understanding inherited from the base model.
* The native training resolution is 1024x1024, but inference at other resolutions is supported.
* The model struggles to separate entities that heavily occlude or overlap each other, such as the cartoon skeleton's head and hat in Example 1.
* The model excels at decomposing poster-like graphics but performs poorly on photographs, especially those with complex lighting and shadows.
* The model supports negative prompts: describe any content you want excluded from the extracted layer in the negative prompt.

## Demo Examples

**Some images contain white text on light backgrounds. ModelScope users can click the "☀︎" icon in the top-right corner to switch to dark mode for better visibility.**

### Example 1

|Input Image|
|-|
|![](./assets/image_1_input.png)|

|Prompt|Output Image|Prompt|Output Image|
|-|-|-|-|
|A solid, uniform color with no distinguishable features or objects|![](./assets/image_1_0_0.png)|Text 'TRICK'|![](./assets/image_1_4_0.png)|
|Cloud|![](./assets/image_1_1_0.png)|Text 'TRICK OR TREAT'|![](./assets/image_1_3_0.png)|
|A cartoon skeleton character wearing a purple hat and holding a gift box|![](./assets/image_1_2_0.png)|Text 'TRICK OR'|![](./assets/image_1_7_0.png)|
|A purple hat and a head|![](./assets/image_1_5_0.png)|A gift box|![](./assets/image_1_6_0.png)|
### Example 2

|Input Image|
|-|
|![](./assets/image_2_input.png)|

|Prompt|Output Image|Prompt|Output Image|
|-|-|-|-|
|Blue sky, white clouds, a garden with colorful flowers|![](./assets/image_2_0_0.png)|Colorful, intricate floral wreath|![](./assets/image_2_2_0.png)|
|Girl, wreath, kitten|![](./assets/image_2_1_0.png)|Girl, kitten|![](./assets/image_2_3_0.png)|
### Example 3

|Input Image|
|-|
|![](./assets/image_3_input.png)|

|Prompt|Output Image|Prompt|Output Image|
|-|-|-|-|
|A clear blue sky and a turbulent sea|![](./assets/image_3_0_0.png)|Text "The Life I Long For"|![](./assets/image_3_2_0.png)|
|A seagull|![](./assets/image_3_1_0.png)|Text "Life"|![](./assets/image_3_3_0.png)|

## Inference Code

Install DiffSynth-Studio:

```
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```

Model inference:

```python
import requests
import torch
from PIL import Image
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig

# Load the fine-tuned layer-control transformer, the text encoder from Qwen/Qwen-Image,
# the VAE from Qwen/Qwen-Image-Layered, and the processor from Qwen/Qwen-Image-Edit.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Layered-Control", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image-Layered", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    processor_config=ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="processor/"),
)

# Text description of the layer to extract.
prompt = "A cartoon skeleton character wearing a purple hat and holding a gift box"

# Download the example poster and convert it to RGBA at the native 1024x1024 resolution.
response = requests.get("https://modelscope.oss-cn-beijing.aliyuncs.com/resource/images/trick_or_treat.png", stream=True)
response.raise_for_status()
input_image = Image.open(response.raw).convert("RGBA").resize((1024, 1024))
input_image.save("image_input.png")

# Extract the single layer that matches the prompt.
images = pipe(
    prompt,
    seed=0,
    num_inference_steps=30,
    cfg_scale=4,
    height=1024,
    width=1024,
    layer_input_image=input_image,
    layer_num=0,
)
images[0].save("image.png")
```
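The inference example resizes every input to 1024x1024, which stretches non-square images, even though the usage tips note that other resolutions are supported. The sketch below shows one possible way to choose a size instead: keep the input's aspect ratio and a pixel count close to the native 1024x1024. The helper name `fit_to_pixel_budget` is hypothetical, and rounding both sides to a multiple of 16 is an assumption (a common constraint for latent diffusion models), not a documented requirement of this model.

```python
from __future__ import annotations

import math


def fit_to_pixel_budget(width: int, height: int,
                        budget: int = 1024 * 1024,
                        multiple: int = 16) -> tuple[int, int]:
    """Return a (width, height) near `budget` pixels with the input's aspect ratio.

    Hypothetical helper: `multiple=16` is an assumed size constraint,
    not one documented for Qwen-Image-Layered-Control.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Uniform scale factor that maps the input area onto the pixel budget.
    scale = math.sqrt(budget / (width * height))
    # Round each side to the nearest multiple, never going below one multiple.
    new_width = max(multiple, round(width * scale / multiple) * multiple)
    new_height = max(multiple, round(height * scale / multiple) * multiple)
    return new_width, new_height
```

With the inference code above, you could compute `width, height = fit_to_pixel_budget(*image.size)` on the downloaded image, call `.resize((width, height))` instead of `.resize((1024, 1024))`, and pass the same `width` and `height` to `pipe(...)`. A square input keeps its native 1024x1024 size unchanged.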