---
license: apache-2.0
language:
- en
library_name: diffusers
pipeline_tag: image-to-image
tags:
- Image-to-Image
- ControlNet
- Diffusers
- QwenImageControlNetInpaintPipeline
- Qwen-Image
base_model: Qwen/Qwen-Image
---

# Qwen-Image-ControlNet-Inpainting

This repository provides a ControlNet that supports mask-based image inpainting and outpainting for [Qwen-Image](https://github.com/QwenLM/Qwen-Image).

# Model Card

- This ControlNet consists of 6 double blocks copied from the pretrained transformer layers.
- We train the model from scratch for 65K steps on a dataset of 10M high-quality general and human images.
- We train at 1328x1328 resolution in BFloat16, with a batch size of 128, a learning rate of 4e-5, and a text drop ratio of 0.10.
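
To sanity-check the architecture described above, the ControlNet can be loaded on its own and its configuration inspected. This is a minimal sketch using the `QwenImageControlNetModel` class from the inference example below; the exact config keys are whatever diffusers reports, not fields documented in this card.

```python
import torch
from diffusers import QwenImageControlNetModel

# Load only the ControlNet weights, without the base model.
controlnet = QwenImageControlNetModel.from_pretrained(
    "InstantX/Qwen-Image-ControlNet-Inpainting", torch_dtype=torch.bfloat16
)

# Reported configuration (block counts, hidden sizes, etc.).
print(controlnet.config)

# Rough parameter count of the ControlNet alone.
num_params = sum(p.numel() for p in controlnet.parameters())
print(f"ControlNet parameters: {num_params / 1e9:.2f}B")
```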

# Showcases

<table style="width:100%; table-layout:fixed;">
  <tr>
    <th>Image</th>
    <th>Mask</th>
    <th>Result</th>
  </tr>
  <tr>
    <td><img src="./assets/images/image1.png" alt="example1"></td>
    <td><img src="./assets/masks/mask1.png" alt="example1"></td>
    <td><img src="./assets/results/output1.png" alt="example1"></td>
  </tr>
  <tr>
    <td><img src="./assets/images/image2.png" alt="example2"></td>
    <td><img src="./assets/masks/mask2.png" alt="example2"></td>
    <td><img src="./assets/results/output2.png" alt="example2"></td>
  </tr>
  <tr>
    <td><img src="./assets/images/image3.png" alt="example3"></td>
    <td><img src="./assets/masks/mask3.png" alt="example3"></td>
    <td><img src="./assets/results/output3.png" alt="example3"></td>
  </tr>
</table>

# Inference

```python
import torch
from diffusers.utils import load_image

# pip install git+https://github.com/huggingface/diffusers
from diffusers import QwenImageControlNetModel, QwenImageControlNetInpaintPipeline

base_model = "Qwen/Qwen-Image"
controlnet_model = "InstantX/Qwen-Image-ControlNet-Inpainting"

controlnet = QwenImageControlNetModel.from_pretrained(controlnet_model, torch_dtype=torch.bfloat16)

pipe = QwenImageControlNetInpaintPipeline.from_pretrained(
    base_model, controlnet=controlnet, torch_dtype=torch.bfloat16
)
pipe.to("cuda")

control_image = load_image("https://huggingface.co/InstantX/Qwen-Image-ControlNet-Inpainting/resolve/main/assets/images/image1.png")
mask_image = load_image("https://huggingface.co/InstantX/Qwen-Image-ControlNet-Inpainting/resolve/main/assets/masks/mask1.png")
prompt = "一辆绿色的出租车行驶在路上"  # "A green taxi driving on the road"

image = pipe(
    prompt=prompt,
    negative_prompt=" ",
    control_image=control_image,
    control_mask=mask_image,
    controlnet_conditioning_scale=1.0,
    width=control_image.size[0],
    height=control_image.size[1],
    num_inference_steps=30,
    true_cfg_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
image.save("qwenimage_cn_inpaint_result.png")
```
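
The model also supports outpainting: extend the canvas and mark the new border as the region to fill. Below is a minimal sketch of preparing such inputs with PIL; the padding width, fill color, and the white-means-generate mask convention are assumptions for illustration, not values specified in this card.

```python
from PIL import Image, ImageOps

pad = 128  # hypothetical border width to outpaint on each side

# Extend the canvas on all sides; the fill color is a placeholder,
# since the padded region will be regenerated by the model.
control_image = Image.open("input.png").convert("RGB")
padded_image = ImageOps.expand(control_image, border=pad, fill=(127, 127, 127))

# Build a matching mask: white = region to generate, black = region to keep.
mask_image = Image.new("L", control_image.size, 0)
mask_image = ImageOps.expand(mask_image, border=pad, fill=255)

# padded_image and mask_image can then be passed as control_image and
# control_mask in the pipeline call above, with width/height taken from
# padded_image.size.
```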

# Limitations

This model is somewhat sensitive to user prompts. Detailed prompts that describe the entire image (both the inpainted area and the background) are highly recommended. Please use descriptive prompts rather than instructive prompts, as shown below.
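
For example (illustrative prompts, not taken from this card):

```python
# Descriptive prompt: states what the final image should contain,
# covering both the masked region and its surroundings.
prompt = "A green taxi driving on a city road, with buildings in the background"

# Instructive prompt: an editing command; this model handles these less reliably.
# prompt = "Replace the car with a green taxi"
```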

# Acknowledgements

This model was developed by the InstantX Team. All rights reserved.