---

frameworks:
- Pytorch
license: apache-2.0
tags: []
tasks:
- text-to-image-synthesis
base_model:
  - Tongyi-MAI/Z-Image
base_model_relation: adapter
---
## Model Introduction

The i2L (Image to LoRA) model is a model architecture we designed around a rather wild idea: it takes an image as input and outputs a LoRA model trained on that image. This model builds on our earlier Qwen-Image-i2L ([model](https://modelscope.cn/models/DiffSynth-Studio/Qwen-Image-i2L), [technical blog](https://modelscope.cn/learn/3343)), further refined and ported to [Z-Image](https://modelscope.cn/models/Tongyi-MAI/Z-Image), with a particular focus on strengthening style preservation.

To ensure output image quality, we recommend the following parameters when using the LoRA models produced by this model:

* Use a negative prompt
    * Chinese: `"泛黄,发绿,模糊,低分辨率,低质量图像,扭曲的肢体,诡异的外观,丑陋,AI感,噪点,网格感,JPEG压缩条纹,异常的肢体,水印,乱码,意义不明的字符"`
    * English: `"Yellowed, green-tinted, blurry, low-resolution, low-quality image, distorted limbs, eerie appearance, ugly, AI-looking, noise, grid-like artifacts, JPEG compression artifacts, abnormal limbs, watermark, garbled text, meaningless characters"`
* `cfg_scale = 4`
* `sigma_shift = 8`
* Enable the LoRA only on the positive-prompt side and disable it on the negative-prompt side; this improves image quality

Online demo: https://modelscope.cn/studios/DiffSynth-Studio/Z-Image-i2L

## Showcase

The Z-Image-i2L model can quickly produce a style LoRA from just a few stylistically consistent input images. All results below were generated with random seed 0.

### Style 1: Watercolor Painting

Input images:

|![](./assets/style/1/0.jpg)|![](./assets/style/1/1.jpg)|![](./assets/style/1/2.jpg)|![](./assets/style/1/3.jpg)|
|-|-|-|-|

Generated images:

|a cat|a dog|a girl|
|-|-|-|
|![](./assets/style/1/image_0.jpg)|![](./assets/style/1/image_1.jpg)|![](./assets/style/1/image_2.jpg)|

### Style 2: Realistic Detail

Input images:

|![](./assets/style/5/0.jpg)|![](./assets/style/5/1.jpg)|![](./assets/style/5/2.jpg)|![](./assets/style/5/3.jpg)|![](./assets/style/5/4.jpg)|
|-|-|-|-|-|

Generated images:

|a cat|a dog|a girl|
|-|-|-|
|![](./assets/style/5/image_0.jpg)|![](./assets/style/5/image_1.jpg)|![](./assets/style/5/image_2.jpg)|

### Style 3: Vibrant Color Blocks

Input images:

|![](./assets/style/2/0.jpg)|![](./assets/style/2/1.jpg)|![](./assets/style/2/2.jpg)|![](./assets/style/2/3.jpg)|![](./assets/style/2/4.jpg)|![](./assets/style/2/5.jpg)|
|-|-|-|-|-|-|

Generated images:

|a cat|a dog|a girl|
|-|-|-|
|![](./assets/style/2/image_0.jpg)|![](./assets/style/2/image_1.jpg)|![](./assets/style/2/image_2.jpg)|

### Style 4: Girl with Flowers

Input images:

|![](./assets/style/3/0.jpg)|![](./assets/style/3/1.jpg)|![](./assets/style/3/2.jpg)|![](./assets/style/3/3.jpg)|
|-|-|-|-|

Generated images:

|a cat|a dog|a girl|
|-|-|-|
|![](./assets/style/3/image_0.jpg)|![](./assets/style/3/image_1.jpg)|![](./assets/style/3/image_2.jpg)|

### Style 5: Minimalist Black and White

Input images:

|![](./assets/style/6/0.jpg)|![](./assets/style/6/1.jpg)|![](./assets/style/6/2.jpg)|![](./assets/style/6/3.jpg)|
|-|-|-|-|

Generated images:

|a cat|a dog|a girl|
|-|-|-|
|![](./assets/style/6/image_0.jpg)|![](./assets/style/6/image_1.jpg)|![](./assets/style/6/image_2.jpg)|

### Style 6: Fantasy World

Input images:

|![](./assets/style/4/0.jpg)|![](./assets/style/4/1.jpg)|![](./assets/style/4/2.jpg)|![](./assets/style/4/3.jpg)|![](./assets/style/4/4.jpg)|![](./assets/style/4/5.jpg)|
|-|-|-|-|-|-|

Generated images:

|a cat|a dog|a girl|
|-|-|-|
|![](./assets/style/4/image_0.jpg)|![](./assets/style/4/image_1.jpg)|![](./assets/style/4/image_2.jpg)|

## Inference Code

Install [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio):

```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```

Model inference:

```python
from diffsynth.pipelines.z_image import (
    ZImagePipeline, ModelConfig,
    ZImageUnit_Image2LoRAEncode, ZImageUnit_Image2LoRADecode
)
from modelscope import snapshot_download
from safetensors.torch import save_file
import torch
from PIL import Image

# Use `vram_config` to enable LoRA hot-loading
vram_config = {
    "offload_dtype": torch.bfloat16,
    "offload_device": "cuda",
    "onload_dtype": torch.bfloat16,
    "onload_device": "cuda",
    "preparing_dtype": torch.bfloat16,
    "preparing_device": "cuda",
    "computation_dtype": torch.bfloat16,
    "computation_device": "cuda",
}

# Load models
pipe = ZImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Tongyi-MAI/Z-Image", origin_file_pattern="transformer/*.safetensors", **vram_config),
        ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="text_encoder/*.safetensors"),
        ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/General-Image-Encoders", origin_file_pattern="SigLIP2-G384/model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/General-Image-Encoders", origin_file_pattern="DINOv3-7B/model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/Z-Image-i2L", origin_file_pattern="model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="tokenizer/"),
)

# Load images
snapshot_download(
    model_id="DiffSynth-Studio/Z-Image-i2L",
    allow_file_pattern="assets/style/*",
    local_dir="data/Z-Image-i2L_style_input"
)
images = [Image.open(f"data/Z-Image-i2L_style_input/assets/style/1/{i}.jpg") for i in range(4)]

# Image to LoRA
with torch.no_grad():
    embs = ZImageUnit_Image2LoRAEncode().process(pipe, image2lora_images=images)
    lora = ZImageUnit_Image2LoRADecode().process(pipe, **embs)["lora"]
save_file(lora, "lora.safetensors")

# Generate images
prompt = "a cat"
negative_prompt = "泛黄,发绿,模糊,低分辨率,低质量图像,扭曲的肢体,诡异的外观,丑陋,AI感,噪点,网格感,JPEG压缩条纹,异常的肢体,水印,乱码,意义不明的字符"
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    seed=0, cfg_scale=4, num_inference_steps=50,
    positive_only_lora=lora,
    sigma_shift=8
)
image.save("image.jpg")
```