---
frameworks:
- Pytorch
license: apache-2.0
tags: []
tasks:
- text-to-image-synthesis
base_model:
- Tongyi-MAI/Z-Image
base_model_relation: adapter
---
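## Model Introduction
The i2L (Image to LoRA) model is an architecture built on a wild idea: it takes an image as input and outputs a LoRA model trained on that image. This model builds on our earlier Qwen-Image-i2L ([model](https://modelscope.cn/models/DiffSynth-Studio/Qwen-Image-i2L), [technical blog](https://modelscope.cn/learn/3343)), refining it and porting it to [Z-Image](https://modelscope.cn/models/Tongyi-MAI/Z-Image), with a focus on stronger style preservation.
To ensure image quality, we recommend using the LoRA models produced by this model with the following settings: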
* Use a negative prompt
    * Chinese: `"泛黄,发绿,模糊,低分辨率,低质量图像,扭曲的肢体,诡异的外观,丑陋,AI感,噪点,网格感,JPEG压缩条纹,异常的肢体,水印,乱码,意义不明的字符"`
    * English: `"Yellowed, green-tinted, blurry, low-resolution, low-quality image, distorted limbs, eerie appearance, ugly, AI-looking, noise, grid-like artifacts, JPEG compression artifacts, abnormal limbs, watermark, garbled text, meaningless characters"`
* `cfg_scale = 4`
* `sigma_shift = 8`
* Enable the LoRA only on the positive-prompt side and disable it on the negative-prompt side; this improves image quality
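
To make the last recommendation concrete, here is a minimal sketch of what a positive-only LoRA means under classifier-free guidance. It assumes the pipeline combines the two branches with the standard guidance formula. The names `cfg_combine`, `guided_prediction`, `base_model`, and `lora_model` are hypothetical illustrations, not DiffSynth-Studio API, and predictions are plain lists of floats to keep the example self-contained.

```python
def cfg_combine(pred_positive, pred_negative, cfg_scale=4.0):
    """Standard classifier-free guidance: negative + scale * (positive - negative)."""
    return [n + cfg_scale * (p - n) for p, n in zip(pred_positive, pred_negative)]


def guided_prediction(base_model, lora_model, latent, prompt, negative_prompt,
                      cfg_scale=4.0):
    """Hypothetical guidance step with the LoRA applied to one branch only.

    `base_model` and `lora_model` are stand-in callables (assumptions):
    each takes (latent, text) and returns a list of floats.
    """
    # Positive branch: base model with the generated LoRA enabled.
    pred_positive = lora_model(latent, prompt)
    # Negative branch: plain base model, LoRA disabled.
    pred_negative = base_model(latent, negative_prompt)
    return cfg_combine(pred_positive, pred_negative, cfg_scale)
```

Under this assumption, the style contributed by the LoRA appears only in the positive prediction, so the guidance term pushes the result toward the style rather than cancelling it out.
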
Online demo: https://modelscope.cn/studios/DiffSynth-Studio/Z-Image-i2L
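## Showcase
The Z-Image-i2L model can quickly generate a style LoRA from just a few images that share a consistent style. Below are our results; the random seed is 0 in every case.
### Style 1: Watercolor Painting
Input images: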
|![](./assets/style/1/0.jpg)|![](./assets/style/1/1.jpg)|![](./assets/style/1/2.jpg)|![](./assets/style/1/3.jpg)|
|-|-|-|-|
Generated images:
|a cat|a dog|a girl|
|-|-|-|
|![](./assets/style/1/image_0.jpg)|![](./assets/style/1/image_1.jpg)|![](./assets/style/1/image_2.jpg)|
### Style 2: Realistic Detail
Input images:
|![](./assets/style/5/0.jpg)|![](./assets/style/5/1.jpg)|![](./assets/style/5/2.jpg)|![](./assets/style/5/3.jpg)|![](./assets/style/5/4.jpg)|
|-|-|-|-|-|
Generated images:
|a cat|a dog|a girl|
|-|-|-|
|![](./assets/style/5/image_0.jpg)|![](./assets/style/5/image_1.jpg)|![](./assets/style/5/image_2.jpg)|
### Style 3: Colorful Blocks
Input images:
|![](./assets/style/2/0.jpg)|![](./assets/style/2/1.jpg)|![](./assets/style/2/2.jpg)|![](./assets/style/2/3.jpg)|![](./assets/style/2/4.jpg)|![](./assets/style/2/5.jpg)|
|-|-|-|-|-|-|
Generated images:
|a cat|a dog|a girl|
|-|-|-|
|![](./assets/style/2/image_0.jpg)|![](./assets/style/2/image_1.jpg)|![](./assets/style/2/image_2.jpg)|
### Style 4: Girl with Flowers
Input images:
|![](./assets/style/3/0.jpg)|![](./assets/style/3/1.jpg)|![](./assets/style/3/2.jpg)|![](./assets/style/3/3.jpg)|
|-|-|-|-|
Generated images:
|a cat|a dog|a girl|
|-|-|-|
|![](./assets/style/3/image_0.jpg)|![](./assets/style/3/image_1.jpg)|![](./assets/style/3/image_2.jpg)|
### Style 5: Black-and-White Minimalism
Input images:
|![](./assets/style/6/0.jpg)|![](./assets/style/6/1.jpg)|![](./assets/style/6/2.jpg)|![](./assets/style/6/3.jpg)|
|-|-|-|-|
Generated images:
|a cat|a dog|a girl|
|-|-|-|
|![](./assets/style/6/image_0.jpg)|![](./assets/style/6/image_1.jpg)|![](./assets/style/6/image_2.jpg)|
### Style 6: Fantasy World
Input images:
|![](./assets/style/4/0.jpg)|![](./assets/style/4/1.jpg)|![](./assets/style/4/2.jpg)|![](./assets/style/4/3.jpg)|![](./assets/style/4/4.jpg)|![](./assets/style/4/5.jpg)|
|-|-|-|-|-|-|
Generated images:
|a cat|a dog|a girl|
|-|-|-|
|![](./assets/style/4/image_0.jpg)|![](./assets/style/4/image_1.jpg)|![](./assets/style/4/image_2.jpg)|
## Inference Code
Install [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio):
```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```
Model inference:
```python
from diffsynth.pipelines.z_image import (
    ZImagePipeline, ModelConfig,
    ZImageUnit_Image2LoRAEncode, ZImageUnit_Image2LoRADecode
)
from modelscope import snapshot_download
from safetensors.torch import save_file
import torch
from PIL import Image

# Use `vram_config` to enable LoRA hot-loading
vram_config = {
    "offload_dtype": torch.bfloat16,
    "offload_device": "cuda",
    "onload_dtype": torch.bfloat16,
    "onload_device": "cuda",
    "preparing_dtype": torch.bfloat16,
    "preparing_device": "cuda",
    "computation_dtype": torch.bfloat16,
    "computation_device": "cuda",
}

# Load models
pipe = ZImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Tongyi-MAI/Z-Image", origin_file_pattern="transformer/*.safetensors", **vram_config),
        ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="text_encoder/*.safetensors"),
        ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/General-Image-Encoders", origin_file_pattern="SigLIP2-G384/model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/General-Image-Encoders", origin_file_pattern="DINOv3-7B/model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/Z-Image-i2L", origin_file_pattern="model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="tokenizer/"),
)

# Load images
snapshot_download(
    model_id="DiffSynth-Studio/Z-Image-i2L",
    allow_file_pattern="assets/style/*",
    local_dir="data/Z-Image-i2L_style_input"
)
images = [Image.open(f"data/Z-Image-i2L_style_input/assets/style/1/{i}.jpg") for i in range(4)]

# Image to LoRA
with torch.no_grad():
    embs = ZImageUnit_Image2LoRAEncode().process(pipe, image2lora_images=images)
    lora = ZImageUnit_Image2LoRADecode().process(pipe, **embs)["lora"]
save_file(lora, "lora.safetensors")

# Generate images
prompt = "a cat"
negative_prompt = "泛黄,发绿,模糊,低分辨率,低质量图像,扭曲的肢体,诡异的外观,丑陋,AI感,噪点,网格感,JPEG压缩条纹,异常的肢体,水印,乱码,意义不明的字符"
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    seed=0, cfg_scale=4, num_inference_steps=50,
    positive_only_lora=lora,
    sigma_shift=8
)
image.save("image.jpg")
```
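
The script above hard-codes four input images for style 1 and a single prompt, while the showcase styles use four to six inputs and three prompts each. The sketch below shows one way to generalize it. Both helper names (`collect_style_inputs`, `generate_for_prompts`) are hypothetical and not part of DiffSynth-Studio. The first assumes that input images are named by index (`0.jpg`, `1.jpg`, ...) and that files such as `image_0.jpg` are showcase outputs, as in the asset paths above. The second forwards the same keyword arguments as the call above to whatever `pipe` callable it is given.

```python
from pathlib import Path


def collect_style_inputs(style_dir):
    """Return input image paths (0.jpg, 1.jpg, ...) sorted by numeric index.

    Files whose stem is not purely numeric (for example image_0.jpg,
    a showcase output) are skipped.
    """
    paths = [p for p in Path(style_dir).glob("*.jpg") if p.stem.isdecimal()]
    return sorted(paths, key=lambda p: int(p.stem))


def generate_for_prompts(pipe, lora, prompts, negative_prompt, seed=0):
    """Call `pipe` once per prompt with the recommended settings.

    `pipe` is any callable accepting the keyword arguments used in the
    inference script; the results are returned keyed by prompt.
    """
    results = {}
    for prompt in prompts:
        results[prompt] = pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            seed=seed, cfg_scale=4, num_inference_steps=50,
            positive_only_lora=lora,
            sigma_shift=8,
        )
    return results
```

For example, `collect_style_inputs("data/Z-Image-i2L_style_input/assets/style/2")` should yield the paths to open with `Image.open` before the Image to LoRA step, and `generate_for_prompts(pipe, lora, ["a cat", "a dog", "a girl"], negative_prompt)` covers the three showcase prompts. The LoRA must be regenerated whenever the input images change.
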