---
frameworks:
- Pytorch
license: apache-2.0
tags: []
tasks:
- text-to-image-synthesis
base_model:
- Tongyi-MAI/Z-Image
base_model_relation: adapter
---

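## Model Introduction

The i2L (Image to LoRA) model is an architecture built on a wild idea: it takes an image as input and outputs a LoRA model trained on that image. This model builds on our earlier Qwen-Image-i2L ([model](https://modelscope.cn/models/DiffSynth-Studio/Qwen-Image-i2L), [technical blog](https://modelscope.cn/learn/3343)), which we refined further and ported to [Z-Image](https://modelscope.cn/models/Tongyi-MAI/Z-Image), with a particular focus on stronger style preservation.
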
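As general background (the standard LoRA formulation, not a description of this model's internals), a LoRA adds a low-rank update to a frozen weight matrix $W$:

$$
W' = W + \frac{\alpha}{r} B A, \qquad B \in \mathbb{R}^{d_{\text{out}} \times r},\quad A \in \mathbb{R}^{r \times d_{\text{in}}},\quad r \ll \min(d_{\text{out}}, d_{\text{in}})
$$

Under this reading, i2L produces such low-rank factors directly from the input images instead of obtaining them through gradient-based training. The rank $r$ and scaling $\alpha$ used by Z-Image-i2L are not stated in this document.
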
To ensure image quality, we recommend the following settings when using LoRA models produced by this model:

* Use a negative prompt
    * Chinese: `"泛黄,发绿,模糊,低分辨率,低质量图像,扭曲的肢体,诡异的外观,丑陋,AI感,噪点,网格感,JPEG压缩条纹,异常的肢体,水印,乱码,意义不明的字符"`
    * English: `"Yellowed, green-tinted, blurry, low-resolution, low-quality image, distorted limbs, eerie appearance, ugly, AI-looking, noise, grid-like artifacts, JPEG compression artifacts, abnormal limbs, watermark, garbled text, meaningless characters"`
* `cfg_scale = 4`
* `sigma_shift = 8`
* Enable the LoRA only on the positive-prompt side and disable it on the negative-prompt side; this improves image quality

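The interaction between `cfg_scale`, `sigma_shift`, and a positive-only LoRA can be illustrated with a toy sketch. It is not the DiffSynth-Studio implementation: `toy_model` and `lora_delta` are hypothetical stand-ins for the diffusion transformer and the LoRA update, and the shift formula is the one commonly used by flow-matching schedulers, assumed here rather than confirmed for Z-Image.

```python
# Toy illustration only: plain floats and lists stand in for tensors.


def shift_sigma(sigma: float, shift: float = 8.0) -> float:
    """Common flow-matching noise-level shift (an assumption for Z-Image).

    Maps sigma in [0, 1] to [0, 1]; shift > 1 pushes values toward 1,
    so more sampling steps are spent at high noise levels.
    """
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def toy_model(x, cond, lora_delta=0.0):
    """Hypothetical 'model': a base weight plus an optional LoRA offset."""
    base_weight = 1.0
    return [(base_weight + lora_delta) * xi + cond for xi in x]


def cfg_step(x, pos_cond, neg_cond, lora_delta, cfg_scale=4.0):
    """Classifier-free guidance with the LoRA active on the positive branch only."""
    pos = toy_model(x, pos_cond, lora_delta=lora_delta)  # LoRA enabled
    neg = toy_model(x, neg_cond, lora_delta=0.0)         # LoRA disabled
    # Standard CFG combination: start from the negative prediction and move
    # toward the positive one, scaled by cfg_scale.
    return [n + cfg_scale * (p - n) for p, n in zip(pos, neg)]
```

Because the negative branch runs without the LoRA, the guidance direction `pos - neg` includes the LoRA's effect, so `cfg_scale` amplifies the style contribution along with the prompt.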
Online demo: https://modelscope.cn/studios/DiffSynth-Studio/Z-Image-i2L

## Showcase

The Z-Image-i2L model can quickly generate a style LoRA from just a few images in a consistent style. The results below were all generated with random seed 0.

### Style 1: Watercolor Painting

Input images: [4 images]

Generated images:

|a cat|a dog|a girl|
|-|-|-|
|[image]|[image]|[image]|

### Style 2: Realistic Detail

Input images: [5 images]

Generated images:

|a cat|a dog|a girl|
|-|-|-|
|[image]|[image]|[image]|

### Style 3: Colorful Blocks

Input images: [6 images]

Generated images:

|a cat|a dog|a girl|
|-|-|-|
|[image]|[image]|[image]|

### Style 4: Flower Girl

Input images: [4 images]

Generated images:

|a cat|a dog|a girl|
|-|-|-|
|[image]|[image]|[image]|

### Style 5: Black-and-White Minimalism

Input images: [4 images]

Generated images:

|a cat|a dog|a girl|
|-|-|-|
|[image]|[image]|[image]|

### Style 6: Fantasy World

Input images: [6 images]

Generated images:

|a cat|a dog|a girl|
|-|-|-|
|[image]|[image]|[image]|

## Inference Code

Install [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio):

```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```

Model inference:

```python
from diffsynth.pipelines.z_image import (
    ZImagePipeline, ModelConfig,
    ZImageUnit_Image2LoRAEncode, ZImageUnit_Image2LoRADecode
)
from modelscope import snapshot_download
from safetensors.torch import save_file
import torch
from PIL import Image

# Use `vram_config` to enable LoRA hot-loading
vram_config = {
    "offload_dtype": torch.bfloat16,
    "offload_device": "cuda",
    "onload_dtype": torch.bfloat16,
    "onload_device": "cuda",
    "preparing_dtype": torch.bfloat16,
    "preparing_device": "cuda",
    "computation_dtype": torch.bfloat16,
    "computation_device": "cuda",
}

# Load models
pipe = ZImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Tongyi-MAI/Z-Image", origin_file_pattern="transformer/*.safetensors", **vram_config),
        ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="text_encoder/*.safetensors"),
        ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/General-Image-Encoders", origin_file_pattern="SigLIP2-G384/model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/General-Image-Encoders", origin_file_pattern="DINOv3-7B/model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/Z-Image-i2L", origin_file_pattern="model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="tokenizer/"),
)

# Load images
snapshot_download(
    model_id="DiffSynth-Studio/Z-Image-i2L",
    allow_file_pattern="assets/style/*",
    local_dir="data/Z-Image-i2L_style_input"
)
images = [Image.open(f"data/Z-Image-i2L_style_input/assets/style/1/{i}.jpg") for i in range(4)]

# Image to LoRA
with torch.no_grad():
    embs = ZImageUnit_Image2LoRAEncode().process(pipe, image2lora_images=images)
    lora = ZImageUnit_Image2LoRADecode().process(pipe, **embs)["lora"]
save_file(lora, "lora.safetensors")

# Generate images
prompt = "a cat"
negative_prompt = "泛黄,发绿,模糊,低分辨率,低质量图像,扭曲的肢体,诡异的外观,丑陋,AI感,噪点,网格感,JPEG压缩条纹,异常的肢体,水印,乱码,意义不明的字符"
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    seed=0, cfg_scale=4, num_inference_steps=50,
    positive_only_lora=lora,
    sigma_shift=8
)
image.save("image.jpg")
```
