---
frameworks:
- Pytorch
license: apache-2.0
tags: []
tasks:
- text-to-image-synthesis
base_model:
- Tongyi-MAI/Z-Image
base_model_relation: adapter
---

## Model Introduction

i2L (Image to LoRA) is a model architecture we designed around a rather crazy idea: the model takes an image as input and outputs a LoRA model trained on that image. It builds on our earlier Qwen-Image-i2L ([model](https://modelscope.cn/models/DiffSynth-Studio/Qwen-Image-i2L), [technical blog](https://modelscope.cn/learn/3343)), further refined and migrated to [Z-Image](https://modelscope.cn/models/Tongyi-MAI/Z-Image), with a particular focus on strengthening style preservation.

To ensure image quality, we recommend the following settings when generating with LoRA models produced by this model:

* Use a negative prompt:
  * Chinese: `"泛黄,发绿,模糊,低分辨率,低质量图像,扭曲的肢体,诡异的外观,丑陋,AI感,噪点,网格感,JPEG压缩条纹,异常的肢体,水印,乱码,意义不明的字符"`
  * English: `"Yellowed, green-tinted, blurry, low-resolution, low-quality image, distorted limbs, eerie appearance, ugly, AI-looking, noise, grid-like artifacts, JPEG compression artifacts, abnormal limbs, watermark, garbled text, meaningless characters"`
* `cfg_scale = 4`
* `sigma_shift = 8`
* Enable the LoRA on the positive-prompt side only and disable it on the negative-prompt side; this improves image quality.

Online demo: https://modelscope.cn/studios/DiffSynth-Studio/Z-Image-i2L

## Showcase

The Z-Image-i2L model can be used to quickly generate a style LoRA from just a few stylistically consistent input images. The results below were all generated with random seed 0.

### Style 1: Watercolor Painting

Input images:

|![](./assets/style/1/0.jpg)|![](./assets/style/1/1.jpg)|![](./assets/style/1/2.jpg)|![](./assets/style/1/3.jpg)|
|-|-|-|-|

Generated images:

|a cat|a dog|a girl|
|-|-|-|
|![](./assets/style/1/image_0.jpg)|![](./assets/style/1/image_1.jpg)|![](./assets/style/1/image_2.jpg)|

### Style 2: Realistic Detail

Input images:

|![](./assets/style/5/0.jpg)|![](./assets/style/5/1.jpg)|![](./assets/style/5/2.jpg)|![](./assets/style/5/3.jpg)|![](./assets/style/5/4.jpg)|
|-|-|-|-|-|

Generated images:

|a cat|a dog|a girl|
|-|-|-|
|![](./assets/style/5/image_0.jpg)|![](./assets/style/5/image_1.jpg)|![](./assets/style/5/image_2.jpg)|

### Style 3: Vibrant Color Blocks

Input images:

|![](./assets/style/2/0.jpg)|![](./assets/style/2/1.jpg)|![](./assets/style/2/2.jpg)|![](./assets/style/2/3.jpg)|![](./assets/style/2/4.jpg)|![](./assets/style/2/5.jpg)|
|-|-|-|-|-|-|

Generated images:

|a cat|a dog|a girl|
|-|-|-|
|![](./assets/style/2/image_0.jpg)|![](./assets/style/2/image_1.jpg)|![](./assets/style/2/image_2.jpg)|

### Style 4: Girl with Flowers
Input images:

|![](./assets/style/3/0.jpg)|![](./assets/style/3/1.jpg)|![](./assets/style/3/2.jpg)|![](./assets/style/3/3.jpg)|
|-|-|-|-|

Generated images:

|a cat|a dog|a girl|
|-|-|-|
|![](./assets/style/3/image_0.jpg)|![](./assets/style/3/image_1.jpg)|![](./assets/style/3/image_2.jpg)|

### Style 5: Minimalist Black and White

Input images:

|![](./assets/style/6/0.jpg)|![](./assets/style/6/1.jpg)|![](./assets/style/6/2.jpg)|![](./assets/style/6/3.jpg)|
|-|-|-|-|

Generated images:

|a cat|a dog|a girl|
|-|-|-|
|![](./assets/style/6/image_0.jpg)|![](./assets/style/6/image_1.jpg)|![](./assets/style/6/image_2.jpg)|

### Style 6: Fantasy World

Input images:

|![](./assets/style/4/0.jpg)|![](./assets/style/4/1.jpg)|![](./assets/style/4/2.jpg)|![](./assets/style/4/3.jpg)|![](./assets/style/4/4.jpg)|![](./assets/style/4/5.jpg)|
|-|-|-|-|-|-|

Generated images:

|a cat|a dog|a girl|
|-|-|-|
|![](./assets/style/4/image_0.jpg)|![](./assets/style/4/image_1.jpg)|![](./assets/style/4/image_2.jpg)|

## Inference Code

Install [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio):

```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```

Model inference:

```python
from diffsynth.pipelines.z_image import (
    ZImagePipeline, ModelConfig,
    ZImageUnit_Image2LoRAEncode, ZImageUnit_Image2LoRADecode
)
from modelscope import snapshot_download
from safetensors.torch import save_file
import torch
from PIL import Image

# Use `vram_config` to enable LoRA hot-loading
vram_config = {
    "offload_dtype": torch.bfloat16,
    "offload_device": "cuda",
    "onload_dtype": torch.bfloat16,
    "onload_device": "cuda",
    "preparing_dtype": torch.bfloat16,
    "preparing_device": "cuda",
    "computation_dtype": torch.bfloat16,
    "computation_device": "cuda",
}

# Load models
pipe = ZImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Tongyi-MAI/Z-Image", origin_file_pattern="transformer/*.safetensors", **vram_config),
        ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="text_encoder/*.safetensors"),
        ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/General-Image-Encoders", origin_file_pattern="SigLIP2-G384/model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/General-Image-Encoders", origin_file_pattern="DINOv3-7B/model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/Z-Image-i2L", origin_file_pattern="model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="tokenizer/"),
)

# Load images
snapshot_download(
    model_id="DiffSynth-Studio/Z-Image-i2L",
    allow_file_pattern="assets/style/*",
    local_dir="data/Z-Image-i2L_style_input"
)
images = [Image.open(f"data/Z-Image-i2L_style_input/assets/style/1/{i}.jpg") for i in range(4)]

# Image to LoRA
with torch.no_grad():
    embs = ZImageUnit_Image2LoRAEncode().process(pipe, image2lora_images=images)
    lora = ZImageUnit_Image2LoRADecode().process(pipe, **embs)["lora"]
save_file(lora, "lora.safetensors")

# Generate images
prompt = "a cat"
negative_prompt = "泛黄,发绿,模糊,低分辨率,低质量图像,扭曲的肢体,诡异的外观,丑陋,AI感,噪点,网格感,JPEG压缩条纹,异常的肢体,水印,乱码,意义不明的字符"
image = pipe(
    prompt=prompt, negative_prompt=negative_prompt,
    seed=0, cfg_scale=4, num_inference_steps=50,
    positive_only_lora=lora, sigma_shift=8
)
image.save("image.jpg")
```
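For readers unfamiliar with the LoRA format, the decoded `lora` object saved above is a plain state dict of low-rank matrix pairs that additively perturb the base model's weights. The sketch below illustrates this mechanism conceptually in NumPy; the shapes, rank, and scaling are illustrative assumptions and do not reflect the exact Z-Image layout or key naming.

```python
import numpy as np

def apply_lora(W, A, B, alpha=1.0):
    # W: (out, in) base weight; A: (r, in) "down" factor; B: (out, r) "up" factor.
    # The effective weight is W + alpha * (B @ A): a rank-r additive update.
    return W + alpha * (B @ A)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))   # base linear-layer weight
A = rng.standard_normal((4, 16))   # rank r = 4
B = np.zeros((8, 4))               # zero-initialized "up" factor: no change yet
assert np.allclose(apply_lora(W, A, B), W)  # zero B leaves the base weight intact

B = rng.standard_normal((8, 4))    # a trained (here: random) "up" factor
W_adapted = apply_lora(W, A, B, alpha=0.5)
print(W_adapted.shape)             # same shape as W; storage is only A and B
```

Because only the small factors `A` and `B` are stored per layer, a LoRA file like `lora.safetensors` stays compact relative to the full transformer weights.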