Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
Paper: arXiv:2511.22699
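Z-Image is a high-performance image generation model with 6B parameters. With OpenVINO™ optimizations, you can run it smoothly on a personal computer (for example, an Intel Core™ processor or an Arc™ discrete GPU) without relying on a cloud GPU.
Z-Image uses a Scalable Single-Stream DiT (S3-DiT) architecture. In this architecture, text tokens, visual semantic tokens, and VAE tokens are concatenated at the sequence level to form a single unified input stream, which greatly improves parameter efficiency compared with traditional dual-stream architectures.
Run the following commands to install the required dependencies and their pinned versions: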
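To make the sequence-level concatenation concrete, here is a minimal sketch in plain Python. It is not the Z-Image implementation: the function name `build_single_stream`, the toy token counts, and the embedding width are all hypothetical, and it assumes each modality has already been projected to the same width.

```python
from typing import Dict, List, Tuple

Token = List[float]  # one embedding vector (hypothetical representation)


def build_single_stream(
    text_tokens: List[Token],
    semantic_tokens: List[Token],
    vae_tokens: List[Token],
) -> Tuple[List[Token], Dict[str, Tuple[int, int]]]:
    """Concatenate three token groups along the sequence axis.

    Returns the unified sequence plus the [start, end) span of each group,
    so the VAE tokens can be sliced back out after the transformer blocks.
    """
    stream: List[Token] = []
    spans: Dict[str, Tuple[int, int]] = {}
    for name, tokens in (
        ("text", text_tokens),
        ("semantic", semantic_tokens),
        ("vae", vae_tokens),
    ):
        start = len(stream)
        stream.extend(tokens)
        spans[name] = (start, len(stream))
    return stream, spans


# Toy example: 3 text tokens, 2 semantic tokens, 4 VAE tokens, width 8.
dim = 8
text = [[0.1] * dim for _ in range(3)]
semantic = [[0.2] * dim for _ in range(2)]
vae = [[0.3] * dim for _ in range(4)]

stream, spans = build_single_stream(text, semantic, vae)
print(len(stream), spans)
# prints: 9 {'text': (0, 3), 'semantic': (3, 5), 'vae': (5, 9)}
```

Because every token sits in one sequence, a single stack of transformer weights processes all three modalities, whereas a dual-stream design keeps separate weights per modality.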
python.exe -m pip install --upgrade pip
pip3 uninstall -y optimum transformers optimum-intel diffusers
pip3 install git+https://github.com/huggingface/diffusers
pip3 install git+https://github.com/openvino-dev-samples/optimum-intel.git@zimage
pip3 install nncf
pip3 install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cpu
pip3 install openvino==2025.4
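After installation, it can help to confirm that the pinned versions are the ones actually active in the environment. The following is a small sketch using only the standard library; the helper name `check_environment` and the `EXPECTED` table are hypothetical, and the packages installed from git are checked for presence only, since the commands above do not pin a version for them.

```python
from importlib import metadata
from typing import Dict, Optional

# Pins copied from the install commands above. None means "any version".
EXPECTED: Dict[str, Optional[str]] = {
    "torch": "2.8.0",
    "torchvision": "0.23.0",
    "torchaudio": "2.8.0",
    "openvino": "2025.4",
    "nncf": None,
    "diffusers": None,
    "optimum-intel": None,
}


def check_environment(expected: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Return a status string for each package: ok, mismatch, or missing."""
    report: Dict[str, str] = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "missing"
            continue
        # Prefix match so local tags such as "2.8.0+cpu" still count.
        if wanted is None or installed.startswith(wanted):
            report[package] = f"ok ({installed})"
        else:
            report[package] = f"mismatch (found {installed}, expected {wanted})"
    return report


for name, status in check_environment(EXPECTED).items():
    print(f"{name}: {status}")
```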
Export the model to OpenVINO format with INT4 weight compression:
optimum-cli export openvino --model Tongyi-MAI/Z-Image-Turbo --task text-to-image Z-Image-Turbo-ov --weight-format int4 --group-size 64 --ratio 1.0
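The `--group-size 64` option means that weights are quantized in groups of 64 values that share one scale, and `--ratio 1.0` asks for all eligible layers to use 4-bit weights. The sketch below illustrates the group-wise idea with a simplified symmetric scheme in plain Python. It is only an illustration under that assumption: the function names are hypothetical, and the compression actually performed by NNCF may use asymmetric quantization with zero points and differ in other details.

```python
from typing import List, Tuple


def quantize_groupwise_int4(
    weights: List[float], group_size: int = 64
) -> Tuple[List[int], List[float]]:
    """Symmetric 4-bit quantization with one scale per group of weights."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to code 7.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q = round(w / scale)
            codes.append(max(-8, min(7, q)))  # clamp to the int4 range
    return codes, scales


def dequantize_groupwise(
    codes: List[int], scales: List[float], group_size: int = 64
) -> List[float]:
    """Reconstruct approximate weights from codes and per-group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(codes)]


# Toy weights: 128 values, so two groups of 64.
weights = [((i % 17) - 8) / 10 for i in range(128)]
codes, scales = quantize_groupwise_int4(weights, group_size=64)
restored = dequantize_groupwise(codes, scales, group_size=64)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(len(scales), max_error <= max(scales) / 2 + 1e-9)
```

Smaller groups track local weight ranges more closely at the cost of storing more scales, which is the trade-off the group size controls.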
Run inference with the exported model:
import random

import openvino as ov
import torch
from optimum.intel import OVZImagePipeline

core = ov.Core()

# Detect the supported devices
available_devices = core.available_devices
print(f"Available devices: {available_devices}")

# 1. Initialize the model
# Match "GPU" as well as indexed names such as "GPU.0" on multi-GPU systems
device = "GPU" if any(d.startswith("GPU") for d in available_devices) else "CPU"
model_id = "hsuwill000/Z-Image-Turbo-ov"
pipe = OVZImagePipeline.from_pretrained(model_id, device=device)

# 2. Set the prompt and random seed
prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda, blurred colorful distant lights."
seed = random.randint(0, 2**32 - 1)
print(f"Seed: {seed}")  # log the seed so the result can be reproduced

# 3. Run inference
image = pipe(
    prompt=prompt,
    height=512,
    width=512,
    num_inference_steps=7,
    guidance_scale=0.0,
    generator=torch.Generator("cpu").manual_seed(seed),
).images[0]

# 4. Save the result
image.save("z_image_ov_output.png")
Base model: Tongyi-MAI/Z-Image-Turbo