zimageturbo
#3
by
juniperi - opened
- .gitattributes +0 -1
- README.md +20 -38
- assets/leaderboard.png +0 -3
- assets/showcase_realistic.png +2 -2
.gitattributes
CHANGED
@@ -43,4 +43,3 @@ assets/showcase_editing.png filter=lfs diff=lfs merge=lfs -text
 assets/showcase_realistic.png filter=lfs diff=lfs merge=lfs -text
 assets/showcase_rendering.png filter=lfs diff=lfs merge=lfs -text
 assets/Z-Image-Gallery.pdf filter=lfs diff=lfs merge=lfs -text
-assets/leaderboard.png filter=lfs diff=lfs merge=lfs -text
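The `.gitattributes` hunk above drops the LFS rule for `assets/leaderboard.png`. As a rough way to check the effect of such a change, the sketch below lists which paths a `.gitattributes` text routes through Git LFS. The helper names `lfs_patterns` and `is_lfs_tracked` are hypothetical, and `fnmatchcase` is only an approximation of Git's own pattern matching; it is sufficient here because the rules are literal paths.

```python
from fnmatch import fnmatchcase


def lfs_patterns(gitattributes_text):
    """Return the path patterns whose attributes include filter=lfs.

    Hypothetical helper: handles only simple whitespace-separated lines.
    """
    patterns = []
    for line in gitattributes_text.splitlines():
        parts = line.split()
        # Skip blank lines, comments, and rules without the LFS filter.
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        if "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns


def is_lfs_tracked(path, patterns):
    """Approximate check: fnmatchcase is a stand-in for Git's matcher."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


# The three rules that remain after the hunk, copied from the diff.
AFTER = (
    "assets/showcase_realistic.png filter=lfs diff=lfs merge=lfs -text\n"
    "assets/showcase_rendering.png filter=lfs diff=lfs merge=lfs -text\n"
    "assets/Z-Image-Gallery.pdf filter=lfs diff=lfs merge=lfs -text\n"
)
# The same rules plus the line the hunk removes.
BEFORE = AFTER + "assets/leaderboard.png filter=lfs diff=lfs merge=lfs -text\n"

if __name__ == "__main__":
    for label, text in (("before", BEFORE), ("after", AFTER)):
        tracked = is_lfs_tracked("assets/leaderboard.png", lfs_patterns(text))
        print(label, "leaderboard.png via LFS:", tracked)
```

In a real checkout, `git check-attr filter -- <path>` reports the same information using Git's actual matching rules.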
README.md
CHANGED
|
@@ -11,16 +11,15 @@ library_name: diffusers
|
|
| 11 |
|
| 12 |
<div align="center">
|
| 13 |
|
| 14 |
-
[](https://tongyi-mai.github.io/Z-Image-
|
| 15 |
[](https://github.com/Tongyi-MAI/Z-Image) 
|
| 16 |
[](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) 
|
| 17 |
[](https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo) 
|
| 18 |
-
[](https://huggingface.co/spaces/akhaliq/Z-Image-Turbo) 
|
| 19 |
[](https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo) 
|
| 20 |
-
[](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=469191&modelType=Checkpoint&sdVersion=Z_IMAGE_TURBO&modelUrl=modelscope%
|
| 21 |
[](assets/Z-Image-Gallery.pdf) 
|
| 22 |
[](https://modelscope.cn/studios/Tongyi-MAI/Z-Image-Gallery/summary) 
|
| 23 |
-
<a href="
|
| 24 |
|
| 25 |
|
| 26 |
Welcome to the official repository for the Z-Image(造相)project!
|
|
@@ -31,24 +30,21 @@ Welcome to the official repository for the Z-Image(造相)project!
|
|
| 31 |
|
| 32 |
## ✨ Z-Image
|
| 33 |
|
| 34 |
-
Z-Image is a powerful and highly efficient image generation model
|
| 35 |
|
| 36 |
- 🚀 **Z-Image-Turbo** – A distilled version of Z-Image that matches or exceeds leading competitors with only **8 NFEs** (Number of Function Evaluations). It offers **⚡️sub-second inference latency⚡️** on enterprise-grade H800 GPUs and fits comfortably within **16G VRAM consumer devices**. It excels in photorealistic image generation, bilingual text rendering (English & Chinese), and robust instruction adherence.
|
| 37 |
|
| 38 |
-
-
|
| 39 |
-
|
| 40 |
-
- 🧱 **Z-Image-Omni-Base** – The versatile foundation model capable of both **generation and editing tasks**. By releasing this checkpoint, we aim to unlock the full potential for community-driven fine-tuning and custom development, providing the most "raw" and diverse starting point for the open-source community.
|
| 41 |
|
| 42 |
- ✍️ **Z-Image-Edit** – A variant fine-tuned on Z-Image specifically for image editing tasks. It supports creative image-to-image generation with impressive instruction-following capabilities, allowing for precise edits based on natural language prompts.
|
| 43 |
|
| 44 |
### 📥 Model Zoo
|
| 45 |
|
| 46 |
-
| Model |
|
| 47 |
-
| :--- |
|
| 48 |
-
| **Z-Image-
|
| 49 |
-
| **Z-Image** |
|
| 50 |
-
| **Z-Image-
|
| 51 |
-
| **Z-Image-Edit** | ✅ | ✅ | ❌ | 50 | ✅ | Editing | High | Medium | Easy | *To be released* | *To be released* | | *To be released* |
|
| 52 |
|
| 53 |
### 🖼️ Showcase
|
| 54 |
|
|
@@ -74,11 +70,11 @@ We adopt a **Scalable Single-Stream DiT** (S3-DiT) architecture. In this setup,
|
|
| 74 |

|
| 75 |
|
| 76 |
### 📈 Performance
|
| 77 |
-
According to the Elo-based Human Preference Evaluation (on [
|
| 78 |
|
| 79 |
<p align="center">
|
| 80 |
<a href="https://aiarena.alibaba-inc.com/corpora/arena/leaderboard?arenaType=T2I">
|
| 81 |
-
<img src="assets/leaderboard.
|
| 82 |
<span style="font-size:1.05em; cursor:pointer; text-decoration:underline;"> Click to view the full leaderboard</span>
|
| 83 |
</a>
|
| 84 |
</p>
|
|
@@ -88,7 +84,7 @@ Install the latest version of diffusers, use the following command:
|
|
| 88 |
<details>
|
| 89 |
<summary><sup>Click here for details on why you need to install diffusers from source</sup></summary>
|
| 90 |
|
| 91 |
-
We have submitted two pull requests ([#12703](https://github.com/huggingface/diffusers/pull/12703) and [#12715](https://github.com/huggingface/diffusers/pull/
|
| 92 |
Therefore, you need to install diffusers from source for the latest features and Z-Image support.
|
| 93 |
|
| 94 |
</details>
|
|
@@ -99,7 +95,7 @@ pip install git+https://github.com/huggingface/diffusers
|
|
| 99 |
|
| 100 |
```python
|
| 101 |
import torch
|
| 102 |
-
from diffusers import ZImagePipeline
|
| 103 |
|
| 104 |
# 1. Load the pipeline
|
| 105 |
# Use bfloat16 for optimal performance on supported GPUs
|
|
@@ -140,8 +136,6 @@ image.save("example.png")
|
|
| 140 |
|
| 141 |
## 🔬 Decoupled-DMD: The Acceleration Magic Behind Z-Image
|
| 142 |
|
| 143 |
-
[](https://arxiv.org/abs/2511.22677)
|
| 144 |
-
|
| 145 |
Decoupled-DMD is the core few-step distillation algorithm that empowers the 8-step Z-Image model.
|
| 146 |
|
| 147 |
Our core insight in Decoupled-DMD is that the success of existing DMD (Distribution Matching Distillation) methods is the result of two independent, collaborating mechanisms:
|
|
@@ -177,24 +171,12 @@ HF_XET_HIGH_PERFORMANCE=1 hf download Tongyi-MAI/Z-Image-Turbo
|
|
| 177 |
If you find our work useful in your research, please consider citing:
|
| 178 |
|
| 179 |
```bibtex
|
| 180 |
-
@
|
| 181 |
title={Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer},
|
| 182 |
-
author={
|
| 183 |
-
|
| 184 |
-
|
| 185 |
-
}
|
| 186 |
-
|
| 187 |
-
@article{liu2025decoupled,
|
| 188 |
-
title={Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield},
|
| 189 |
-
author={Dongyang Liu and Peng Gao and David Liu and Ruoyi Du and Zhen Li and Qilong Wu and Xin Jin and Sihan Cao and Shifeng Zhang and Hongsheng Li and Steven Hoi},
|
| 190 |
-
journal={arXiv preprint arXiv:2511.22677},
|
| 191 |
-
year={2025}
|
| 192 |
-
}
|
| 193 |
-
|
| 194 |
-
@article{jiang2025distribution,
|
| 195 |
-
title={Distribution Matching Distillation Meets Reinforcement Learning},
|
| 196 |
-
author={Jiang, Dengyang and Liu, Dongyang and Wang, Zanyi and Wu, Qilong and Jin, Xin and Liu, David and Li, Zhen and Wang, Mengmeng and Gao, Peng and Yang, Harry},
|
| 197 |
-
journal={arXiv preprint arXiv:2511.13649},
|
| 198 |
-
year={2025}
|
| 199 |
}
|
| 200 |
```
|
|
|
|
| 11 |
|
| 12 |
<div align="center">
|
| 13 |
|
| 14 |
+
[](https://tongyi-mai.github.io/Z-Image-homepage/) 
|
| 15 |
[](https://github.com/Tongyi-MAI/Z-Image) 
|
| 16 |
[](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) 
|
| 17 |
[](https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo) 
|
| 18 |
[](https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo) 
|
| 19 |
+
[](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=469191&modelType=Checkpoint&sdVersion=Z_IMAGE_TURBO&modelUrl=modelscope%253A%252F%252FTongyi-MAI%252FZ-Image-Turbo%253Frevision%253Dmaster%7D%7BOnline) 
|
| 20 |
[](assets/Z-Image-Gallery.pdf) 
|
| 21 |
[](https://modelscope.cn/studios/Tongyi-MAI/Z-Image-Gallery/summary) 
|
| 22 |
+
<a href="http://github.com/Tongyi-MAI/Z-Image/blob/main/Z_Image_Report.pdf" target="_blank"><img src="https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv" height="21px"></a>
|
| 23 |
|
| 24 |
|
| 25 |
Welcome to the official repository for the Z-Image(造相)project!
|
|
|
|
| 30 |
|
| 31 |
## ✨ Z-Image
|
| 32 |
|
| 33 |
+
Z-Image is a powerful and highly efficient image generation model with **6B** parameters. It currently has three variants:
|
| 34 |
|
| 35 |
- 🚀 **Z-Image-Turbo** – A distilled version of Z-Image that matches or exceeds leading competitors with only **8 NFEs** (Number of Function Evaluations). It offers **⚡️sub-second inference latency⚡️** on enterprise-grade H800 GPUs and fits comfortably within **16G VRAM consumer devices**. It excels in photorealistic image generation, bilingual text rendering (English & Chinese), and robust instruction adherence.
|
| 36 |
|
| 37 |
+
- 🧱 **Z-Image-Base** – The non-distilled foundation model. By releasing this checkpoint, we aim to unlock the full potential for community-driven fine-tuning and custom development.
|
| 38 |
|
| 39 |
- ✍️ **Z-Image-Edit** – A variant fine-tuned on Z-Image specifically for image editing tasks. It supports creative image-to-image generation with impressive instruction-following capabilities, allowing for precise edits based on natural language prompts.
|
| 40 |
|
| 41 |
### 📥 Model Zoo
|
| 42 |
|
| 43 |
+
| Model | Hugging Face | ModelScope |
|
| 44 |
+
| :--- |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
| 45 |
+
| **Z-Image-Turbo** | [](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) <br> [](https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo) | [](https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo) <br> [](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=469191&modelType=Checkpoint&sdVersion=Z_IMAGE_TURBO&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image-Turbo%3Frevision%3Dmaster) |
|
| 46 |
+
| **Z-Image-Base** | *To be released* | *To be released* |
|
| 47 |
+
| **Z-Image-Edit** | *To be released* | *To be released* |
|
|
|
|
| 48 |
|
| 49 |
### 🖼️ Showcase
|
| 50 |
|
|
|
|
| 70 |

|
| 71 |
|
| 72 |
### 📈 Performance
|
| 73 |
+
According to the Elo-based Human Preference Evaluation (on [AI Arena](https://aiarena.alibaba-inc.com/corpora/arena/leaderboard?arenaType=T2I)), Z-Image-Turbo shows highly competitive performance against other leading models, while achieving state-of-the-art results among open-source models.
|
| 74 |
|
| 75 |
<p align="center">
|
| 76 |
<a href="https://aiarena.alibaba-inc.com/corpora/arena/leaderboard?arenaType=T2I">
|
| 77 |
+
<img src="assets/leaderboard.webp" alt="Z-Image Elo Rating on AI Arena"/><br />
|
| 78 |
<span style="font-size:1.05em; cursor:pointer; text-decoration:underline;"> Click to view the full leaderboard</span>
|
| 79 |
</a>
|
| 80 |
</p>
|
|
|
|
| 84 |
<details>
|
| 85 |
<summary><sup>Click here for details on why you need to install diffusers from source</sup></summary>
|
| 86 |
|
| 87 |
+
We have submitted two pull requests ([#12703](https://github.com/huggingface/diffusers/pull/12703) and [#12715](https://github.com/huggingface/diffusers/pull/12704)) to the 🤗 diffusers repository to add support for Z-Image. Both PRs have been merged into the latest official diffusers release.
|
| 88 |
Therefore, you need to install diffusers from source for the latest features and Z-Image support.
|
| 89 |
|
| 90 |
</details>
|
|
|
|
| 95 |
|
| 96 |
```python
|
| 97 |
import torch
|
| 98 |
+
from diffusers import ZImagePipeline
|
| 99 |
|
| 100 |
# 1. Load the pipeline
|
| 101 |
# Use bfloat16 for optimal performance on supported GPUs
|
|
|
|
| 136 |
|
| 137 |
## 🔬 Decoupled-DMD: The Acceleration Magic Behind Z-Image
|
| 138 |
|
| 139 |
Decoupled-DMD is the core few-step distillation algorithm that empowers the 8-step Z-Image model.
|
| 140 |
|
| 141 |
Our core insight in Decoupled-DMD is that the success of existing DMD (Distribution Matching Distillation) methods is the result of two independent, collaborating mechanisms:
|
|
|
|
| 171 |
If you find our work useful in your research, please consider citing:
|
| 172 |
|
| 173 |
```bibtex
|
| 174 |
+
@misc{z-image-2025,
|
| 175 |
title={Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer},
|
| 176 |
+
author={Tongyi Lab},
|
| 177 |
+
year={2025},
|
| 178 |
+
publisher={GitHub},
|
| 179 |
+
journal={GitHub repository},
|
| 180 |
+
howpublished={\url{https://github.com/Tongyi-MAI/Z-Image}}
|
| 181 |
}
|
| 182 |
```
|
assets/leaderboard.png
DELETED
assets/showcase_realistic.png
CHANGED