zimageturbo

by juniperi - opened Nov 26, 2025

base: refs/heads/main ← from: refs/pr/3

Files changed (4): +22 -44

.gitattributes +0 -1
README.md +20 -38
assets/leaderboard.png +0 -3
assets/showcase_realistic.png +2 -2

.gitattributes CHANGED

@@ -43,4 +43,3 @@ assets/showcase_editing.png filter=lfs diff=lfs merge=lfs -text
 assets/showcase_realistic.png filter=lfs diff=lfs merge=lfs -text
 assets/showcase_rendering.png filter=lfs diff=lfs merge=lfs -text
 assets/Z-Image-Gallery.pdf filter=lfs diff=lfs merge=lfs -text
-assets/leaderboard.png filter=lfs diff=lfs merge=lfs -text


README.md CHANGED

@@ -11,16 +11,15 @@ library_name: diffusers
 <div align="center">
-[![Official Site](https://img.shields.io/badge/Official%20Site-333399.svg?logo=homepage)](https://tongyi-mai.github.io/Z-Image-blog/)&#160;
 [![GitHub](https://img.shields.io/badge/GitHub-Z--Image-181717?logo=github&logoColor=white)](https://github.com/Tongyi-MAI/Z-Image)&#160;
 [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Checkpoint-Z--Image--Turbo-yellow)](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)&#160;
 [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Online_Demo-Z--Image--Turbo-blue)](https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo)&#160;
-[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Mobile_Demo-Z--Image--Turbo-red)](https://huggingface.co/spaces/akhaliq/Z-Image-Turbo)&#160;
 [![ModelScope Model](https://img.shields.io/badge/🤖%20Checkpoint-Z--Image--Turbo-624aff)](https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo)&#160;
-[![ModelScope Space](https://img.shields.io/badge/🤖%20Online_Demo-Z--Image--Turbo-17c7a7)](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=469191&modelType=Checkpoint&sdVersion=Z_IMAGE_TURBO&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image-Turbo%3Frevision%3Dmaster)&#160;
 [![Art Gallery PDF](https://img.shields.io/badge/%F0%9F%96%BC%20Art_Gallery-PDF-ff69b4)](assets/Z-Image-Gallery.pdf)&#160;
 [![Web Art Gallery](https://img.shields.io/badge/%F0%9F%8C%90%20Web_Art_Gallery-online-00bfff)](https://modelscope.cn/studios/Tongyi-MAI/Z-Image-Gallery/summary)&#160;
-<a href="https://arxiv.org/abs/2511.22699" target="_blank"><img src="https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv" height="21px"></a>
 Welcome to the official repository for the Z-Image（造相）project!
@@ -31,24 +30,21 @@ Welcome to the official repository for the Z-Image（造相）project!
 ## ✨ Z-Image
-Z-Image is a powerful and highly efficient image generation model family with **6B** parameters. Currently there are four variants:
 - 🚀 **Z-Image-Turbo** – A distilled version of Z-Image that matches or exceeds leading competitors with only **8 NFEs** (Number of Function Evaluations). It offers **⚡️sub-second inference latency⚡️** on enterprise-grade H800 GPUs and fits comfortably within **16G VRAM consumer devices**. It excels in photorealistic image generation, bilingual text rendering (English & Chinese), and robust instruction adherence.
-- 🎨 **Z-Image** – The foundation model behind Z-Image-Turbo. Z-Image focuses on **high-quality generation**, **rich aesthetics**, **strong diversity**, and **controllability**, well-suited for creative generation, **fine-tuning**, and downstream development. It supports a wide range of artistic styles, effective negative prompting, and high diversity across identities, poses, compositions, and layouts.
-- 🧱 **Z-Image-Omni-Base** – The versatile foundation model capable of both **generation and editing tasks**. By releasing this checkpoint, we aim to unlock the full potential for community-driven fine-tuning and custom development, providing the most "raw" and diverse starting point for the open-source community.
 - ✍️ **Z-Image-Edit** – A variant fine-tuned on Z-Image specifically for image editing tasks. It supports creative image-to-image generation with impressive instruction-following capabilities, allowing for precise edits based on natural language prompts.
 ### 📥 Model Zoo
-| Model | Pre-Training | SFT | RL | Step | CFG | Task | Visual Quality | Diversity | Fine-Tunability | Hugging Face | ModelScope |
-| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
-| **Z-Image-Omni-Base** | ✅ | ❌ | ❌ | 50 | ✅ | Gen. / Editing | Medium | High | Easy | *To be released* | *To be released* |
-| **Z-Image** | ✅ | ✅ | ❌ | 50 | ✅ | Gen. | High | Medium | Easy | [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Checkpoint%20-Z--Image-yellow)](https://huggingface.co/Tongyi-MAI/Z-Image) <br> [![Hugging Face Space](https://img.shields.io/badge/%F0%9F%A4%97%20Demo-Z--Image-blue)](https://huggingface.co/spaces/Tongyi-MAI/Z-Image) | [![ModelScope Model](https://img.shields.io/badge/🤖%20%20Checkpoint-Z--Image-624aff)](https://www.modelscope.cn/models/Tongyi-MAI/Z-Image) <br> [![ModelScope Space](https://img.shields.io/badge/%F0%9F%A4%96%20Demo-Z--Image-17c7a7)](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=569345&modelType=Checkpoint&sdVersion=Z_IMAGE&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image%3Frevision%3Dmaster) |
-| **Z-Image-Turbo** | ✅ | ✅ | ✅ | 8 | ❌ | Gen. | Very High | Low | N/A | [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Checkpoint%20-Z--Image--Turbo-yellow)](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) <br> [![Hugging Face Space](https://img.shields.io/badge/%F0%9F%A4%97%20Demo-Z--Image--Turbo-blue)](https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo) | [![ModelScope Model](https://img.shields.io/badge/🤖%20%20Checkpoint-Z--Image--Turbo-624aff)](https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo) <br> [![ModelScope Space](https://img.shields.io/badge/%F0%9F%A4%96%20Demo-Z--Image--Turbo-17c7a7)](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=469191&modelType=Checkpoint&sdVersion=Z_IMAGE_TURBO&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image-Turbo%3Frevision%3Dmaster) |
-| **Z-Image-Edit** | ✅ | ✅ | ❌ | 50 | ✅ | Editing | High | Medium | Easy | *To be released* | *To be released* |
 ### 🖼️ Showcase
@@ -74,11 +70,11 @@ We adopt a **Scalable Single-Stream DiT** (S3-DiT) architecture. In this setup,
 ![Architecture of Z-Image and Z-Image-Edit](assets/architecture.webp)
 ### 📈 Performance
-According to the Elo-based Human Preference Evaluation (on [*Alibaba AI Arena*](https://aiarena.alibaba-inc.com/corpora/arena/leaderboard?arenaType=T2I)), Z-Image-Turbo shows highly competitive performance against other leading models, while achieving state-of-the-art results among open-source models.
 <p align="center">
   <a href="https://aiarena.alibaba-inc.com/corpora/arena/leaderboard?arenaType=T2I">
-    <img src="assets/leaderboard.png" alt="Z-Image Elo Rating on AI Arena"/><br />
     <span style="font-size:1.05em; cursor:pointer; text-decoration:underline;"> Click to view the full leaderboard</span>
   </a>
 </p>
@@ -88,7 +84,7 @@ Install the latest version of diffusers, use the following command:
 <details>
   <summary><sup>Click here for details for why you need to install diffusers from source</sup></summary>
-  We have submitted two pull requests ([#12703](https://github.com/huggingface/diffusers/pull/12703) and [#12715](https://github.com/huggingface/diffusers/pull/12715)) to the 🤗 diffusers repository to add support for Z-Image. Both PRs have been merged into the latest official diffusers release.
   Therefore, you need to install diffusers from source for the latest features and Z-Image support.
 </details>
@@ -99,7 +95,7 @@ pip install git+https://github.com/huggingface/diffusers
 ```python
 import torch
-from diffusers import ZImagePipeline
 # 1. Load the pipeline
 # Use bfloat16 for optimal performance on supported GPUs
@@ -140,8 +136,6 @@ image.save("example.png")
 ## 🔬 Decoupled-DMD: The Acceleration Magic Behind Z-Image
-[![arXiv](https://img.shields.io/badge/arXiv-2511.22677-b31b1b.svg)](https://arxiv.org/abs/2511.22677)
 Decoupled-DMD is the core few-step distillation algorithm that empowers the 8-step Z-Image model.
 Our core insight in Decoupled-DMD is that the success of existing DMD (Distribution Matching Distillation) methods is the result of two independent, collaborating mechanisms:
@@ -177,24 +171,12 @@ HF_XET_HIGH_PERFORMANCE=1 hf download Tongyi-MAI/Z-Image-Turbo
 If you find our work useful in your research, please consider citing:
 ```bibtex
-@article{team2025zimage,
   title={Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer},
-  author={Z-Image Team},
-  journal={arXiv preprint arXiv:2511.22699},
-  year={2025}
-}
-@article{liu2025decoupled,
-  title={Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield},
-  author={Dongyang Liu and Peng Gao and David Liu and Ruoyi Du and Zhen Li and Qilong Wu and Xin Jin and Sihan Cao and Shifeng Zhang and Hongsheng Li and Steven Hoi},
-  journal={arXiv preprint arXiv:2511.22677},
-  year={2025}
-}
-@article{jiang2025distribution,
-  title={Distribution Matching Distillation Meets Reinforcement Learning},
-  author={Jiang, Dengyang and Liu, Dongyang and Wang, Zanyi and Wu, Qilong and Jin, Xin and Liu, David and Li, Zhen and Wang, Mengmeng and Gao, Peng and Yang, Harry},
-  journal={arXiv preprint arXiv:2511.13649},
-  year={2025}
 }
 ```

 <div align="center">
+[![Official Site](https://img.shields.io/badge/Official%20Site-333399.svg?logo=homepage)](https://tongyi-mai.github.io/Z-Image-homepage/)&#160;
 [![GitHub](https://img.shields.io/badge/GitHub-Z--Image-181717?logo=github&logoColor=white)](https://github.com/Tongyi-MAI/Z-Image)&#160;
 [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Checkpoint-Z--Image--Turbo-yellow)](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)&#160;
 [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Online_Demo-Z--Image--Turbo-blue)](https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo)&#160;
 [![ModelScope Model](https://img.shields.io/badge/🤖%20Checkpoint-Z--Image--Turbo-624aff)](https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo)&#160;
+[![ModelScope Space](https://img.shields.io/badge/🤖%20Online_Demo-Z--Image--Turbo-17c7a7)](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=469191&modelType=Checkpoint&sdVersion=Z_IMAGE_TURBO&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image-Turbo%3Frevision%3Dmaster)&#160;
 [![Art Gallery PDF](https://img.shields.io/badge/%F0%9F%96%BC%20Art_Gallery-PDF-ff69b4)](assets/Z-Image-Gallery.pdf)&#160;
 [![Web Art Gallery](https://img.shields.io/badge/%F0%9F%8C%90%20Web_Art_Gallery-online-00bfff)](https://modelscope.cn/studios/Tongyi-MAI/Z-Image-Gallery/summary)&#160;
+<a href="http://github.com/Tongyi-MAI/Z-Image/blob/main/Z_Image_Report.pdf" target="_blank"><img src="https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv" height="21px"></a>
 Welcome to the official repository for the Z-Image（造相）project!
 ## ✨ Z-Image
+Z-Image is a powerful and highly efficient image generation model with **6B** parameters. It currently has three variants:
 - 🚀 **Z-Image-Turbo** – A distilled version of Z-Image that matches or exceeds leading competitors with only **8 NFEs** (Number of Function Evaluations). It offers **⚡️sub-second inference latency⚡️** on enterprise-grade H800 GPUs and fits comfortably within **16G VRAM consumer devices**. It excels in photorealistic image generation, bilingual text rendering (English & Chinese), and robust instruction adherence.
+- 🧱 **Z-Image-Base** – The non-distilled foundation model. By releasing this checkpoint, we aim to unlock the full potential for community-driven fine-tuning and custom development.
 - ✍️ **Z-Image-Edit** – A variant fine-tuned on Z-Image specifically for image editing tasks. It supports creative image-to-image generation with impressive instruction-following capabilities, allowing for precise edits based on natural language prompts.
 ### 📥 Model Zoo
+| Model | Hugging Face                                                                                                                                                                                                                                                                                                              | ModelScope                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
+| :--- |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| **Z-Image-Turbo** | [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Checkpoint%20-Z--Image--Turbo-yellow)](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) <br> [![Hugging Face Space](https://img.shields.io/badge/%F0%9F%A4%97%20Online%20Demo-Z--Image--Turbo-blue)](https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo) | [![ModelScope Model](https://img.shields.io/badge/🤖%20%20Checkpoint-Z--Image--Turbo-624aff)](https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo) <br> [![ModelScope Space](https://img.shields.io/badge/%F0%9F%A4%96%20Online%20Demo-Z--Image--Turbo-17c7a7)](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=469191&modelType=Checkpoint&sdVersion=Z_IMAGE_TURBO&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image-Turbo%3Frevision%3Dmaster) |
+| **Z-Image-Base** | *To be released*                                                                                                                                                                                                                                                                                                          | *To be released*                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
+| **Z-Image-Edit** | *To be released*                                                                                                                                                                                                                                                                                                          | *To be released*                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
 ### 🖼️ Showcase
 ![Architecture of Z-Image and Z-Image-Edit](assets/architecture.webp)
 ### 📈 Performance
+According to the Elo-based Human Preference Evaluation (on [AI Arena](https://aiarena.alibaba-inc.com/corpora/arena/leaderboard?arenaType=T2I)), Z-Image-Turbo shows highly competitive performance against other leading models, while achieving state-of-the-art results among open-source models.
 <p align="center">
   <a href="https://aiarena.alibaba-inc.com/corpora/arena/leaderboard?arenaType=T2I">
+    <img src="assets/leaderboard.webp" alt="Z-Image Elo Rating on AI Arena"/><br />
     <span style="font-size:1.05em; cursor:pointer; text-decoration:underline;"> Click to view the full leaderboard</span>
   </a>
 </p>
 <details>
   <summary><sup>Click here for details for why you need to install diffusers from source</sup></summary>
+  We have submitted two pull requests ([#12703](https://github.com/huggingface/diffusers/pull/12703) and [#12715](https://github.com/huggingface/diffusers/pull/12715)) to the 🤗 diffusers repository to add support for Z-Image. Both PRs have been merged into the latest official diffusers release.
   Therefore, you need to install diffusers from source for the latest features and Z-Image support.
 </details>
 ```python
 import torch
+from diffusers import ZImagePipeline
 # 1. Load the pipeline
 # Use bfloat16 for optimal performance on supported GPUs
 ## 🔬 Decoupled-DMD: The Acceleration Magic Behind Z-Image
 Decoupled-DMD is the core few-step distillation algorithm that empowers the 8-step Z-Image model.
 Our core insight in Decoupled-DMD is that the success of existing DMD (Distribution Matching Distillation) methods is the result of two independent, collaborating mechanisms:
 If you find our work useful in your research, please consider citing:
 ```bibtex
+@misc{z-image-2025,
   title={Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer},
+  author={Tongyi Lab},
+  year={2025},
+  publisher={GitHub},
+  journal={GitHub repository},
+  howpublished={\url{https://github.com/Tongyi-MAI/Z-Image}}
 }
 ```

assets/leaderboard.png DELETED

Git LFS Details

SHA256: e9fd4aa185bb7bff2b5515f2001b4d80df330595e78d6a098142e5a232bb4e4e
Pointer size: 132 Bytes
Size of remote file: 2.03 MB

assets/showcase_realistic.png CHANGED

Git LFS Details

SHA256: 697e6f6857f619314173508df72a14314cbb43e67475de7494123bb8b4f4eb2c
Pointer size: 132 Bytes
Size of remote file: 6.26 MB

Git LFS Details

SHA256: 9a739bf5b0d1055e8fbe073b560fade2cc7bbcf4a0c8e90daf039cea051bb84b
Pointer size: 132 Bytes
Size of remote file: 8.3 MB
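
The Git LFS details above list a SHA256 for each stored object. Git LFS identifies an object by the SHA-256 of its file contents, so a downloaded asset can be checked against the listed value. The following is a minimal sketch, assuming a local copy of the file is available; the function names `sha256_of_file` and `matches_lfs_oid` are illustrative and not part of any Git LFS tooling.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large assets are never fully loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lfs_oid(path, expected_oid):
    """Check a local file against the SHA256 shown in a Git LFS details panel."""
    # Normalise whitespace and case before comparing (hypothetical helper).
    return sha256_of_file(path) == expected_oid.strip().lower()
```

For example, `matches_lfs_oid("assets/showcase_realistic.png", "<SHA256 from the panel>")` would return `True` only if the local file is byte-identical to the stored object; the path here assumes the repository has been cloned with LFS files pulled.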