Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
Paper: arXiv 2511.22699
import torch
from diffusers import DiffusionPipeline
# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("vantagewithai/Z-Image-GGUF", dtype=torch.bfloat16, device_map="cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]

Quantized GGUF version of Z-Image.
Original model link: https://huggingface.co/Tongyi-MAI/Z-Image
Watch us on YouTube: @VantageWithAI
Z-Image is the foundation model of the ⚡️-Image family, engineered for good quality, robust generative diversity, broad stylistic coverage, and precise prompt adherence. While Z-Image-Turbo is built for speed, Z-Image is a full-capacity, undistilled transformer designed to be the backbone for creators, researchers, and developers who require the highest level of creative freedom.
| Aspect | Z-Image | Z-Image-Turbo |
|---|---|---|
| CFG | ✅ | ❌ |
| Steps | 28-50 | 8 |
| Fine-tunability | ✅ | ❌ |
| Negative Prompting | ✅ | ❌ |
| Diversity | High | Low |
| Visual Quality | High | Very High |
| RL | ❌ | ✅ |
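
The step counts in the comparison can be turned into per-variant sampling settings. The following is a minimal sketch, not part of the original model card: `SamplingProfile`, `PROFILES`, and `build_call_kwargs` are hypothetical names, and the default guidance value of 4.0 is an illustrative assumption. It also assumes that the undistilled Z-Image uses CFG and negative prompts while Z-Image-Turbo runs without them, and that the pipeline accepts the usual diffusers call arguments `num_inference_steps`, `guidance_scale`, and `negative_prompt`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingProfile:
    """Per-variant sampling limits (hypothetical helper type)."""
    min_steps: int
    max_steps: int
    supports_cfg: bool  # assumed to also gate negative prompts


# Step ranges follow the comparison table; the CFG flags are assumptions.
PROFILES = {
    "z-image": SamplingProfile(min_steps=28, max_steps=50, supports_cfg=True),
    "z-image-turbo": SamplingProfile(min_steps=8, max_steps=8, supports_cfg=False),
}


def build_call_kwargs(variant, steps, guidance_scale=4.0, negative_prompt=None):
    """Return keyword arguments for a pipeline call, clamped to the variant's step range."""
    profile = PROFILES[variant]
    clamped_steps = max(profile.min_steps, min(steps, profile.max_steps))
    kwargs = {"num_inference_steps": clamped_steps}
    if profile.supports_cfg:
        kwargs["guidance_scale"] = guidance_scale
        if negative_prompt:
            kwargs["negative_prompt"] = negative_prompt
    else:
        # Assumption: a guidance scale of 0.0 turns CFG off for the distilled variant.
        kwargs["guidance_scale"] = 0.0
    return kwargs
```

With a loaded pipeline, the result could be passed as `pipe(prompt, **build_call_kwargs("z-image", 40, negative_prompt="blurry"))`. Whether a given pipeline version honors every one of these arguments should be checked against its documentation.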
If you find our work useful in your research, please consider citing:
@article{team2025zimage,
title={Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer},
author={Z-Image Team},
journal={arXiv preprint arXiv:2511.22699},
year={2025}
}
Available quantization levels: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit
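
When picking a bit width, a rough lower bound on weight size is parameters × bits ÷ 8 bytes. The sketch below only illustrates that arithmetic. `estimate_weight_gib` is a hypothetical helper, and `HYPOTHETICAL_PARAMS` is a placeholder that is not stated in this card and should be replaced with the model's real parameter count. Actual GGUF files are typically somewhat larger than this estimate, because quantized formats store scale metadata and may keep some tensors at higher precision.

```python
def estimate_weight_gib(num_params, bits_per_weight):
    """Nominal weight size in GiB: params * bits / 8 bytes, ignoring format overhead."""
    return num_params * bits_per_weight / 8 / 1024**3


# Placeholder value for illustration only; not taken from the model card.
HYPOTHETICAL_PARAMS = 6_000_000_000

for bits in (2, 3, 4, 5, 6, 8, 16):
    size = estimate_weight_gib(HYPOTHETICAL_PARAMS, bits)
    print(f"{bits}-bit: ~{size:.1f} GiB of weights (nominal)")
```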