QuantFunc

Z-Image-Series

Pre-quantized Z-Image-Turbo text-to-image model series by QuantFunc, with Lighting backend inference support.

Overview

Z-Image-Turbo is a high-speed text-to-image diffusion model distilled from the Alibaba Tongyi team's image generation model. This repository provides the complete inference model, pre-quantized and exported via QuantFunc.

With the latest QuantFunc ComfyUI plugin, inference achieves a 2x–6x speedup over mainstream frameworks.

Hardware Requirements

  • NVIDIA RTX 20 series and above are supported.
  • The RTX 20 series lacks BF16 support, which causes significant precision loss when quantizing Qwen-series models; the 20 series therefore currently supports only Z-Image models.

Directory Structure

Z-Image-Series/
├── z-image-series-50x-above-base-model/     # Base model, optimized for RTX 50 series and above
│   ├── text_encoder/                        # Qwen3 text encoder (pre-quantized)
│   ├── vae/                                 # VAE decoder (~160MB)
│   ├── tokenizer/                           # Tokenizer
│   ├── scheduler/                           # Scheduler config
│   ├── model_index.json                     # Model index
│   └── quantfunc_config.json                # QuantFunc quantization config
├── z-image-series-50x-below-base-model/     # Base model, optimized for RTX 20/30/40 series
│   └── (same structure as above)
└── transformer/
    ├── config.json                          # Transformer architecture config
    ├── z-image-turbo-50x-above-lighting.safetensors   # RTX 50+ Lighting (~3.5GB)
    └── z-image-turbo-50x-below-lighting.safetensors   # RTX 20/30/40 Lighting (~3.3GB)

Model Variants

Variant   | base-model                          | transformer                                  | Total Size | Target GPU
50x-above | z-image-series-50x-above-base-model | z-image-turbo-50x-above-lighting.safetensors | ~6.5GB     | RTX 50 series and above
50x-below | z-image-series-50x-below-base-model | z-image-turbo-50x-below-lighting.safetensors | ~6.2GB     | RTX 20/30/40 series
  • 50x-above: Optimized for RTX 50 series (Blackwell) and above
  • 50x-below: Optimized for RTX 20/30/40 series

The base-model and transformer must use the same variant (both above or both below).
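This pairing rule can be enforced before launching inference. A minimal sketch in Python (the helper name and error message are mine, not part of QuantFunc):

```python
import re
from pathlib import Path

def check_variant_match(model_dir: str, transformer: str) -> str:
    """Verify that the base-model directory and transformer file belong to
    the same variant ('50x-above' or '50x-below'); return the shared variant."""
    pattern = re.compile(r"50x-(above|below)")
    base = pattern.search(Path(model_dir).name)
    trans = pattern.search(Path(transformer).name)
    if base is None or trans is None or base.group(1) != trans.group(1):
        raise ValueError(f"variant mismatch: {model_dir} vs {transformer}")
    return f"50x-{base.group(1)}"
```

Running this on a mismatched pair (an above base-model with a below transformer) raises a ValueError instead of producing silently wrong outputs.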

Quick Start

Download

pip install huggingface_hub

# Then, in Python:
from huggingface_hub import snapshot_download
model_dir = snapshot_download('QuantFunc/Z-Image-Series')
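To fetch only one variant and skip the other transformer file (roughly 3GB), `snapshot_download` accepts an `allow_patterns` filter. The helper below is a sketch; the glob strings are assumptions derived from the directory layout shown earlier, not an official API:

```python
def variant_patterns(variant: str) -> list[str]:
    """Build allow_patterns for downloading a single variant. The glob
    strings are assumptions based on the repository layout above."""
    if variant not in ("50x-above", "50x-below"):
        raise ValueError(f"unknown variant: {variant}")
    return [
        f"z-image-series-{variant}-base-model/*",
        "transformer/config.json",
        f"transformer/z-image-turbo-{variant}-lighting.safetensors",
    ]

# Usage (requires huggingface_hub):
# model_dir = snapshot_download('QuantFunc/Z-Image-Series',
#                               allow_patterns=variant_patterns('50x-above'))
```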

Inference

# RTX 50 series
quantfunc \
  --model-dir Z-Image-Series/z-image-series-50x-above-base-model \
  --transformer Z-Image-Series/transformer/z-image-turbo-50x-above-lighting.safetensors \
  --auto-optimize --model-backend lighting \
  --prompt "a cute cat sitting on a windowsill watching rain" \
  --output output.png --steps 4

# RTX 20/30/40 series
quantfunc \
  --model-dir Z-Image-Series/z-image-series-50x-below-base-model \
  --transformer Z-Image-Series/transformer/z-image-turbo-50x-below-lighting.safetensors \
  --auto-optimize --model-backend lighting \
  --prompt "a cute cat sitting on a windowsill watching rain" \
  --output output.png --steps 4

--auto-optimize automatically selects the optimal VRAM management, attention backend, and quantization compression strategy based on your GPU.
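Which variant to pass can itself be decided programmatically from the GPU's CUDA compute capability: Turing, Ampere, and Ada (RTX 20/30/40) report major versions 7–8, while Blackwell-generation GPUs, including the RTX 50 series (sm_120), report a major version of 10 or above. A sketch, assuming PyTorch is available for the query (the function name and cutoff are mine):

```python
def pick_variant(cc_major: int) -> str:
    """Choose a model variant from the CUDA compute capability major version.
    RTX 20/30/40 (Turing sm_75, Ampere sm_86, Ada sm_89) have major < 10;
    Blackwell GPUs (incl. the RTX 50 series, sm_120) have major >= 10."""
    return "50x-above" if cc_major >= 10 else "50x-below"

# With PyTorch installed (assumption), on the first CUDA device:
# import torch
# major, _minor = torch.cuda.get_device_capability(0)
# print(pick_variant(major))
```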

SVDQ vs. Lighting Backends

This repository provides Lighting backend models. Differences between the two backends:

Feature          | Lighting                                      | SVDQ
Quantization     | Per-layer mixed precision (FP4/INT4/FP8/INT8) | Nunchaku-based holistic pre-quantization
LoRA Integration | Real-time quantization: build a custom model in 5 minutes with zero speed loss, integrating any number of LoRAs | Runtime low-rank pathway
Ecosystem        | QuantFunc native                              | Compatible with the widely adopted Nunchaku ecosystem, enhanced with Rotation quantization and Auto Rank dynamic rank optimization
Flexibility      | Per-layer precision control                   | Precision fixed at export time
Use Cases        | Rapid personal model customization, batch LoRA integration | Leveraging the Nunchaku ecosystem, runtime dynamic LoRA

Precision Config (precision-config/)

Sample per-layer precision configurations for the Lighting backend:

File                       | Target GPU | Precision
50x-above-fp4-sample.json  | RTX 50+    | FP4, all layers
50x-below-int4-sample.json | RTX 30/40  | INT4, all layers

License

The pre-quantized model weights in this repository are derived from the original models. Users must comply with the original model's license agreement. The QuantFunc inference engine and its plugins (including the ComfyUI plugin) are licensed separately; see official QuantFunc channels for details.

For models quantized from commercially licensed models, users are responsible for obtaining the necessary commercial licenses from the original model providers.
