# Z-Image-Series
Pre-quantized Z-Image-Turbo text-to-image model series by QuantFunc, with Lighting backend inference support.
## Overview
Z-Image-Turbo is a high-speed text-to-image diffusion model distilled from Alibaba Tongyi team's image generation model. This repository provides the complete inference model pre-quantized and exported via QuantFunc.
With the latest QuantFunc ComfyUI plugin, inference achieves a 2x–6x speedup over mainstream frameworks.
## Hardware Requirements
- Supports NVIDIA RTX 20 series and above
- RTX 20 series does not support BF16, which causes significant precision loss in Qwen series model quantization scenarios. Therefore, the 20 series currently only supports Z-Image models.
## Directory Structure
```
Z-Image-Series/
├── z-image-series-50x-above-base-model/   # Base model, optimized for RTX 50 series and above
│   ├── text_encoder/          # Qwen3 text encoder (pre-quantized)
│   ├── vae/                   # VAE decoder (~160MB)
│   ├── tokenizer/             # Tokenizer
│   ├── scheduler/             # Scheduler config
│   ├── model_index.json       # Model index
│   └── quantfunc_config.json  # QuantFunc quantization config
├── z-image-series-50x-below-base-model/   # Base model, optimized for RTX 50 series and below
│   └── (same structure as above)
└── transformer/
    ├── config.json                                    # Transformer architecture config
    ├── z-image-turbo-50x-above-lighting.safetensors   # RTX 50+ Lighting (~3.5GB)
    └── z-image-turbo-50x-below-lighting.safetensors   # RTX 20/30/40 Lighting (~3.3GB)
```
## Model Variants
| Variant | base-model | transformer | Total Size | Target GPU |
|---|---|---|---|---|
| 50x-above | `z-image-series-50x-above-base-model` | `z-image-turbo-50x-above-lighting.safetensors` | ~6.5GB | RTX 50 series and above |
| 50x-below | `z-image-series-50x-below-base-model` | `z-image-turbo-50x-below-lighting.safetensors` | ~6.2GB | RTX 20/30/40 series |
- 50x-above: Optimized for RTX 50 series (Blackwell) and above
- 50x-below: Optimized for RTX 20/30/40 series
The base-model and transformer must use the same variant (both above or both below).
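The pairing rule above can be captured in a small helper. This is a hypothetical sketch (`VARIANTS` and `pick_variant` are not part of QuantFunc); the directory and file names are taken from the repository layout above:

```python
# Hypothetical helper pairing the base model and transformer weights
# from the same variant, per the table above.
VARIANTS = {
    "50x-above": (
        "z-image-series-50x-above-base-model",
        "transformer/z-image-turbo-50x-above-lighting.safetensors",
    ),
    "50x-below": (
        "z-image-series-50x-below-base-model",
        "transformer/z-image-turbo-50x-below-lighting.safetensors",
    ),
}

def pick_variant(rtx_series: int) -> str:
    """Map an RTX series number (20/30/40/50) to the matching variant."""
    return "50x-above" if rtx_series >= 50 else "50x-below"

base, transformer = VARIANTS[pick_variant(40)]
print(base)  # z-image-series-50x-below-base-model
```

Selecting both paths through one lookup makes a mismatched above/below combination impossible.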
## Quick Start
### Download
```shell
pip install huggingface_hub
```

```python
from huggingface_hub import snapshot_download

model_dir = snapshot_download('QuantFunc/Z-Image-Series')
```
### Inference
```shell
# RTX 50 series
quantfunc \
  --model-dir Z-Image-Series/z-image-series-50x-above-base-model \
  --transformer Z-Image-Series/transformer/z-image-turbo-50x-above-lighting.safetensors \
  --auto-optimize --model-backend lighting \
  --prompt "a cute cat sitting on a windowsill watching rain" \
  --output output.png --steps 4

# RTX 20/30/40 series
quantfunc \
  --model-dir Z-Image-Series/z-image-series-50x-below-base-model \
  --transformer Z-Image-Series/transformer/z-image-turbo-50x-below-lighting.safetensors \
  --auto-optimize --model-backend lighting \
  --prompt "a cute cat sitting on a windowsill watching rain" \
  --output output.png --steps 4
```
`--auto-optimize` automatically selects the optimal VRAM management, attention backend, and quantization compression strategy for your GPU.
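For batch or scripted use, the same invocation can be assembled programmatically. A minimal sketch: `build_cmd` is a hypothetical helper, the flags mirror the commands above, and the actual launch is left commented out so the sketch stays self-contained:

```python
import subprocess  # needed only if you actually launch the command

def build_cmd(variant: str, prompt: str, output: str, steps: int = 4) -> list[str]:
    """Assemble the quantfunc CLI invocation shown above for one variant
    ('50x-above' or '50x-below')."""
    root = "Z-Image-Series"
    return [
        "quantfunc",
        "--model-dir", f"{root}/z-image-series-{variant}-base-model",
        "--transformer", f"{root}/transformer/z-image-turbo-{variant}-lighting.safetensors",
        "--auto-optimize", "--model-backend", "lighting",
        "--prompt", prompt,
        "--output", output,
        "--steps", str(steps),
    ]

cmd = build_cmd("50x-below", "a cute cat sitting on a windowsill watching rain", "output.png")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to run inference
```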
## SVDQ & Lighting Backends
This repository provides Lighting backend models. Differences between the two backends:
| Feature | Lighting | SVDQ |
|---|---|---|
| Quantization | Per-layer mixed precision (FP4/INT4/FP8/INT8) | Nunchaku-based holistic pre-quantization |
| LoRA Integration | Real-time quantization: build a custom model in 5 minutes with zero speed loss, integrating any number of LoRAs | Runtime low-rank pathway |
| Ecosystem | QuantFunc native | Compatible with the widely-adopted Nunchaku ecosystem, enhanced with Rotation quantization and Auto Rank dynamic rank optimization |
| Flexibility | Per-layer precision control | Precision fixed at export time |
| Use Cases | Rapid personal model customization, batch LoRA integration | Leverage Nunchaku ecosystem, runtime dynamic LoRA |
## Precision Config (`precision-config/`)
Sample per-layer precision configurations for the Lighting backend:
| File | Target GPU | Precision |
|---|---|---|
| `50x-above-fp4-sample.json` | RTX 50+ | FP4 all layers |
| `50x-below-int4-sample.json` | RTX 30/40 | INT4 all layers |
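A per-layer precision map like the samples above can be generated and validated with a few lines of Python. Note the JSON schema below is an illustrative guess, not the documented format of the sample files; the key names (`backend`, `default_precision`, `layers`) are assumptions:

```python
import json

# Illustrative only: the real schema of the precision-config samples is not
# documented in this card, so this per-layer mapping is a guessed shape.
config = {
    "backend": "lighting",
    "default_precision": "fp4",  # e.g. FP4 on all layers for RTX 50+
    "layers": {f"transformer.blocks.{i}": "fp4" for i in range(4)},
}

text = json.dumps(config, indent=2)
parsed = json.loads(text)
print(parsed["default_precision"])  # fp4
```

Consult the actual sample files in `precision-config/` for the authoritative schema before writing your own.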
## Related Repositories
- QuantFunc/Qwen-Image-Series – Qwen-Image text-to-image (60 layers)
- QuantFunc/Qwen-Image-Edit-Series – Qwen-Image-Edit image editing
## License
The pre-quantized model weights in this repository are derived from the original models. Users must comply with the original model's license agreement. The QuantFunc inference engine and its plugins (including the ComfyUI plugin) are licensed separately; see official QuantFunc channels for details.
For models quantized from commercially licensed models, users are responsible for obtaining the necessary commercial licenses from the original model providers.