QuantFunc

Z-Image-Series

Pre-quantized Z-Image-Turbo text-to-image model series by QuantFunc, with Lighting backend inference support.

Overview

Z-Image-Turbo is a high-speed text-to-image diffusion model distilled from the Alibaba Tongyi team's image generation model. This repository provides the complete inference model, pre-quantized and exported via QuantFunc.

With the latest QuantFunc ComfyUI plugin, inference achieves a 2x–6x speedup over mainstream frameworks.

Hardware Requirements

  • NVIDIA RTX 20 series and above are supported.
  • The RTX 20 series lacks BF16 support, which causes significant precision loss when quantizing Qwen-series models; the 20 series therefore currently supports only Z-Image models.

Directory Structure

Z-Image-Series/
├── z-image-series-50x-above-base-model/     # Base model, optimized for RTX 50 series and above
│   ├── text_encoder/                        # Qwen3 text encoder (pre-quantized)
│   ├── vae/                                 # VAE decoder (~160MB)
│   ├── tokenizer/                           # Tokenizer
│   ├── scheduler/                           # Scheduler config
│   ├── model_index.json                     # Model index
│   └── quantfunc_config.json                # QuantFunc quantization config
├── z-image-series-50x-below-base-model/     # Base model, optimized for RTX 20/30/40 series
│   └── (same structure as above)
└── transformer/
    ├── config.json                          # Transformer architecture config
    ├── z-image-turbo-50x-above-lighting.safetensors   # RTX 50+ Lighting (~3.5GB)
    └── z-image-turbo-50x-below-lighting.safetensors   # RTX 20/30/40 Lighting (~3.3GB)

Model Variants

Variant   | base-model                          | transformer                                  | Total Size | Target GPU
50x-above | z-image-series-50x-above-base-model | z-image-turbo-50x-above-lighting.safetensors | ~6.5GB     | RTX 50 series and above
50x-below | z-image-series-50x-below-base-model | z-image-turbo-50x-below-lighting.safetensors | ~6.2GB     | RTX 20/30/40 series
  • 50x-above: Optimized for RTX 50 series (Blackwell) and above
  • 50x-below: Optimized for RTX 20/30/40 series

The base-model and transformer must use the same variant (both above or both below).
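This pairing rule can be enforced before launching inference. A minimal sketch in Python (the helper name and error message are mine, not part of QuantFunc):

```python
import re
from pathlib import Path

def check_variant_match(model_dir: str, transformer: str) -> str:
    """Verify that the base-model directory and transformer file belong to
    the same variant ('50x-above' or '50x-below'); return the shared variant."""
    pattern = re.compile(r"50x-(above|below)")
    base = pattern.search(Path(model_dir).name)
    trans = pattern.search(Path(transformer).name)
    if base is None or trans is None or base.group(1) != trans.group(1):
        raise ValueError(f"variant mismatch: {model_dir} vs {transformer}")
    return f"50x-{base.group(1)}"
```

Running this on a mismatched pair (an above base-model with a below transformer) raises a ValueError instead of producing silently wrong outputs.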

Quick Start

Download

pip install huggingface_hub

# Then, in Python:
from huggingface_hub import snapshot_download
model_dir = snapshot_download('QuantFunc/Z-Image-Series')
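To fetch only one variant and skip the other transformer file (roughly 3GB), `snapshot_download` accepts an `allow_patterns` filter. The helper below is a sketch; the glob strings are assumptions derived from the directory layout shown earlier, not an official API:

```python
def variant_patterns(variant: str) -> list[str]:
    """Build allow_patterns for downloading a single variant. The glob
    strings are assumptions based on the repository layout above."""
    if variant not in ("50x-above", "50x-below"):
        raise ValueError(f"unknown variant: {variant}")
    return [
        f"z-image-series-{variant}-base-model/*",
        "transformer/config.json",
        f"transformer/z-image-turbo-{variant}-lighting.safetensors",
    ]

# Usage (requires huggingface_hub):
# model_dir = snapshot_download('QuantFunc/Z-Image-Series',
#                               allow_patterns=variant_patterns('50x-above'))
```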

Inference

# RTX 50 series
quantfunc \
  --model-dir Z-Image-Series/z-image-series-50x-above-base-model \
  --transformer Z-Image-Series/transformer/z-image-turbo-50x-above-lighting.safetensors \
  --auto-optimize --model-backend lighting \
  --prompt "a cute cat sitting on a windowsill watching rain" \
  --output output.png --steps 4

# RTX 20/30/40 series
quantfunc \
  --model-dir Z-Image-Series/z-image-series-50x-below-base-model \
  --transformer Z-Image-Series/transformer/z-image-turbo-50x-below-lighting.safetensors \
  --auto-optimize --model-backend lighting \
  --prompt "a cute cat sitting on a windowsill watching rain" \
  --output output.png --steps 4

--auto-optimize automatically selects the optimal VRAM management, attention backend, and quantization compression strategy based on your GPU.
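Which variant to pass can itself be decided programmatically from the GPU's CUDA compute capability: Turing, Ampere, and Ada (RTX 20/30/40) report major versions 7–8, while Blackwell-generation GPUs, including the RTX 50 series (sm_120), report a major version of 10 or above. A sketch, assuming PyTorch is available for the query (the function name and cutoff are mine):

```python
def pick_variant(cc_major: int) -> str:
    """Choose a model variant from the CUDA compute capability major version.
    RTX 20/30/40 (Turing sm_75, Ampere sm_86, Ada sm_89) have major < 10;
    Blackwell GPUs (incl. the RTX 50 series, sm_120) have major >= 10."""
    return "50x-above" if cc_major >= 10 else "50x-below"

# With PyTorch installed (assumption), on the first CUDA device:
# import torch
# major, _minor = torch.cuda.get_device_capability(0)
# print(pick_variant(major))
```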

SVDQ vs. Lighting Backends

This repository provides Lighting backend models. Differences between the two backends:

Feature          | Lighting                                      | SVDQ
Quantization     | Per-layer mixed precision (FP4/INT4/FP8/INT8) | Nunchaku-based holistic pre-quantization
LoRA Integration | Real-time quantization: build a custom model in 5 minutes with zero speed loss, integrating any number of LoRAs | Runtime low-rank pathway
Ecosystem        | QuantFunc native                              | Compatible with the widely adopted Nunchaku ecosystem, enhanced with Rotation quantization and Auto Rank dynamic rank optimization
Flexibility      | Per-layer precision control                   | Precision fixed at export time
Use Cases        | Rapid personal model customization, batch LoRA integration | Leveraging the Nunchaku ecosystem, runtime dynamic LoRA

Precision Config (precision-config/)

Sample per-layer precision configurations for the Lighting backend:

File                       | Target GPU | Precision
50x-above-fp4-sample.json  | RTX 50+    | FP4, all layers
50x-below-int4-sample.json | RTX 30/40  | INT4, all layers

License

The pre-quantized model weights in this repository are derived from the original models. Users must comply with the original model's license agreement. The QuantFunc inference engine and its plugins (including the ComfyUI plugin) are licensed separately; see official QuantFunc channels for details.

For models quantized from commercially licensed models, users are responsible for obtaining the necessary commercial licenses from the original model providers.
