---
language:
  - en
license: other
license_name: quantfunc-model-license
tags:
  - image-generation
  - text-to-image
  - diffusion
  - quantized
  - quantfunc
library_name: diffusers
pipeline_tag: text-to-image
---

# QuantFunc

<div align="center" style="margin-top: 50px;">
  <img src="assets/logo.webp" width="300" alt="Logo">
</div>

# Z-Image-Series

A pre-quantized **Z-Image-Turbo** text-to-image model series by [QuantFunc](https://github.com/user/quantfunc), with inference support for the Lighting backend.

## Overview

Z-Image-Turbo is a high-speed text-to-image diffusion model distilled from the Alibaba Tongyi team's image generation model. This repository provides the complete inference model, pre-quantized and exported via QuantFunc.

With the latest QuantFunc ComfyUI plugin, inference achieves a **2x–6x speedup** over mainstream frameworks.

## Hardware Requirements

- Supports NVIDIA RTX 20 series and above.
- The RTX 20 series lacks BF16 support, which causes significant precision loss when quantizing Qwen-series models; the 20 series therefore currently supports only Z-Image models (see the capability check below).
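
A minimal way to check what your GPU reports, assuming PyTorch with a CUDA build is installed (this snippet is independent of QuantFunc itself):

```python
# Minimal GPU capability check; assumes PyTorch with CUDA support.
import torch

major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")  # RTX 20 series (Turing) reports 7.5

# Turing has no native BF16, so Qwen-series quantization loses precision there.
if torch.cuda.is_bf16_supported():
    print("BF16 supported: Qwen-series and Z-Image models are both usable.")
else:
    print("No BF16 support: use Z-Image models on this GPU.")
```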

## Directory Structure

```
Z-Image-Series/
├── z-image-series-50x-above-base-model/     # Base model, optimized for RTX 50 series and above
│   ├── text_encoder/                        # Qwen3 text encoder (pre-quantized)
│   ├── vae/                                 # VAE decoder (~160MB)
│   ├── tokenizer/                           # Tokenizer
│   ├── scheduler/                           # Scheduler config
│   ├── model_index.json                     # Model index
│   └── quantfunc_config.json                # QuantFunc quantization config
├── z-image-series-50x-below-base-model/     # Base model, optimized for RTX 50 series and below
│   └── (same structure as above)
└── transformer/
    ├── config.json                          # Transformer architecture config
    ├── z-image-turbo-50x-above-lighting.safetensors   # RTX 50+ Lighting (~3.5GB)
    └── z-image-turbo-50x-below-lighting.safetensors   # RTX 20/30/40 Lighting (~3.3GB)
```
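
After downloading (see Quick Start below), a quick sanity check that the local copy matches the tree above can look like this; the paths are taken directly from the listing:

```python
# Verify the downloaded repository matches the documented layout.
from pathlib import Path

root = Path("Z-Image-Series")
expected = [
    "z-image-series-50x-above-base-model/model_index.json",
    "z-image-series-50x-above-base-model/quantfunc_config.json",
    "transformer/config.json",
    "transformer/z-image-turbo-50x-above-lighting.safetensors",
    "transformer/z-image-turbo-50x-below-lighting.safetensors",
]
for rel in expected:
    status = "ok" if (root / rel).exists() else "MISSING"
    print(f"{status:8s}{rel}")
```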

## Model Variants

| Variant | base-model | transformer | Total Size | Target GPU |
|---------|-----------|-------------|------------|------------|
| **50x-above** | `z-image-series-50x-above-base-model` | `z-image-turbo-50x-above-lighting.safetensors` | ~6.5GB | RTX 50 series and above |
| **50x-below** | `z-image-series-50x-below-base-model` | `z-image-turbo-50x-below-lighting.safetensors` | ~6.2GB | RTX 20/30/40 series |

- **50x-above**: Optimized for RTX 50 series (Blackwell) and above
- **50x-below**: Optimized for RTX 20/30/40 series

> The base-model and transformer must use the **same variant** (both above or both below).
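
One way to pick a matching pair automatically is to branch on the GPU's CUDA compute capability. This is a sketch, not part of QuantFunc: it assumes PyTorch is available and that Blackwell GPUs (RTX 50 series and newer) report a compute capability major version of 10 or higher, while RTX 20/30/40 report below 10.

```python
# Choose a consistent base-model/transformer variant from the detected GPU.
import torch

major, _ = torch.cuda.get_device_capability()  # e.g. (12, 0) on RTX 50 series
variant = "above" if major >= 10 else "below"

base_model = f"Z-Image-Series/z-image-series-50x-{variant}-base-model"
transformer = (
    f"Z-Image-Series/transformer/z-image-turbo-50x-{variant}-lighting.safetensors"
)
print(base_model)
print(transformer)
```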

## Quick Start

### Download

```bash
pip install huggingface_hub
```

```python
from huggingface_hub import snapshot_download
model_dir = snapshot_download('QuantFunc/Z-Image-Series')
```
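
By default, `snapshot_download` places files in the Hugging Face cache and returns that path. To get the literal `Z-Image-Series/` directory used by the CLI examples below, you can pass `local_dir` (a standard `huggingface_hub` parameter):

```python
from huggingface_hub import snapshot_download

# Download into ./Z-Image-Series so the CLI paths below work verbatim.
model_dir = snapshot_download(
    'QuantFunc/Z-Image-Series',
    local_dir='Z-Image-Series',
)
print(model_dir)
```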

### Inference

```bash
# RTX 50 series
quantfunc \
  --model-dir Z-Image-Series/z-image-series-50x-above-base-model \
  --transformer Z-Image-Series/transformer/z-image-turbo-50x-above-lighting.safetensors \
  --auto-optimize --model-backend lighting \
  --prompt "a cute cat sitting on a windowsill watching rain" \
  --output output.png --steps 4

# RTX 20/30/40 series
quantfunc \
  --model-dir Z-Image-Series/z-image-series-50x-below-base-model \
  --transformer Z-Image-Series/transformer/z-image-turbo-50x-below-lighting.safetensors \
  --auto-optimize --model-backend lighting \
  --prompt "a cute cat sitting on a windowsill watching rain" \
  --output output.png --steps 4
```

`--auto-optimize` automatically selects the optimal VRAM management, attention backend, and quantization compression strategy based on your GPU.

## SVDQ & Lighting Backends

This repository provides **Lighting** backend models. Differences between the two backends:

| Feature | Lighting | SVDQ |
|---------|----------|------|
| **Quantization** | Per-layer mixed precision (FP4/INT4/FP8/INT8) | Nunchaku-based holistic pre-quantization |
| **LoRA Integration** | Real-time quantization: build a custom model in 5 minutes with zero speed loss, integrating any number of LoRAs | Runtime low-rank pathway |
| **Ecosystem** | QuantFunc native | Compatible with the widely-adopted Nunchaku ecosystem, enhanced with Rotation quantization and Auto Rank dynamic rank optimization |
| **Flexibility** | Per-layer precision control | Precision fixed at export time |
| **Use Cases** | Rapid personal model customization, batch LoRA integration | Leverage Nunchaku ecosystem, runtime dynamic LoRA |

## Precision Config (precision-config/)

Sample per-layer precision configurations for the Lighting backend:

| File | Target GPU | Precision |
|------|-----------|-----------|
| `50x-above-fp4-sample.json` | RTX 50+ | FP4 all layers |
| `50x-below-int4-sample.json` | RTX 30/40 | INT4 all layers |
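
For orientation only, a per-layer precision map could conceptually look like the sketch below. The keys, layer names, and values here are illustrative assumptions, not the real schema; consult the shipped `*-sample.json` files for the authoritative format.

```python
# Hypothetical illustration of a per-layer precision config (NOT the real schema).
import json

hypothetical_config = {
    "backend": "lighting",       # assumed key
    "default_precision": "fp4",  # assumed key
    "layers": {                  # assumed per-layer override map
        "transformer_blocks.0": "fp4",
        "transformer_blocks.1": "fp8",
    },
}
print(json.dumps(hypothetical_config, indent=2))
```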

## Related Repositories

- [QuantFunc/Qwen-Image-Series](https://huggingface.co/QuantFunc/Qwen-Image-Series): Qwen-Image text-to-image (60 layers)
- [QuantFunc/Qwen-Image-Edit-Series](https://huggingface.co/QuantFunc/Qwen-Image-Edit-Series): Qwen-Image-Edit image editing

## License

The pre-quantized model weights in this repository are derived from the original models. Users must comply with the original model's license agreement. The QuantFunc inference engine and its plugins (including the ComfyUI plugin) are licensed separately; see official QuantFunc channels for details.

For models quantized from commercially licensed models, users are responsible for obtaining the necessary commercial licenses from the original model providers.