---
license: apache-2.0
tags:
- diffusion-single-file
- comfyui
- distillation
- Z-Image-Turbo
base_model:
- Tongyi-MAI/Z-Image-Turbo
library_name: diffusers
pipeline_tag: text-to-image
---
# Z-Image-Turbo-Quantized

Quantized weights for [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) optimized for **8GB VRAM GPUs**.

## 📦 Available Models

- **`z_image_turbo_scaled_fp8_e4m3fn.safetensors`** (6.17 GB) - FP8 E4M3FN quantized weights
- **`z_image_turbo_int8.safetensors`** (6.17 GB) - INT8 quantized weights
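
A rough back-of-envelope shows why quantization is what makes an 8 GB card viable. The parameter count below is an assumption inferred from the 6.17 GB file size at roughly one byte per weight; it is not stated on this card:

```python
# Back-of-envelope: quantized weights fit in 8 GiB, bf16 weights would not.
# Parameter count is an ASSUMPTION inferred from the 6.17 GB checkpoint size
# at ~1 byte per weight (both FP8 and INT8 store 1 byte per parameter).
GiB = 1024**3
file_size_gib = 6.17                  # either quantized checkpoint (FP8 or INT8)
params = file_size_gib * GiB          # ~1 byte per weight -> ~6.6B parameters

bf16_gib = params * 2 / GiB           # bf16 stores 2 bytes per weight
print(f"~{params / 1e9:.1f}B params; bf16 weights alone would be ~{bf16_gib:.1f} GiB")

# bf16 (~12.3 GiB) already exceeds an 8 GiB card before any activations,
# while the 1-byte quantized weights leave headroom for inference.
assert bf16_gib > 8 > file_size_gib
```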

## 🚀 Installation

```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
pip install .
```

## 💻 Usage for 8GB VRAM GPUs

To run Z-Image-Turbo on 8GB VRAM GPUs, you need to:
1. Use quantized transformer weights (FP8 or INT8)
2. Use int4 quantized Qwen3 text encoder
3. Enable CPU offloading

### Complete Example

```python
from lightx2v import LightX2VPipeline

# Initialize pipeline
pipe = LightX2VPipeline(
    model_path="Tongyi-MAI/Z-Image-Turbo",
    model_cls="z_image",
    task="t2i",
)

# Step 1: Enable quantization (FP8 transformer + INT4 text encoder)
pipe.enable_quantize(
    dit_quantized=True,
    dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_scaled_fp8_e4m3fn.safetensors",
    quant_scheme="fp8-sgl",
    # IMPORTANT: Use int4 Qwen3 for 8GB VRAM
    text_encoder_quantized=True,
    text_encoder_quantized_ckpt="JunHowie/Qwen3-4B-GPTQ-Int4",
    text_encoder_quant_scheme="int4"
)

# Step 2: Enable CPU offloading
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="model",  # Use "model" for maximum memory savings
)

# Step 3: Create generator
pipe.create_generator(
    attn_mode="flash_attn3",
    aspect_ratio="16:9",
    infer_steps=9,
    guidance_scale=1,
)

# Step 4: Generate image
pipe.generate(
    seed=42,
    prompt="A beautiful landscape with mountains and lakes, ultra HD, 4K",
    negative_prompt="",
    save_result_path="output.png",
)
```

## ⚙️ Configuration Options

### Quantization Schemes

**FP8 (Recommended)** - Generally better quality and speed, particularly on GPUs with native FP8 support:
```python
dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_scaled_fp8_e4m3fn.safetensors",
quant_scheme="fp8-sgl",
```

**INT8** - Alternative for GPUs without native FP8 support:
```python
dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_int8.safetensors",
quant_scheme="int8-sgl",
```

### Offload Granularity

- **`"model"`** (Recommended for 8GB): Keeps the whole model in CPU memory and moves it to the GPU only while it is running. Maximum memory savings.
- **`"block"`**: Offloads individual transformer blocks, giving finer-grained control over the VRAM/speed trade-off.
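
For cards with more headroom, block granularity uses the same `enable_offload()` call shown earlier. A minimal config fragment, assuming `pipe` was initialized as in the Complete Example above:

```python
# Offload per transformer block instead of the whole model.
# Finer-grained than "model" granularity; see the bullets above.
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="block",
)
```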

## ⚠️ Important Notes

1. **Order matters**: All `enable_quantize()` and `enable_offload()` calls must be made **before** `create_generator()`, otherwise they will not take effect.

2. **Text encoder quantization**: Using int4 Qwen3 text encoder is **highly recommended** for 8GB VRAM GPUs to ensure stable operation.

3. **Memory optimization**: The combination of FP8/INT8 transformer + int4 Qwen3 + model-level offloading is optimized for 8GB VRAM.
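
Assuming model-level offload keeps only one component resident on the GPU at a time, peak VRAM is set by the largest component rather than the sum. A sketch with illustrative sizes (only the 6.17 GB transformer figure comes from this card; the text encoder and VAE numbers are assumed ballparks):

```python
# Rough VRAM budget under model-level offload: components take turns on the
# GPU, so the peak is the largest component, not the total.
# ASSUMPTIONS: only "dit_fp8" is from this card; the rest are illustrative.
components_gib = {
    "dit_fp8": 6.17,     # quantized transformer (from this card)
    "qwen3_int4": 2.3,   # assumed: ~4B params at ~4 bits plus overhead
    "vae": 0.3,          # assumed ballpark
}
peak = max(components_gib.values())
total = sum(components_gib.values())
print(f"resident peak ~{peak:.1f} GiB vs ~{total:.1f} GiB if all loaded at once")
assert peak < 8  # fits on an 8 GiB card; loading everything at once would not
```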

## 📚 References

- Original Model: [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)
- LightX2V: [GitHub](https://github.com/ModelTC/LightX2V)
- Qwen3-4B-GPTQ-Int4: [JunHowie/Qwen3-4B-GPTQ-Int4](https://huggingface.co/JunHowie/Qwen3-4B-GPTQ-Int4)

## 🤝 Community

**If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)**