---
license: apache-2.0
tags:
- diffusion-single-file
- comfyui
- distillation
- Z-Image-Turbo
base_model:
- Tongyi-MAI/Z-Image-Turbo
library_name: diffusers
pipeline_tag: text-to-image
---

# Z-Image-Turbo-Quantized

Quantized weights for [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo), optimized for **8GB VRAM GPUs**.

## 📦 Available Models

- **`z_image_turbo_scaled_fp8_e4m3fn.safetensors`** (6.17 GB) - FP8 E4M3FN quantized weights
- **`z_image_turbo_int8.safetensors`** (6.17 GB) - INT8 quantized weights

## 🚀 Installation

```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
pip install .
```

## 💻 Usage for 8GB VRAM GPUs

To run Z-Image-Turbo on an 8GB VRAM GPU, you need to:

1. Use quantized transformer weights (FP8 or INT8)
2. Use the INT4-quantized Qwen3 text encoder
3. Enable CPU offloading
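
A quick back-of-the-envelope calculation shows why this combination targets 8GB. The parameter counts below (~6.2B for the transformer, ~4B for the Qwen3 text encoder) are illustrative assumptions, not official figures:

```python
# Back-of-the-envelope weight-memory math for the quantized setup.
# Parameter counts are illustrative assumptions, not official figures.
def weight_gib(n_params: float, bits_per_weight: int) -> float:
    """Raw weight size in GiB at a given bit width."""
    return n_params * bits_per_weight / 8 / 1024**3

transformer_bf16 = weight_gib(6.2e9, 16)   # unquantized baseline: ~11.5 GiB
transformer_fp8 = weight_gib(6.2e9, 8)     # FP8/INT8 transformer: ~5.8 GiB
text_encoder_int4 = weight_gib(4.0e9, 4)   # INT4 GPTQ Qwen3: ~1.9 GiB

print(f"bf16 transformer:  {transformer_bf16:.1f} GiB")
print(f"fp8 transformer:   {transformer_fp8:.1f} GiB")
print(f"int4 text encoder: {text_encoder_int4:.1f} GiB")
```

Halving the transformer's bit width and quartering the text encoder's is what brings the resident weights down from well over 13 GiB to roughly 8 GiB, which CPU offloading then splits across host and device memory.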

### Complete Example
|
|
|
|
|
```python |
|
|
from lightx2v import LightX2VPipeline |
|
|
|
|
|
# Initialize pipeline |
|
|
pipe = LightX2VPipeline( |
|
|
model_path="Tongyi-MAI/Z-Image-Turbo", |
|
|
model_cls="z_image", |
|
|
task="t2i", |
|
|
) |
|
|
|
|
|
# Step 1: Enable quantization (FP8 transformer + INT4 text encoder) |
|
|
pipe.enable_quantize( |
|
|
dit_quantized=True, |
|
|
dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_scaled_fp8_e4m3fn.safetensors", |
|
|
quant_scheme="fp8-sgl", |
|
|
# IMPORTANT: Use int4 Qwen3 for 8GB VRAM |
|
|
text_encoder_quantized=True, |
|
|
text_encoder_quantized_ckpt="JunHowie/Qwen3-4B-GPTQ-Int4", |
|
|
text_encoder_quant_scheme="int4" |
|
|
) |
|
|
|
|
|
# Step 2: Enable CPU offloading |
|
|
pipe.enable_offload( |
|
|
cpu_offload=True, |
|
|
offload_granularity="model", # Use "model" for maximum memory savings |
|
|
) |
|
|
|
|
|
# Step 3: Create generator |
|
|
pipe.create_generator( |
|
|
attn_mode="flash_attn3", |
|
|
aspect_ratio="16:9", |
|
|
infer_steps=9, |
|
|
guidance_scale=1, |
|
|
) |
|
|
|
|
|
# Step 4: Generate image |
|
|
pipe.generate( |
|
|
seed=42, |
|
|
prompt="A beautiful landscape with mountains and lakes, ultra HD, 4K", |
|
|
negative_prompt="", |
|
|
save_result_path="output.png", |
|
|
) |
|
|
``` |

## ⚙️ Configuration Options

### Quantization Schemes

**FP8 (Recommended)** - Better quality and speed:

```python
dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_scaled_fp8_e4m3fn.safetensors",
quant_scheme="fp8-sgl",
```

**INT8** - Alternative option:

```python
dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_int8.safetensors",
quant_scheme="int8-sgl",
```
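
As background on the FP8 format used here: E4M3FN packs each weight into 1 sign bit, 4 exponent bits (bias 7), and 3 mantissa bits, and the "FN" (finite) variant reuses the top exponent code for normal values, so there are no infinities and the largest finite value is 448. A small decoder sketch of that layout:

```python
# FP8 E4M3FN layout: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
# "FN" = finite: the top exponent code still encodes normals (only NaN, no inf),
# so the largest finite value is (1 + 6/8) * 2**(15 - 7) = 448.
def e4m3fn_decode(sign: int, exp_field: int, mantissa: int) -> float:
    bias = 7
    if exp_field == 0:  # subnormal: no implicit leading 1
        value = (mantissa / 8) * 2 ** (1 - bias)
    else:               # normal: implicit leading 1
        value = (1 + mantissa / 8) * 2 ** (exp_field - bias)
    return -value if sign else value

print(e4m3fn_decode(0, 15, 0b110))  # 448.0, the largest finite E4M3FN value
print(e4m3fn_decode(0, 0, 0b001))   # 2**-9, the smallest positive subnormal
```

This narrow dynamic range is why the checkpoint ships *scaled* FP8 weights: per-tensor scale factors map each weight tensor into the representable range before casting.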

### Offload Granularity

- **`"model"`** (Recommended for 8GB): Offload the entire model to CPU, load it onto the GPU only during inference. Maximum memory savings.
- **`"block"`**: Offload individual transformer blocks. More fine-grained control.
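
To make the `"model"` granularity concrete, here is a toy sketch of the idea (a hypothetical illustration, not the actual LightX2V implementation): weights rest on the CPU between calls and are moved to the GPU only for the duration of a forward pass:

```python
class OffloadedModel:
    """Toy model-granularity offload: the wrapped model 'lives' on the CPU
    and is moved to the GPU only while a forward pass runs.
    Hypothetical sketch, not the actual LightX2V implementation."""

    def __init__(self, forward_fn):
        self.forward_fn = forward_fn
        self.device = "cpu"    # weights rest on the CPU between calls
        self.transitions = []  # record device moves for illustration

    def __call__(self, x):
        self.device = "cuda"   # load weights right before inference
        self.transitions.append("cuda")
        out = self.forward_fn(x)
        self.device = "cpu"    # offload again to free VRAM immediately
        self.transitions.append("cpu")
        return out

model = OffloadedModel(lambda x: x * 2)
print(model(21), model.device)  # 42 cpu
```

Block-level offloading applies the same load/compute/evict cycle per transformer block instead, which keeps less on the GPU at once but pays for more frequent transfers.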

## ⚠️ Important Notes

1. **Order matters**: All `enable_quantize()` and `enable_offload()` calls must be made **before** `create_generator()`, otherwise they will not take effect.

2. **Text encoder quantization**: Using the INT4 Qwen3 text encoder is **highly recommended** on 8GB VRAM GPUs to ensure stable operation.

3. **Memory optimization**: The combination of an FP8/INT8 transformer, INT4 Qwen3, and model-level offloading is optimized for 8GB VRAM.

## 📚 References

- Original model: [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)
- LightX2V: [GitHub](https://github.com/ModelTC/LightX2V)
- Qwen3-4B-GPTQ-Int4: [JunHowie/Qwen3-4B-GPTQ-Int4](https://huggingface.co/JunHowie/Qwen3-4B-GPTQ-Int4)

## 🤝 Community

**If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)!**