---
license: apache-2.0
tags:
- diffusion-single-file
- comfyui
- distillation
- Z-Image-Turbo
base_model:
- Tongyi-MAI/Z-Image-Turbo
pipeline_tag: text-to-image
library_name: diffusers
---
# Z-Image-Turbo-Quantized

Quantized weights for Z-Image-Turbo, optimized for 8 GB VRAM GPUs.
## 📦 Available Models

- `z_image_turbo_scaled_fp8_e4m3fn.safetensors` (6.17 GB) - FP8 E4M3FN quantized weights
- `z_image_turbo_int8.safetensors` (6.17 GB) - INT8 quantized weights
## 🚀 Installation

```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
pip install .
```
## 💻 Usage for 8GB VRAM GPUs

To run Z-Image-Turbo on 8 GB VRAM GPUs, you need to:

- Use quantized transformer weights (FP8 or INT8)
- Use the int4-quantized Qwen3 text encoder
- Enable CPU offloading
### Complete Example

```python
from lightx2v import LightX2VPipeline

# Initialize pipeline
pipe = LightX2VPipeline(
    model_path="Tongyi-MAI/Z-Image-Turbo",
    model_cls="z_image",
    task="t2i",
)

# Step 1: Enable quantization (FP8 transformer + INT4 text encoder)
pipe.enable_quantize(
    dit_quantized=True,
    dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_scaled_fp8_e4m3fn.safetensors",
    quant_scheme="fp8-sgl",
    # IMPORTANT: Use the int4 Qwen3 text encoder for 8 GB VRAM
    text_encoder_quantized=True,
    text_encoder_quantized_ckpt="JunHowie/Qwen3-4B-GPTQ-Int4",
    text_encoder_quant_scheme="int4",
)

# Step 2: Enable CPU offloading
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="model",  # "model" gives maximum memory savings
)

# Step 3: Create generator
pipe.create_generator(
    attn_mode="flash_attn3",
    aspect_ratio="16:9",
    infer_steps=9,
    guidance_scale=1,
)

# Step 4: Generate image
pipe.generate(
    seed=42,
    prompt="A beautiful landscape with mountains and lakes, ultra HD, 4K",
    negative_prompt="",
    save_result_path="output.png",
)
```
## ⚙️ Configuration Options

### Quantization Schemes

**FP8 (Recommended)** - Better quality and speed:

```python
dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_scaled_fp8_e4m3fn.safetensors",
quant_scheme="fp8-sgl",
```

**INT8** - Alternative option:

```python
dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_int8.safetensors",
quant_scheme="int8-sgl",
```
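For intuition, here is a minimal sketch of the idea behind int8 weight quantization: store each weight as an 8-bit integer plus a scale factor, and dequantize on the fly. This toy per-tensor symmetric scheme is an illustration only, not the actual `int8-sgl` kernels used by LightX2V (which use fused GPU kernels and finer-grained scales):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ≈ q * scale, q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    return [qi * scale for qi in q]

weights = [0.12, -0.53, 0.98, -0.07, 0.41]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# The round trip loses at most half a quantization step per weight.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
assert max_err <= scale / 2 + 1e-9
```

Each weight now costs 1 byte instead of 2 (fp16), which is where the VRAM savings come from.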
### Offload Granularity

- `"model"` (Recommended for 8 GB): Offload the entire model to CPU and load it onto the GPU only during inference. Maximum memory savings.
- `"block"`: Offload individual transformer blocks. More fine-grained control.
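To see why model-level offloading lets the pipeline fit, here is a toy simulation (the classes and sizes are hypothetical, not LightX2V's implementation): when each model is moved to the GPU only while it runs, peak VRAM is set by the largest single model rather than the sum of all of them.

```python
class GPU:
    """Toy VRAM tracker for illustrating the offload pattern."""
    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.used_gb = 0.0
        self.peak_gb = 0.0

    def load(self, size_gb):
        self.used_gb += size_gb
        assert self.used_gb <= self.capacity_gb, "out of VRAM"
        self.peak_gb = max(self.peak_gb, self.used_gb)

    def unload(self, size_gb):
        self.used_gb -= size_gb

def run_pipeline_with_model_offload(gpu, model_sizes_gb):
    # With offload_granularity="model", each model occupies the GPU only
    # while it runs, then moves back to CPU before the next one loads.
    for size in model_sizes_gb:
        gpu.load(size)
        # ... this model's forward pass would run here ...
        gpu.unload(size)
    return gpu.peak_gb

# Rough, assumed sizes: int4 text encoder, FP8 transformer, VAE.
gpu = GPU(capacity_gb=8.0)
peak = run_pipeline_with_model_offload(gpu, [2.0, 6.17, 0.3])
print(f"peak VRAM: {peak:.2f} GB")  # peak = largest single model, not the 8.47 GB total
```

The components together exceed 8 GB, but since only one is resident at a time, the peak stays at the 6.17 GB transformer.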
## ⚠️ Important Notes

- **Order matters**: All `enable_quantize()` and `enable_offload()` calls must be made before `create_generator()`, otherwise they will not take effect.
- **Text encoder quantization**: Using the int4 Qwen3 text encoder is highly recommended on 8 GB VRAM GPUs to ensure stable operation.
- **Memory optimization**: The combination of an FP8/INT8 transformer, the int4 Qwen3 text encoder, and model-level offloading is optimized for 8 GB VRAM.
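As a rough sanity check on why the int4 text encoder matters, consider the weight-memory arithmetic (the 4B parameter count is an assumption from the model name; real checkpoints also carry extra buffers and quantization scales):

```python
def weight_gb(params_billion, bits_per_param):
    """Approximate weight storage in GB for a given parameter count and precision."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# A ~4B-parameter Qwen3 encoder at fp16 would need ~8 GB for weights alone,
# filling the entire GPU before the transformer even loads.
qwen3_fp16 = weight_gb(4, 16)  # 8.0 GB
qwen3_int4 = weight_gb(4, 4)   # 2.0 GB - why int4 is required here
print(qwen3_fp16, qwen3_int4)
```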
## 📚 References

- Original Model: [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)
- LightX2V: [GitHub](https://github.com/ModelTC/LightX2V)
- Qwen3-4B-GPTQ-Int4: [JunHowie/Qwen3-4B-GPTQ-Int4](https://huggingface.co/JunHowie/Qwen3-4B-GPTQ-Int4)
## 🤝 Community

If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)!