---
license: apache-2.0
tags:
- diffusion-single-file
- comfyui
- distillation
- Z-Image-Turbo
base_model:
- Tongyi-MAI/Z-Image-Turbo
library_name: diffusers
pipeline_tag: text-to-image
---

# Z-Image-Turbo-Quantized

Quantized weights for [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo), optimized for **8GB VRAM GPUs**.

## 📦 Available Models

- **`z_image_turbo_scaled_fp8_e4m3fn.safetensors`** (6.17 GB) - FP8 E4M3FN quantized weights
- **`z_image_turbo_int8.safetensors`** (6.17 GB) - INT8 quantized weights

## 🚀 Installation

```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
pip install .
```

## 💻 Usage for 8GB VRAM GPUs

To run Z-Image-Turbo on an 8GB VRAM GPU, you need to:

1. Use quantized transformer weights (FP8 or INT8)
2. Use the int4-quantized Qwen3 text encoder
3. Enable CPU offloading

### Complete Example

```python
from lightx2v import LightX2VPipeline

# Initialize pipeline
pipe = LightX2VPipeline(
    model_path="Tongyi-MAI/Z-Image-Turbo",
    model_cls="z_image",
    task="t2i",
)

# Step 1: Enable quantization (FP8 transformer + INT4 text encoder)
pipe.enable_quantize(
    dit_quantized=True,
    dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_scaled_fp8_e4m3fn.safetensors",
    quant_scheme="fp8-sgl",
    # IMPORTANT: Use the int4 Qwen3 text encoder for 8GB VRAM
    text_encoder_quantized=True,
    text_encoder_quantized_ckpt="JunHowie/Qwen3-4B-GPTQ-Int4",
    text_encoder_quant_scheme="int4",
)

# Step 2: Enable CPU offloading
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="model",  # Use "model" for maximum memory savings
)

# Step 3: Create generator
pipe.create_generator(
    attn_mode="flash_attn3",
    aspect_ratio="16:9",
    infer_steps=9,
    guidance_scale=1,
)

# Step 4: Generate image
pipe.generate(
    seed=42,
    prompt="A beautiful landscape with mountains and lakes, ultra HD, 4K",
    negative_prompt="",
    save_result_path="output.png",
)
```

## ⚙️ Configuration Options

### Quantization Schemes

**FP8 (Recommended)** - Better quality and speed:

```python
dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_scaled_fp8_e4m3fn.safetensors",
quant_scheme="fp8-sgl",
```

**INT8** - Alternative option:

```python
dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_int8.safetensors",
quant_scheme="int8-sgl",
```

### Offload Granularity

- **`"model"`** (Recommended for 8GB): Offload the entire model to CPU and load it onto the GPU only during inference. Maximum memory savings.
- **`"block"`**: Offload individual transformer blocks. More fine-grained control, with less memory savings than model-level offloading.

## ⚠️ Important Notes

1. **Order matters**: All `enable_quantize()` and `enable_offload()` calls must be made **before** `create_generator()`; otherwise they will not take effect.
2. **Text encoder quantization**: Using the int4 Qwen3 text encoder is **highly recommended** on 8GB VRAM GPUs to ensure stable operation.
3. **Memory optimization**: The combination of an FP8/INT8 transformer, the int4 Qwen3 text encoder, and model-level offloading is tuned for 8GB VRAM.

## 📚 References

- Original model: [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)
- LightX2V: [GitHub](https://github.com/ModelTC/LightX2V)
- Qwen3-4B-GPTQ-Int4: [JunHowie/Qwen3-4B-GPTQ-Int4](https://huggingface.co/JunHowie/Qwen3-4B-GPTQ-Int4)

## 🤝 Community

**If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)!**
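
## 📝 Background: What INT8 Quantization Stores

For readers curious what an INT8 checkpoint like `z_image_turbo_int8.safetensors` conceptually contains: symmetric per-channel INT8 quantization stores one int8 value per weight plus one float scale per channel, and dequantization is a single multiply. The sketch below is an illustration of that storage scheme in plain Python; it is not LightX2V's actual implementation, and the function names are hypothetical:

```python
# Illustrative sketch of symmetric per-channel INT8 weight quantization.
# NOT LightX2V's implementation -- it only demonstrates the storage scheme:
# an int8 value per weight, plus one float scale per channel (row).

def quantize_int8(rows):
    """Quantize each row to int8 with its own scale; return (q_rows, scales)."""
    q_rows, scales = [], []
    for row in rows:
        # Map the largest absolute value in the row to 127.
        scale = max(abs(x) for x in row) / 127.0 or 1.0  # avoid scale 0 for all-zero rows
        q_rows.append([round(x / scale) for x in row])
        scales.append(scale)
    return q_rows, scales

def dequantize_int8(q_rows, scales):
    """Recover approximate float weights: w ~= q * scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
q, s = quantize_int8(weights)
recovered = dequantize_int8(q, s)
# Each quantized value fits in int8, and the per-row round-trip
# error is bounded by scale / 2.
```

Each weight shrinks from 4 bytes (FP32) or 2 bytes (FP16/BF16) to 1 byte plus a small per-channel scale overhead, which is why the quantized checkpoints above fit comfortably on 8GB GPUs.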