---
license: apache-2.0
tags:
- diffusion-single-file
- comfyui
- distillation
- Z-Image-Turbo
base_model:
- Tongyi-MAI/Z-Image-Turbo
library_name: diffusers
pipeline_tag: text-to-image
---

# Z-Image-Turbo-Quantized

Quantized weights for [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo), optimized for **8GB VRAM GPUs**.

## 📦 Available Models

- **`z_image_turbo_scaled_fp8_e4m3fn.safetensors`** (6.17 GB) - FP8 E4M3FN quantized weights
- **`z_image_turbo_int8.safetensors`** (6.17 GB) - INT8 quantized weights

## 🚀 Installation

```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
pip install .
```
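
If you prefer to fetch the quantized weights ahead of time, the standard `huggingface_hub` client can do it. A minimal sketch (note: passing the resulting local path to `dit_quantized_ckpt` is an assumption; the repo-relative path used in the example below is the documented form):

```python
from huggingface_hub import hf_hub_download

# Download the FP8 checkpoint from this repo into the local HF cache
local_path = hf_hub_download(
    repo_id="lightx2v/Z-Image-Turbo-Quantized",
    filename="z_image_turbo_scaled_fp8_e4m3fn.safetensors",
)
print(local_path)  # cached .safetensors file
```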

## 💻 Usage for 8GB VRAM GPUs

To run Z-Image-Turbo on an 8GB VRAM GPU, you need to:

1. Use quantized transformer weights (FP8 or INT8)
2. Use the int4-quantized Qwen3 text encoder
3. Enable CPU offloading

### Complete Example

```python
from lightx2v import LightX2VPipeline

# Initialize the pipeline for text-to-image generation
pipe = LightX2VPipeline(
    model_path="Tongyi-MAI/Z-Image-Turbo",
    model_cls="z_image",
    task="t2i",
)

# Step 1: Enable quantization (FP8 transformer + INT4 text encoder)
pipe.enable_quantize(
    dit_quantized=True,
    dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_scaled_fp8_e4m3fn.safetensors",
    quant_scheme="fp8-sgl",
    # IMPORTANT: use the int4 Qwen3 text encoder for 8GB VRAM
    text_encoder_quantized=True,
    text_encoder_quantized_ckpt="JunHowie/Qwen3-4B-GPTQ-Int4",
    text_encoder_quant_scheme="int4",
)

# Step 2: Enable CPU offloading
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="model",  # "model" gives maximum memory savings
)

# Step 3: Create the generator
pipe.create_generator(
    attn_mode="flash_attn3",
    aspect_ratio="16:9",
    infer_steps=9,
    guidance_scale=1,
)

# Step 4: Generate the image
pipe.generate(
    seed=42,
    prompt="A beautiful landscape with mountains and lakes, ultra HD, 4K",
    negative_prompt="",
    save_result_path="output.png",
)
```

## ⚙️ Configuration Options

### Quantization Schemes

**FP8 (Recommended)** - Better quality and speed:

```python
dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_scaled_fp8_e4m3fn.safetensors",
quant_scheme="fp8-sgl",
```

**INT8** - Alternative option:

```python
dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_int8.safetensors",
quant_scheme="int8-sgl",
```
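
Both snippets are keyword arguments to the `enable_quantize()` call from the complete example. For clarity, here is a sketch of the full call with the INT8 checkpoint swapped in (every argument is taken from the example above):

```python
# INT8 variant of Step 1 from the complete example
pipe.enable_quantize(
    dit_quantized=True,
    dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_int8.safetensors",
    quant_scheme="int8-sgl",
    # the int4 text-encoder settings are unchanged
    text_encoder_quantized=True,
    text_encoder_quantized_ckpt="JunHowie/Qwen3-4B-GPTQ-Int4",
    text_encoder_quant_scheme="int4",
)
```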

### Offload Granularity

- **`"model"`** (recommended for 8GB): offload the entire model to CPU and load it onto the GPU only during inference. Maximum memory savings.
- **`"block"`**: offload individual transformer blocks for more fine-grained control; see the sketch below.
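
A minimal sketch of the block-level alternative, using the same `enable_offload()` API as the complete example (the trade-off in the comment is an assumption, not a measured result):

```python
# Block-level offloading: finer-grained than "model"; typically trades
# more host-to-GPU transfers for smoother peak-VRAM behavior.
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="block",
)
```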

## ⚠️ Important Notes

1. **Order matters**: call `enable_quantize()` and `enable_offload()` **before** `create_generator()`; otherwise they will not take effect.

2. **Text encoder quantization**: the int4 Qwen3 text encoder is **highly recommended** on 8GB VRAM GPUs to ensure stable operation.

3. **Memory optimization**: the combination of an FP8/INT8 transformer, the int4 Qwen3 text encoder, and model-level offloading is tuned for 8GB VRAM.

## 📚 References

- Original model: [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)
- LightX2V: [GitHub](https://github.com/ModelTC/LightX2V)
- Qwen3-4B-GPTQ-Int4: [JunHowie/Qwen3-4B-GPTQ-Int4](https://huggingface.co/JunHowie/Qwen3-4B-GPTQ-Int4)

## 📄 License

Apache 2.0