---
license: apache-2.0
tags:
- diffusion-single-file
- comfyui
- distillation
- Z-Image-Turbo
base_model:
- Tongyi-MAI/Z-Image-Turbo
library_name: diffusers
pipeline_tag: text-to-image
---

# Z-Image-Turbo-Quantized

Quantized weights for [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo), optimized for **8GB VRAM GPUs**.

## 📦 Available Models

- **`z_image_turbo_scaled_fp8_e4m3fn.safetensors`** (6.17 GB) - FP8 E4M3FN quantized weights
- **`z_image_turbo_int8.safetensors`** (6.17 GB) - INT8 quantized weights

## 🚀 Installation

```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
pip install .
```
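
If you prefer to fetch the quantized weights ahead of time, the standard `huggingface_hub` client can do it. A minimal sketch (note: passing the resulting local path to `dit_quantized_ckpt` is an assumption; the repo-relative path used in the example below is the documented form):

```python
from huggingface_hub import hf_hub_download

# Download the FP8 checkpoint from this repo into the local HF cache
local_path = hf_hub_download(
    repo_id="lightx2v/Z-Image-Turbo-Quantized",
    filename="z_image_turbo_scaled_fp8_e4m3fn.safetensors",
)
print(local_path)  # cached .safetensors file
```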

## 💻 Usage for 8GB VRAM GPUs

To run Z-Image-Turbo on an 8GB VRAM GPU, you need to:

1. Use quantized transformer weights (FP8 or INT8)
2. Use the int4-quantized Qwen3 text encoder
3. Enable CPU offloading

### Complete Example

```python
from lightx2v import LightX2VPipeline

# Initialize the pipeline for text-to-image generation
pipe = LightX2VPipeline(
    model_path="Tongyi-MAI/Z-Image-Turbo",
    model_cls="z_image",
    task="t2i",
)

# Step 1: Enable quantization (FP8 transformer + INT4 text encoder)
pipe.enable_quantize(
    dit_quantized=True,
    dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_scaled_fp8_e4m3fn.safetensors",
    quant_scheme="fp8-sgl",
    # IMPORTANT: use the int4 Qwen3 text encoder for 8GB VRAM
    text_encoder_quantized=True,
    text_encoder_quantized_ckpt="JunHowie/Qwen3-4B-GPTQ-Int4",
    text_encoder_quant_scheme="int4",
)

# Step 2: Enable CPU offloading
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="model",  # "model" gives maximum memory savings
)

# Step 3: Create the generator
pipe.create_generator(
    attn_mode="flash_attn3",
    aspect_ratio="16:9",
    infer_steps=9,
    guidance_scale=1,
)

# Step 4: Generate the image
pipe.generate(
    seed=42,
    prompt="A beautiful landscape with mountains and lakes, ultra HD, 4K",
    negative_prompt="",
    save_result_path="output.png",
)
```

## ⚙️ Configuration Options

### Quantization Schemes

**FP8 (Recommended)** - Better quality and speed:

```python
dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_scaled_fp8_e4m3fn.safetensors",
quant_scheme="fp8-sgl",
```

**INT8** - Alternative option:

```python
dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_int8.safetensors",
quant_scheme="int8-sgl",
```
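
Both snippets are keyword arguments to the `enable_quantize()` call from the complete example. For clarity, here is a sketch of the full call with the INT8 checkpoint swapped in (every argument is taken from the example above):

```python
# INT8 variant of Step 1 from the complete example
pipe.enable_quantize(
    dit_quantized=True,
    dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_int8.safetensors",
    quant_scheme="int8-sgl",
    # the int4 text-encoder settings are unchanged
    text_encoder_quantized=True,
    text_encoder_quantized_ckpt="JunHowie/Qwen3-4B-GPTQ-Int4",
    text_encoder_quant_scheme="int4",
)
```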

### Offload Granularity

- **`"model"`** (recommended for 8GB): offload the entire model to CPU and load it onto the GPU only during inference. Maximum memory savings.
- **`"block"`**: offload individual transformer blocks for more fine-grained control; see the sketch below.
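
A minimal sketch of the block-level alternative, using the same `enable_offload()` API as the complete example (the trade-off in the comment is an assumption, not a measured result):

```python
# Block-level offloading: finer-grained than "model"; typically trades
# more host-to-GPU transfers for smoother peak-VRAM behavior.
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="block",
)
```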

## ⚠️ Important Notes

1. **Order matters**: call `enable_quantize()` and `enable_offload()` **before** `create_generator()`; otherwise they will not take effect.

2. **Text encoder quantization**: the int4 Qwen3 text encoder is **highly recommended** on 8GB VRAM GPUs to ensure stable operation.

3. **Memory optimization**: the combination of an FP8/INT8 transformer, the int4 Qwen3 text encoder, and model-level offloading is tuned for 8GB VRAM.

## 📚 References

- Original model: [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)
- LightX2V: [GitHub](https://github.com/ModelTC/LightX2V)
- Qwen3-4B-GPTQ-Int4: [JunHowie/Qwen3-4B-GPTQ-Int4](https://huggingface.co/JunHowie/Qwen3-4B-GPTQ-Int4)

## 📄 License

Apache 2.0