---
license: apache-2.0
tags:
- diffusion-single-file
- comfyui
- distillation
- Z-Image-Turbo
base_model:
- Tongyi-MAI/Z-Image-Turbo
library_name: diffusers
pipeline_tag: text-to-image
---
# Z-Image-Turbo-Quantized
Quantized weights for [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo), optimized for **8GB VRAM GPUs**.
## 📦 Available Models
- **`z_image_turbo_scaled_fp8_e4m3fn.safetensors`** (6.17 GB) - FP8 E4M3FN quantized weights
- **`z_image_turbo_int8.safetensors`** (6.17 GB) - INT8 quantized weights
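The usage example below references the checkpoint by its path inside this repository. If you need the file on local disk instead, standard `huggingface_hub` tooling (not a LightX2V API) can fetch it; a minimal sketch:
```python
from huggingface_hub import hf_hub_download

# Download the FP8 checkpoint from this repository into the local Hugging Face cache
ckpt_path = hf_hub_download(
    repo_id="lightx2v/Z-Image-Turbo-Quantized",
    filename="z_image_turbo_scaled_fp8_e4m3fn.safetensors",
)
print(ckpt_path)  # local file path returned by the download
```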
## 🚀 Installation
```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
pip install .
```
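A quick import check confirms that the package is installed; it uses the same pipeline class as the example below:
```python
# Runs without error if LightX2V is installed correctly
from lightx2v import LightX2VPipeline
print("LightX2V import OK")
```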
## 💻 Usage for 8GB VRAM GPUs
To run Z-Image-Turbo on 8GB VRAM GPUs, you need to:
1. Use quantized transformer weights (FP8 or INT8)
2. Use int4 quantized Qwen3 text encoder
3. Enable CPU offloading
### Complete Example
```python
from lightx2v import LightX2VPipeline

# Initialize pipeline
pipe = LightX2VPipeline(
    model_path="Tongyi-MAI/Z-Image-Turbo",
    model_cls="z_image",
    task="t2i",
)

# Step 1: Enable quantization (FP8 transformer + INT4 text encoder)
pipe.enable_quantize(
    dit_quantized=True,
    dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_scaled_fp8_e4m3fn.safetensors",
    quant_scheme="fp8-sgl",
    # IMPORTANT: Use int4 Qwen3 for 8GB VRAM
    text_encoder_quantized=True,
    text_encoder_quantized_ckpt="JunHowie/Qwen3-4B-GPTQ-Int4",
    text_encoder_quant_scheme="int4",
)

# Step 2: Enable CPU offloading
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="model",  # Use "model" for maximum memory savings
)

# Step 3: Create generator
pipe.create_generator(
    attn_mode="flash_attn3",
    aspect_ratio="16:9",
    infer_steps=9,
    guidance_scale=1,
)

# Step 4: Generate image
pipe.generate(
    seed=42,
    prompt="A beautiful landscape with mountains and lakes, ultra HD, 4K",
    negative_prompt="",
    save_result_path="output.png",
)
```
## ⚙️ Configuration Options
### Quantization Schemes
**FP8 (Recommended)** - Better quality and speed:
```python
dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_scaled_fp8_e4m3fn.safetensors",
quant_scheme="fp8-sgl",
```
**INT8** - Alternative option:
```python
dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_int8.safetensors",
quant_scheme="int8-sgl",
```
### Offload Granularity
- **`"model"`** (Recommended for 8GB): Offloads the entire model to CPU and loads it onto the GPU only during inference. Maximum memory savings.
- **`"block"`**: Offloads individual transformer blocks for more fine-grained control (see the sketch below).
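For example, switching the call from the complete example to block-level offloading only changes the granularity argument (a sketch reusing the parameters shown above):
```python
# Block-level offloading: transformer blocks are moved between CPU and GPU
# individually instead of offloading the whole model at once
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="block",
)
```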
## ⚠️ Important Notes
1. **Order matters**: All `enable_quantize()` and `enable_offload()` calls must be made **before** `create_generator()`, otherwise they will not take effect (see the condensed skeleton after these notes).
2. **Text encoder quantization**: Using the int4 Qwen3 text encoder is **highly recommended** for 8GB VRAM GPUs to ensure stable operation.
3. **Memory optimization**: The combination of FP8/INT8 transformer + int4 Qwen3 + model-level offloading is optimized for 8GB VRAM.
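To make note 1 concrete, here is the call order from the complete example in condensed form (FP8 transformer only; the text-encoder arguments shown earlier are omitted for brevity):
```python
from lightx2v import LightX2VPipeline

pipe = LightX2VPipeline(model_path="Tongyi-MAI/Z-Image-Turbo", model_cls="z_image", task="t2i")

# 1) Configure quantization first ...
pipe.enable_quantize(
    dit_quantized=True,
    dit_quantized_ckpt="lightx2v/Z-Image-Turbo-Quantized/z_image_turbo_scaled_fp8_e4m3fn.safetensors",
    quant_scheme="fp8-sgl",
)
# 2) ... then offloading ...
pipe.enable_offload(cpu_offload=True, offload_granularity="model")
# 3) ... and only now create the generator, so the settings above take effect
pipe.create_generator(attn_mode="flash_attn3", aspect_ratio="16:9", infer_steps=9, guidance_scale=1)
# 4) Finally, run inference
pipe.generate(seed=42, prompt="A mountain lake at sunrise", save_result_path="output.png")
```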
## 📚 References
- Original Model: [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)
- LightX2V: [GitHub](https://github.com/ModelTC/LightX2V)
- Qwen3-4B-GPTQ-Int4: [JunHowie/Qwen3-4B-GPTQ-Int4](https://huggingface.co/JunHowie/Qwen3-4B-GPTQ-Int4)
## 🤝 Community
**If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)**