---
language:
- en
license: other
library_name: diffusers
pipeline_tag: text-to-image
tags:
- text-to-image
- diffusers
- quanto
- int8
- z-image
- transformer-quantization
base_model:
- Tongyi-MAI/Z-Image
base_model_relation: quantized
---

# Z-Image INT8 (Quanto)

This repository provides an INT8-quantized variant of [Tongyi-MAI/Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image):

- **Only** the `transformer` is quantized with **Quanto weight-only INT8**.
- `text_encoder`, `vae`, `scheduler`, and `tokenizer` remain unchanged.
- The inference API stays compatible with `diffusers.ZImagePipeline`.

> Please follow the original upstream model license and usage terms. `license: other` means this repo inherits upstream licensing constraints.

## Model Details

- **Base model**: `Tongyi-MAI/Z-Image`
- **Quantization method**: `optimum-quanto` (weight-only INT8)
- **Quantized part**: `transformer`
- **Compute dtype**: `bfloat16`
- **Pipeline**: `diffusers.ZImagePipeline`
- **Negative prompt support**: yes (same pipeline API as the base model)

## Platform Support

- ✅ Supported: Linux/Windows with NVIDIA CUDA
- ⚠️ Limited support: macOS Apple Silicon (MPS, usually much slower than CUDA)
- ❌ Not supported: macOS Intel

## Files

Key files in this repository:

- `model_index.json`
- `transformer/diffusion_pytorch_model.safetensors` (INT8-quantized weights)
- `text_encoder/*`, `vae/*`, `scheduler/*`, `tokenizer/*` (not quantized)
- `zimage_quanto_bench_results/*` (benchmark metrics and baseline-vs-INT8 images)
- `test_outputs/*` (generated examples)

## Installation

Python 3.10+ is recommended.
```bash
# Create a virtual environment (optional)
python -m venv .venv

# Windows
.venv\Scripts\activate
# Linux/macOS
# source .venv/bin/activate

python -m pip install --upgrade pip

# PyTorch (NVIDIA CUDA, example)
pip install torch --index-url https://download.pytorch.org/whl/cu128

# PyTorch (macOS Apple Silicon, MPS)
# pip install torch

# Inference dependencies
pip install diffusers transformers accelerate safetensors sentencepiece optimum-quanto pillow
```

## Quick Start (Diffusers)

This repo already stores quantized weights, so you do **not** need to re-run quantization at load time.

```python
import torch
from diffusers import ZImagePipeline

model_id = "ixim/Z-Image-INT8"

if torch.cuda.is_available():
    device = "cuda"
    dtype = torch.bfloat16
elif torch.backends.mps.is_available():
    # Apple Silicon
    device = "mps"
    dtype = torch.bfloat16
else:
    # CPU fallback (functional but very slow for this model)
    device = "cpu"
    dtype = torch.float32

pipe = ZImagePipeline.from_pretrained(
    model_id,
    torch_dtype=dtype,
    low_cpu_mem_usage=True,
)
pipe.enable_attention_slicing()

if device == "cuda":
    pipe.enable_model_cpu_offload()
else:
    pipe = pipe.to(device)

prompt = "A cinematic portrait of a young woman, soft lighting, high detail"
negative_prompt = "blurry, sad, low quality, distorted face, extra limbs, artifacts"

# Use a CPU generator for best cross-device reproducibility (cpu/mps/cuda)
generator = torch.Generator(device="cpu").manual_seed(42)

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1024,
    width=1024,
    num_inference_steps=28,
    guidance_scale=4.0,
    generator=generator,
).images[0]

image.save("zimage_int8_sample.png")
print("Saved: zimage_int8_sample.png")
```

## macOS Notes & Troubleshooting

- macOS Intel is no longer supported for this model in this repository.
- If you need macOS inference, use Apple Silicon (`mps`) only.
- On Apple Silicon, warnings such as `CUDA not available` and `Disabling autocast` are expected on non-CUDA execution paths.
- Slow speed on a Mac is expected compared with high-end NVIDIA GPUs.

To improve speed on Apple Silicon:

- Ensure the script uses `mps` (as in the example above), not `cpu`.
- Start from `height=512`, `width=512`, and fewer steps (e.g., `20~28`) before scaling up.

## Additional Generated Samples (INT8)

These two images are generated with this quantized model:

### 1) `en_portrait_1024x1024.png`

- **Prompt**: `A cinematic portrait of a young woman standing by the window, golden hour sunlight, shallow depth of field, film grain, ultra-detailed skin texture, photorealistic`
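## Appendix: What "Weight-Only INT8" Means

In weight-only INT8 quantization, each weight matrix is stored as `int8` values plus a floating-point scale, and dequantized back to the compute dtype (here `bfloat16`) when the layer runs. The following is a didactic sketch of symmetric per-row quantization in plain Python; it is illustrative only, not Quanto's actual implementation:

```python
def quantize_int8(matrix):
    """Symmetric per-row (per-output-channel) INT8 quantization."""
    q_rows, scales = [], []
    for row in matrix:
        # Map the largest-magnitude value in the row to 127;
        # `or 1.0` avoids division by zero for all-zero rows.
        scale = max(abs(v) for v in row) / 127 or 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales

def dequantize_int8(q_rows, scales):
    """Recover an approximate float matrix from int8 values and per-row scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

w = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
max_err = max(abs(a - b) for ra, rb in zip(w, w_hat) for a, b in zip(ra, rb))
print("int8 values:", q)
print("max reconstruction error:", max_err)
```

The per-row scale keeps the quantization error bounded by roughly half a quantization step per weight, which is why INT8 transformers usually stay close to the bf16 baseline in image quality.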

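The practical payoff of INT8 weights is storage and memory: 8 bits per parameter instead of bf16's 16 bits, so the transformer's weight footprint is roughly halved (plus a small overhead for per-channel scales). A quick back-of-envelope in Python; the parameter count below is a placeholder for illustration, not Z-Image's actual size:

```python
def weight_gib(n_params, bits_per_param):
    """Approximate weight storage in GiB for a given precision."""
    return n_params * bits_per_param / 8 / 2**30

n = 6_000_000_000  # hypothetical parameter count, for illustration only
print(f"bf16: {weight_gib(n, 16):.1f} GiB, int8: {weight_gib(n, 8):.1f} GiB")
```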
