---
license: other
library_name: diffusers
tags:
- text-to-image
- z-image
- diffusers
- quantized
- int8
- sdnq
- safetensors
pipeline_tag: text-to-image
---

# Tongyi-MAI/Z-Image-Turbo - Quantized (8-bit)

## Overview
This is a **quantized version** of [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo).

The transformer and VAE have been quantized to 8-bit using SDNQ; the text encoder is included unquantized. The original folder structure is preserved for seamless integration.

## Architecture
- **Pipeline**: ZImagePipeline
- **Main component**: ZImageTransformer2DModel
- **Quantization**: 8-bit
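
To confirm which classes a local copy declares, the sketch below reads the `model_index.json` file that diffusers pipelines conventionally keep at the repository root. It assumes this repository ships that file (implied by the preserved folder structure, but not verified here); the helper name `describe_pipeline` is hypothetical, and the folder name is the one used in the Usage section.

```python
import json
from pathlib import Path


def describe_pipeline(model_dir):
    """Return (pipeline_class, {component: (library, class_name)}) from model_index.json."""
    index_path = Path(model_dir) / "model_index.json"
    index = json.loads(index_path.read_text(encoding="utf-8"))
    components = {
        name: tuple(value)
        for name, value in index.items()
        # Keys starting with "_" are metadata; components are [library, class] pairs.
        if not name.startswith("_") and isinstance(value, list) and len(value) == 2
    }
    return index.get("_class_name"), components


if __name__ == "__main__":
    # Assumption: the model was downloaded to this local folder.
    model_path = Path("Tongyi-MAI_Z-Image-Turbo-int8")
    if (model_path / "model_index.json").exists():
        pipeline_class, components = describe_pipeline(model_path)
        print(pipeline_class)
        for name, (library, class_name) in components.items():
            print(f"  {name}: {library}.{class_name}")
```

If the file is present, the printed pipeline class should match the one listed above.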

## Usage

```python
import torch
from diffusers import ZImagePipeline, ZImageTransformer2DModel, AutoencoderKL, FlowMatchEulerDiscreteScheduler
from transformers import Qwen3Model, AutoTokenizer
from sdnq import load_sdnq_model

model_path = "Tongyi-MAI_Z-Image-Turbo-int8"

# Load transformer with SDNQ (quantized to 8-bit)
transformer = load_sdnq_model(
    f"{model_path}/transformer",
    model_cls=ZImageTransformer2DModel,
    device="cpu"
)

# Load other components from this model (all included!)
vae = AutoencoderKL.from_pretrained(f"{model_path}/vae", torch_dtype=torch.float16)
text_encoder = Qwen3Model.from_pretrained(f"{model_path}/text_encoder", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(f"{model_path}/tokenizer")
scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(f"{model_path}/scheduler")

# Construct pipeline
pipe = ZImagePipeline(
    transformer=transformer,
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    scheduler=scheduler,
)

pipe.to("cuda")

# Generate an image
image = pipe(
    prompt="A serene mountain landscape at sunrise",
    num_inference_steps=20,
).images[0]
image.save("output.png")
```

## Components

- **transformer** (ZImageTransformer2DModel) - Quantized to 8-bit
- **vae** (AutoencoderKL) - Quantized to 8-bit

**Note**: Some components are included unquantized due to SDNQ library limitations:
- 📦 **text_encoder** - Included unquantized (SDNQ bug workaround)


## Quantization Details
- **Original model**: [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)
- **Quantization**: 8-bit
- **Quantizer**: SDNQ
- **Date**: 2026-01-18 13:41:50

## Size Reduction
- Original: ~30GB (estimated)
- Quantized: See individual component sizes
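
Since the quantized size is not stated, one way to measure it is to sum the files in each component subfolder of a local download. This is a minimal sketch using only the standard library; the helper name `component_sizes` is hypothetical, and the folder name is the one used in the Usage section.

```python
from pathlib import Path


def component_sizes(model_dir):
    """Map each top-level subfolder of model_dir to its total size in bytes."""
    sizes = {}
    for sub in sorted(Path(model_dir).iterdir()):
        if sub.is_dir():
            # Sum every regular file below the component folder.
            sizes[sub.name] = sum(
                f.stat().st_size for f in sub.rglob("*") if f.is_file()
            )
    return sizes


if __name__ == "__main__":
    # Assumption: the model was downloaded to this local folder.
    model_path = Path("Tongyi-MAI_Z-Image-Turbo-int8")
    if model_path.is_dir():
        sizes = component_sizes(model_path)
        for name, size in sizes.items():
            print(f"{name:<14} {size / 1e9:6.2f} GB")
        print(f"{'total':<14} {sum(sizes.values()) / 1e9:6.2f} GB")
```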

## Notes
- This is a complete drop-in replacement - all components included
- 8-bit SDNQ quantization reduces model size with minimal quality loss
- Requires the `sdnq` library: `pip install sdnq`
- The text encoder is included unquantized due to SDNQ library limitations
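
Before running the usage example, it can help to confirm that the required packages are importable. The sketch below checks for the four top-level packages imported in the Usage section; the helper name `missing_packages` is hypothetical, and it assumes each package's pip name matches its import name.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["torch", "diffusers", "transformers", "sdnq"])
    if missing:
        print("Install before loading the model: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```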

---
Quantized with [BugQuant](https://github.com/yourusername/BugQuant)