---
language:
- en
license: other
library_name: diffusers
pipeline_tag: text-to-image
tags:
- text-to-image
- diffusers
- quanto
- int8
- z-image
- transformer-quantization
base_model:
- Tongyi-MAI/Z-Image
---

# Z-Image INT8 (Quanto)

This repository provides an INT8-quantized variant of [Tongyi-MAI/Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image):
- **Only** the `transformer` is quantized with **Quanto weight-only INT8**.
- `text_encoder`, `vae`, `scheduler`, and `tokenizer` remain unchanged.
- The inference API stays compatible with `diffusers.ZImagePipeline`.

> Please follow the original upstream model license and usage terms. `license: other` means this repo inherits the upstream licensing constraints.

## Model Details

- **Base model**: `Tongyi-MAI/Z-Image`
- **Quantization method**: `optimum-quanto` (weight-only INT8)
- **Quantized part**: `transformer`
- **Compute dtype**: `bfloat16`
- **Pipeline**: `diffusers.ZImagePipeline`
- **Negative prompt support**: Yes (same pipeline API as the base model)

## Files

Key files in this repository:
- `model_index.json`
- `transformer/diffusion_pytorch_model.safetensors` (INT8-quantized weights)
- `text_encoder/*`, `vae/*`, `scheduler/*`, `tokenizer/*` (not quantized)
- `zimage_quanto_bench_results/*` (benchmark metrics and baseline-vs-INT8 images)
- `test_outputs/*` (generated examples)

## Installation

Python 3.10+ is recommended.

```bash
# Create a virtual environment (optional)
python -m venv .venv

# Windows
.venv\Scripts\activate

# Linux/macOS
# source .venv/bin/activate

python -m pip install --upgrade pip

# PyTorch (NVIDIA CUDA example)
pip install torch --index-url https://download.pytorch.org/whl/cu128

# PyTorch (macOS / CPU-only example)
# pip install torch

# Inference dependencies
pip install diffusers transformers accelerate safetensors sentencepiece optimum-quanto pillow
```
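Before loading the model, it can help to confirm that PyTorch sees your GPU and to pick the compute dtype accordingly. This is plain PyTorch introspection, nothing repo-specific:

```python
import torch

# Report the versions and devices that matter for reproducing the numbers below.
print(f"torch {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

# bfloat16 is only worth enabling on CUDA; fall back to float32 on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32
print(f"Using device={device}, dtype={dtype}")
```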
## Quick Start (Diffusers)

This repo already stores quantized weights, so you do **not** need to re-run quantization during loading.

```python
import torch
from diffusers import ZImagePipeline

model_id = "ixim/Z-Image-INT8"

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

pipe = ZImagePipeline.from_pretrained(
    model_id,
    torch_dtype=dtype,
    low_cpu_mem_usage=True,
)

if device == "cuda":
    pipe.enable_model_cpu_offload()
else:
    pipe = pipe.to("cpu")

prompt = "A cinematic portrait of a young woman, soft lighting, high detail"
negative_prompt = "blurry, low quality, distorted face, extra limbs, artifacts"
generator = torch.Generator(device=device).manual_seed(42)

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1024,
    width=1024,
    num_inference_steps=28,
    guidance_scale=4.0,
    generator=generator,
).images[0]

image.save("zimage_int8_sample.png")
print("Saved: zimage_int8_sample.png")
```
## Additional Generated Samples (INT8)

These two images were generated with this quantized model:

### 1) `en_portrait_1024x1024.png`

- **Prompt**: `A cinematic portrait of a young woman standing by the window, golden hour sunlight, shallow depth of field, film grain, ultra-detailed skin texture, photorealistic`

<div align="center"><img src="test_outputs/en_portrait_1024x1024.png" width="512" /></div>

### 2) `cn_scene_1024x1024.png`

- **Prompt** (Chinese): `一只橘猫趴在堆满旧书的木桌上打盹,午后阳光透过窗帘洒进来,暖色调,胶片风格,细腻毛发纹理,超高清` (an orange cat napping on a wooden desk piled with old books, afternoon sunlight coming through the curtains, warm tones, film style, fine fur texture, ultra-high definition)

<div align="center"><img src="test_outputs/cn_scene_1024x1024.png" width="512" /></div>
## Benchmark & Performance

Test environment:
- GPU: NVIDIA GeForce RTX 5090
- Framework: PyTorch 2.10.0+cu130
- Inference settings: 1024×1024, 28 steps, guidance_scale=4.0, CPU offload enabled
- Cases: 4 prompts (`portrait_01`, `portrait_02`, `scene_01`, `night_01`)

### Aggregate Comparison (Baseline vs INT8)

| Metric | Baseline | INT8 | Delta |
|---|---:|---:|---:|
| Avg elapsed / image (s) | 51.7766 | 39.5662 | **-23.6%** |
| Avg sec / step | 1.8492 | 1.4131 | **-23.6%** |
| Avg peak CUDA alloc (GB) | 12.5195 | 7.7470 | **-38.1%** |

> Results may vary across hardware, drivers, and PyTorch/CUDA versions.

### Per-Case Results

| Case | Baseline (s) | INT8 (s) | Speedup |
|---|---:|---:|---:|
| portrait_01 | 99.9223 | 60.6768 | 1.65x |
| portrait_02 | 37.4116 | 32.8863 | 1.14x |
| scene_01 | 34.9946 | 32.2035 | 1.09x |
| night_01 | 34.7780 | 32.4981 | 1.07x |

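The timing and peak-memory numbers above can be reproduced with standard PyTorch instrumentation. The exact benchmark script is not included in this repo, so the function below is an illustrative sketch:

```python
import time
import torch

def benchmark_case(pipe, prompt, seed, steps=28):
    """Time one 1024x1024 generation and record peak CUDA memory.

    `pipe` is a loaded ZImagePipeline (baseline or INT8). This is generic
    instrumentation, not the script that produced the tables above.
    """
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.reset_peak_memory_stats()  # start peak tracking fresh
        torch.cuda.synchronize()              # exclude pending async work

    device = "cuda" if use_cuda else "cpu"
    generator = torch.Generator(device=device).manual_seed(seed)

    start = time.perf_counter()
    pipe(
        prompt=prompt,
        height=1024,
        width=1024,
        num_inference_steps=steps,
        guidance_scale=4.0,
        generator=generator,
    )
    if use_cuda:
        torch.cuda.synchronize()  # wait for kernels before stopping the clock
    elapsed = time.perf_counter() - start

    peak_gb = torch.cuda.max_memory_allocated() / 1024**3 if use_cuda else 0.0
    return {
        "elapsed_s": elapsed,
        "sec_per_step": elapsed / steps,
        "peak_alloc_gb": peak_gb,
    }
```

Running this once per case for both the baseline and the INT8 pipeline, then averaging across the four prompts, yields the aggregate metrics in the table above.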
## Visual Comparison (Baseline vs INT8)

Left: Baseline. Right: INT8. (Same prompt, seed, and step count.)

| Case | Baseline | INT8 |
|---|---|---|
| portrait_01 | ![](zimage_quanto_bench_results/images/baseline/portrait_01_seed46.png) | ![](zimage_quanto_bench_results/images/int8/portrait_01_seed46.png) |
| portrait_02 | ![](zimage_quanto_bench_results/images/baseline/portrait_02_seed123.png) | ![](zimage_quanto_bench_results/images/int8/portrait_02_seed123.png) |
| scene_01 | ![](zimage_quanto_bench_results/images/baseline/scene_01_seed777.png) | ![](zimage_quanto_bench_results/images/int8/scene_01_seed777.png) |
| night_01 | ![](zimage_quanto_bench_results/images/baseline/night_01_seed2026.png) | ![](zimage_quanto_bench_results/images/int8/night_01_seed2026.png) |

## Limitations

- This is **weight-only INT8** quantization; activation precision is unchanged.
- Minor visual differences may appear for some prompts.
- `enable_model_cpu_offload()` can change how latency is distributed across pipeline stages.
- For extreme resolutions or very long step counts, validate quality and stability first.
## Intended Use

Recommended for:
- Running Z-Image with lower VRAM usage.
- Improving throughput while keeping quality close to the baseline.

Not recommended as-is for:
- Safety-critical decision workflows.
- High-risk generation use cases without additional review and guardrails.

## Citation

If you use this model, please cite or reference the upstream model and toolchain:
- Tongyi-MAI/Z-Image
- Hugging Face Diffusers
- optimum-quanto