ixim committed (verified) · Commit 32c18a5 · Parent: fb39ee7

Update README.md

Files changed (1): README.md (+191 −190)

---
language:
- en
license: other
library_name: diffusers
pipeline_tag: text-to-image
tags:
- text-to-image
- diffusers
- quanto
- int8
- z-image
- transformer-quantization
base_model:
- Tongyi-MAI/Z-Image
base_model_relation: quantized
---

# Z-Image INT8 (Quanto)

This repository provides an INT8-quantized variant of [Tongyi-MAI/Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image):
- **Only** the `transformer` is quantized with **Quanto weight-only INT8**.
- `text_encoder`, `vae`, `scheduler`, and `tokenizer` remain unchanged.
- The inference API stays compatible with `diffusers.ZImagePipeline`.

> Please follow the original upstream model license and usage terms. `license: other` means this repo inherits the upstream licensing constraints.

## Model Details

- **Base model**: `Tongyi-MAI/Z-Image`
- **Quantization method**: `optimum-quanto` (weight-only INT8)
- **Quantized part**: `transformer`
- **Compute dtype**: `bfloat16`
- **Pipeline**: `diffusers.ZImagePipeline`
- **Negative prompt support**: Yes (same pipeline API as the base model)

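Weight-only INT8 means each weight tensor is stored as 8-bit integers plus a floating-point scale, and is dequantized back to `bfloat16` at compute time. A minimal sketch of the arithmetic (symmetric per-tensor quantization in plain Python — an illustration of the idea, not the actual `optimum-quanto` implementation):

```python
def quantize_int8(weights):
    """Symmetric weight-only INT8: pick a scale so max |w| maps to 127."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats: each int8 value times the shared scale."""
    return [v * scale for v in q]

w = [0.42, -1.27, 0.004, 0.9]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# Reconstruction error is bounded by half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
assert max_err <= s / 2 + 1e-12
```

Activations never pass through this rounding step, which is why the Limitations section notes that activation precision is unchanged.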
## Files

Key files in this repository:
- `model_index.json`
- `transformer/diffusion_pytorch_model.safetensors` (INT8-quantized weights)
- `text_encoder/*`, `vae/*`, `scheduler/*`, `tokenizer/*` (not quantized)
- `zimage_quanto_bench_results/*` (benchmark metrics and baseline-vs-INT8 images)
- `test_outputs/*` (generated examples)

## Installation

Python 3.10+ is recommended.

```bash
# Create a virtual environment (optional)
python -m venv .venv

# Windows
.venv\Scripts\activate

# Linux/macOS
# source .venv/bin/activate

python -m pip install --upgrade pip

# PyTorch (NVIDIA CUDA example)
pip install torch --index-url https://download.pytorch.org/whl/cu128

# PyTorch (macOS / CPU-only example)
# pip install torch

# Inference dependencies
pip install diffusers transformers accelerate safetensors sentencepiece optimum-quanto pillow
```

## Quick Start (Diffusers)

This repo already stores quantized weights, so you do **not** need to re-run quantization during loading.

```python
import torch
from diffusers import ZImagePipeline

model_id = "ixim/Z-Image-INT8"

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

pipe = ZImagePipeline.from_pretrained(
    model_id,
    torch_dtype=dtype,
    low_cpu_mem_usage=True,
)

if device == "cuda":
    pipe.enable_model_cpu_offload()
else:
    pipe = pipe.to("cpu")

prompt = "A cinematic portrait of a young woman, soft lighting, high detail"
negative_prompt = "blurry, low quality, distorted face, extra limbs, artifacts"
generator = torch.Generator(device=device).manual_seed(42)

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1024,
    width=1024,
    num_inference_steps=28,
    guidance_scale=4.0,
    generator=generator,
).images[0]

image.save("zimage_int8_sample.png")
print("Saved: zimage_int8_sample.png")
```

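To compare peak VRAM on your own hardware, torch's CUDA memory statistics can bracket the pipeline call above (`bytes_to_gib` is a small helper defined here, not a library function; the commented lines require a CUDA device):

```python
def bytes_to_gib(n: int) -> float:
    """Convert a raw byte count to GiB."""
    return n / 2**30

# Around the pipe(...) call above:
#   torch.cuda.reset_peak_memory_stats()
#   image = pipe(...).images[0]
#   print(f"peak CUDA alloc: {bytes_to_gib(torch.cuda.max_memory_allocated()):.4f} GiB")
```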
## Additional Generated Samples (INT8)

These two images were generated with this quantized model:

### 1) `en_portrait_1024x1024.png`

- **Prompt**: `A cinematic portrait of a young woman standing by the window, golden hour sunlight, shallow depth of field, film grain, ultra-detailed skin texture, photorealistic`

<div align="center"><img src="test_outputs/en_portrait_1024x1024.png" width="512" /></div>

### 2) `cn_scene_1024x1024.png`

- **Prompt** (Chinese): `一只橘猫趴在堆满旧书的木桌上打盹,午后阳光透过窗帘洒进来,暖色调,胶片风格,细腻毛发纹理,超高清` ("An orange tabby cat dozing on a wooden desk piled with old books, afternoon sunlight streaming through the curtains, warm tones, film style, finely detailed fur, ultra high definition")

<div align="center"><img src="test_outputs/cn_scene_1024x1024.png" width="512" /></div>

## Benchmark & Performance

Test environment:
- GPU: NVIDIA GeForce RTX 5090
- Framework: PyTorch 2.10.0+cu130
- Inference settings: 1024×1024, 28 steps, guidance_scale=4.0, CPU offload enabled
- Cases: 4 prompts (`portrait_01`, `portrait_02`, `scene_01`, `night_01`)

### Aggregate Comparison (Baseline vs INT8)

| Metric | Baseline | INT8 | Delta |
|---|---:|---:|---:|
| Avg elapsed / image (s) | 51.7766 | 39.5662 | **-23.6%** |
| Avg sec / step | 1.8492 | 1.4131 | **-23.6%** |
| Avg peak CUDA alloc (GB) | 12.5195 | 7.7470 | **-38.1%** |

> Results may vary across hardware, drivers, and PyTorch/CUDA versions.

### Per-Case Results

| Case | Baseline (s) | INT8 (s) | Speedup |
|---|---:|---:|---:|
| portrait_01 | 99.9223 | 60.6768 | 1.65x |
| portrait_02 | 37.4116 | 32.8863 | 1.14x |
| scene_01 | 34.9946 | 32.2035 | 1.09x |
| night_01 | 34.7780 | 32.4981 | 1.07x |

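The aggregate deltas and per-case speedups follow directly from the raw timings; recomputing them in pure Python (numbers copied from the tables above):

```python
# Aggregate: percentage change from baseline (negative = INT8 is faster/smaller).
baseline_s, int8_s = 51.7766, 39.5662   # avg elapsed per image (s)
mem_base, mem_int8 = 12.5195, 7.7470    # avg peak CUDA alloc (GB)

time_delta = (int8_s - baseline_s) / baseline_s * 100   # ≈ -23.6
mem_delta = (mem_int8 - mem_base) / mem_base * 100      # ≈ -38.1

# Per-case speedup = baseline time / INT8 time.
cases = {
    "portrait_01": (99.9223, 60.6768),
    "portrait_02": (37.4116, 32.8863),
    "scene_01":    (34.9946, 32.2035),
    "night_01":    (34.7780, 32.4981),
}
speedups = {name: base / int8 for name, (base, int8) in cases.items()}

print(f"{time_delta:.1f}% time, {mem_delta:.1f}% memory")  # -23.6% time, -38.1% memory
print({name: round(v, 2) for name, v in speedups.items()})
```

Note that `portrait_01` (the first case) dominates the aggregate speedup; the remaining cases cluster around 1.1x.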
## Visual Comparison (Baseline vs INT8)

Left: Baseline. Right: INT8. (Same prompt, seed, and step count.)

| Case | Baseline | INT8 |
|---|---|---|
| portrait_01 | ![](zimage_quanto_bench_results/images/baseline/portrait_01_seed46.png) | ![](zimage_quanto_bench_results/images/int8/portrait_01_seed46.png) |
| portrait_02 | ![](zimage_quanto_bench_results/images/baseline/portrait_02_seed123.png) | ![](zimage_quanto_bench_results/images/int8/portrait_02_seed123.png) |
| scene_01 | ![](zimage_quanto_bench_results/images/baseline/scene_01_seed777.png) | ![](zimage_quanto_bench_results/images/int8/scene_01_seed777.png) |
| night_01 | ![](zimage_quanto_bench_results/images/baseline/night_01_seed2026.png) | ![](zimage_quanto_bench_results/images/int8/night_01_seed2026.png) |

## Limitations

- This is **weight-only INT8** quantization; activation precision is unchanged.
- Minor visual differences may appear on some prompts.
- `enable_model_cpu_offload()` can change how latency is distributed across pipeline stages.
- For extreme resolutions or very long step counts, validate quality and stability first.

## Intended Use

Recommended for:
- Running Z-Image with lower VRAM usage.
- Improving throughput while keeping quality close to baseline.

Not recommended as-is for:
- Safety-critical decision workflows.
- High-risk generation use cases without additional review/guardrails.

## Citation

If you use this model, please cite or reference the upstream model and toolchain:
- Tongyi-MAI/Z-Image
- Hugging Face Diffusers
- optimum-quanto