uqer1244 committed
Commit fa9c57f · verified · 1 Parent(s): a4e34a9

Upload full MLX-4bit pipeline (Auto-packaged)
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,163 +1,31 @@
 ---
 library_name: mlx
+ tags:
+ - mlx
+ - diffusion
+ - text-to-image
+ - 4-bit
+ base_model: Tongyi-MAI/Z-Image-Turbo
 license: apache-2.0
- base_model:
- - Tongyi-MAI/Z-Image-Turbo
 ---
 
 # uqer1244/MLX-z-image
 
- This is a 4-bit quantized version of Z-Image-Turbo for MLX.
-
- https://github.com/uqer1244/MLX_z-image
-
- # MLX z-image 🍎
-
- An efficient **MLX implementation** of [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) optimized for Apple Silicon (M1/M2/M3/M4).
-
- This repository allows you to run high-quality image generation locally on your Mac using **4-bit quantization**, significantly reducing memory usage while maintaining quality.
-
- ## 📂 Project Structure
-
- It is recommended to organize your folders as follows:
-
- ```text
- MLX_z-image/
- ├── Z-Image-Turbo-MLX-4bit/ # MLX quantized weights
- ├── Z-Image-Turbo/          # Original PyTorch model
- ├── run.py                  # Inference script
- ├── mlx_z_image.py          # MLX-converted transformer
- ├── convert.py              # Converts PyTorch weights to MLX
- └── quantize.py             # Quantizes the FP16 model to 4-bit
- ```
-
- ## 📊 Performance & Gallery
-
- ### Benchmarks
- Inference tests were conducted on Apple Silicon devices using **native MLX** with **4-bit quantization**.
-
- - **Resolution**: 1024x1024
- - **Steps**: 5 (Turbo)
- - **Batch Size**: 1
-
- | Device | RAM | Total Time | Denoise Time (MLX) | Time per Step |
- |:-----------|:-----|:-----------|:-------------------|:--------------|
- | **M3 Pro** | 18GB | ~182.17 s | 95.48 s | 19.097 s/step |
-
- ## Installation
-
- ### 1. Clone the repository
-
- ```bash
- git clone https://github.com/uqer1244/MLX_z-image.git
- cd MLX_z-image
- ```
-
- ### 2. Install dependencies
-
- Make sure you have Python installed.
-
- ```bash
- pip install -r requirements.txt
- ```
-
- -----
-
- ## Quick Start
-
- To run the model, you need two things:
-
- 1. **Z-Image-Turbo-MLX-4bit**: the converted, 4-bit quantized transformer.
- 2. **Z-Image-Turbo**: the original VAE, text encoder, and scheduler.
-
- ### 1. Download MLX Weights (Quantized)
-
- Download the 4-bit converted weights from [uqer1244/MLX-z-image](https://huggingface.co/uqer1244/MLX-z-image).
-
- ```bash
- # Install the CLI if needed
- pip install huggingface_hub
-
- # Download to the 'Z-Image-Turbo-MLX-4bit' folder
- huggingface-cli download uqer1244/MLX-z-image --local-dir Z-Image-Turbo-MLX-4bit
- ```
-
- ### 2. Download Base Model (Original)
-
- Download the original [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) to use its VAE and text encoder.
-
- ```bash
- # Download to the 'Z-Image-Turbo' folder
- huggingface-cli download Tongyi-MAI/Z-Image-Turbo --local-dir Z-Image-Turbo
- ```
-
- ### 3. Run Inference
-
- Run the `run.py` script, pointing it at both model paths.
-
- ```bash
- python run.py \
-   --prompt "A futuristic city with flying cars, cinematic lighting, 8k" \
-   --mlx_model_path "Z-Image-Turbo-MLX-4bit" \
-   --pt_model_id "Z-Image-Turbo" \
-   --output "res.png" \
-   --steps 5
- ```
-
- ### Options
-
- | Argument | Description | Default |
- | :--- | :--- | :--- |
- | `--prompt` | Text prompt for generation | *Astronaut...* |
- | `--mlx_model_path` | Path to the MLX weights folder | `Z-Image-Turbo-MLX-4bit` |
- | `--pt_model_id` | Path to the base model (or HF ID) | `Z-Image-Turbo` |
- | `--output` | Output filename | `res.png` |
- | `--steps` | Number of inference steps | `5` |
- | `--height` | Image height | `1024` |
- | `--width` | Image width | `1024` |
-
- -----
-
- ## Todo & Roadmap
-
- We are actively working on making this implementation pure MLX and bug-free.
-
- - [ ] **Fix Artifacts**: Investigate and resolve the visual artifacts (tiling/color issues) currently visible in some generations.
- - [ ] **Full MLX Pipeline**: Port the remaining PyTorch components (VAE, Text Encoder, Tokenizer, Scheduler) to native MLX to remove the `torch` and `diffusers` dependencies completely.
- - [ ] **LoRA Support**: Add support for loading and applying LoRA adapters for style customization.
-
- -----
-
- ## Advanced: Manual Conversion
-
- If you want to convert the original PyTorch weights yourself (instead of downloading the pre-converted ones), follow these steps.
-
- **1. Convert PyTorch to MLX (FP16)**
-
- ```bash
- python convert.py \
-   --model_id "Z-Image-Turbo" \
-   --dest_path "Z-Image-Turbo-MLX"
- ```
-
- **2. Quantize to 4-bit**
-
- ```bash
- python quantize.py \
-   --model_path "Z-Image-Turbo-MLX" \
-   --dest_path "Z-Image-Turbo-MLX-4bit" \
-   --group_size 32
- ```
-
- ## Acknowledgements
-
- - Original Model: [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)
- - MLX Framework: [Apple Machine Learning Research](https://github.com/ml-explore/mlx)
+ This is a **4-bit quantized MLX version** of [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo).
+ It is optimized for Apple Silicon (macOS) using the [MLX framework](https://github.com/ml-explore/mlx).
+
+ ## Model Details
+ - **Transformer**: MLX, 4-bit quantized
+ - **Text Encoder**: MLX, 4-bit quantized (Qwen3)
+ - **VAE**: original PyTorch model (sourced from the original repo)
+ - **Tokenizer**: original Qwen2 tokenizer (sourced from the original repo)
+ - **Scheduler**: FlowMatchEulerDiscreteScheduler (sourced from the original repo)
+
+ ## Usage
+ This model can be used with the custom MLX pipeline script.
+ Please refer to the original repository for detailed usage instructions regarding the model architecture.
+
+ ## Attribution & License
+ This model is a derivative work of [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo).
+ - **Original License**: Apache 2.0
+ - **Modifications**: converted the Transformer and Text Encoder weights to MLX format and quantized them to 4-bit.
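
The README's Usage section defers to the companion repository for details. For orientation, here is a minimal sketch of the download-and-run workflow, assuming the `run.py` entry point and CLI flags documented in the GitHub repo (https://github.com/uqer1244/MLX_z-image); the local folder names are just conventions:

```python
# Sketch only: fetch the 4-bit MLX weights plus the original base model,
# then invoke the pipeline script from the companion GitHub repo.
from subprocess import run

from huggingface_hub import snapshot_download

# 4-bit MLX weights (this repository)
mlx_path = snapshot_download("uqer1244/MLX-z-image", local_dir="Z-Image-Turbo-MLX-4bit")
# Original model, used here for its VAE, tokenizer, and scheduler configs
base_path = snapshot_download("Tongyi-MAI/Z-Image-Turbo", local_dir="Z-Image-Turbo")

run([
    "python", "run.py",
    "--prompt", "A futuristic city with flying cars, cinematic lighting, 8k",
    "--mlx_model_path", mlx_path,
    "--pt_model_id", base_path,
    "--output", "res.png",
    "--steps", "5",
], check=True)
```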
model_index.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "_class_name": "ZImagePipelineMLX",
+   "_diffusers_version": "0.36.0.dev0",
+   "scheduler": [
+     "diffusers",
+     "FlowMatchEulerDiscreteScheduler"
+   ],
+   "text_encoder": [
+     "transformers",
+     "Qwen3Model"
+   ],
+   "tokenizer": [
+     "transformers",
+     "Qwen2Tokenizer"
+   ],
+   "transformer": [
+     "diffusers",
+     "ZImageTransformer2DModel"
+   ],
+   "vae": [
+     "diffusers",
+     "AutoencoderKL"
+   ]
+ }
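
For context, `model_index.json` is the diffusers-style manifest mapping each pipeline component to the (library, class) pair used to load it from the subfolder of the same name. A small illustrative sketch of how a loader reads it (not this repo's actual loader):

```python
# Walk the manifest and report which class each subfolder should be loaded with.
import json

with open("model_index.json") as f:
    index = json.load(f)

for name, spec in index.items():
    if name.startswith("_"):  # skip _class_name and _diffusers_version metadata
        continue
    library, cls = spec
    print(f"{name}/: instantiate {cls} from the `{library}` library")
```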
scheduler/scheduler_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_class_name": "FlowMatchEulerDiscreteScheduler",
+   "_diffusers_version": "0.36.0.dev0",
+   "num_train_timesteps": 1000,
+   "use_dynamic_shifting": false,
+   "shift": 3.0
+ }
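
With `use_dynamic_shifting` off, the `shift: 3.0` value applies a static warp to the flow-match sigma schedule, spending more of the 5-step Turbo budget at high noise. A rough sketch of the schedule these settings imply (approximating diffusers' `FlowMatchEulerDiscreteScheduler`; the library's exact endpoints differ slightly, and the MLX port may differ again):

```python
# Build an approximate 5-step flow-match schedule with static shift = 3.0.
import numpy as np

shift, steps, num_train_timesteps = 3.0, 5, 1000
sigmas = np.linspace(1.0, 1.0 / steps, steps)         # linear 1 -> 1/steps
sigmas = shift * sigmas / (1 + (shift - 1) * sigmas)  # static shift warp
sigmas = np.append(sigmas, 0.0)                       # final step lands on 0
timesteps = sigmas[:-1] * num_train_timesteps
print(timesteps.round(1))  # ~[1000, 923.1, 818.2, 666.7, 428.6]

# Each Euler step then moves the latent along the predicted velocity v:
#   x = x + (sigma_next - sigma) * v
```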
text_encoder/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "architectures": [
+     "Qwen3ForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 151643,
+   "eos_token_id": 151645,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 2560,
+   "initializer_range": 0.02,
+   "intermediate_size": 9728,
+   "max_position_embeddings": 40960,
+   "max_window_layers": 36,
+   "model_type": "qwen3",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 36,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 1000000,
+   "sliding_window": null,
+   "tie_word_embeddings": true,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.51.0",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 151936
+ }
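
The text encoder is a Qwen3 model (the tokenizer, by contrast, is the Qwen2 BPE tokenizer). A back-of-the-envelope parameter count from the numbers above, ignoring the comparatively tiny norm weights (my arithmetic, not a published figure):

```python
# Estimate parameters: GQA attention (32 query / 8 KV heads) + SwiGLU MLP,
# times 36 layers, plus a tied embedding/output matrix.
h, inter, layers = 2560, 9728, 36
q_heads, kv_heads, head_dim, vocab = 32, 8, 128, 151936

attn = h * q_heads * head_dim         # q projection
attn += 2 * h * kv_heads * head_dim   # k and v projections
attn += q_heads * head_dim * h        # output projection
mlp = 3 * h * inter                   # gate, up, and down projections
embed = vocab * h                     # tie_word_embeddings: shared with LM head

total = layers * (attn + mlp) + embed
print(f"~{total / 1e9:.2f}B parameters")  # ~4.02B
```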
text_encoder/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:82a08bd3360bf23b47db61b7b758b46401f04c3f3533d97af9f370df81239fa4
+ size 3017590306
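
The three `+` lines above are a Git LFS pointer, not the weights themselves; a plain clone without LFS materializes only this stub. A small hypothetical helper for inspecting a pointer before fetching the ~3 GB payload:

```python
# Parse the "key value" lines of a Git LFS pointer file.
def parse_lfs_pointer(path: str) -> dict:
    fields = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.strip().partition(" ")
            fields[key] = value
    return fields

ptr = parse_lfs_pointer("text_encoder/model.safetensors")
print(ptr["oid"])                    # sha256:82a08bd3...
print(int(ptr["size"]) / 1e9, "GB")  # ~3.02 GB
```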
tokenizer/merges.txt ADDED
The diff for this file is too large to render.
tokenizer/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
+ size 11422654
tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,239 @@
+ {
+   "add_bos_token": false,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151646": {
+       "content": "<|object_ref_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151647": {
+       "content": "<|object_ref_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151648": {
+       "content": "<|box_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151649": {
+       "content": "<|box_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151650": {
+       "content": "<|quad_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151651": {
+       "content": "<|quad_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151652": {
+       "content": "<|vision_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151653": {
+       "content": "<|vision_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151654": {
+       "content": "<|vision_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151655": {
+       "content": "<|image_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151656": {
+       "content": "<|video_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151657": {
+       "content": "<tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151658": {
+       "content": "</tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151659": {
+       "content": "<|fim_prefix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151660": {
+       "content": "<|fim_middle|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151661": {
+       "content": "<|fim_suffix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151662": {
+       "content": "<|fim_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151663": {
+       "content": "<|repo_name|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151664": {
+       "content": "<|file_sep|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151665": {
+       "content": "<tool_response>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151666": {
+       "content": "</tool_response>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151667": {
+       "content": "<think>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151668": {
+       "content": "</think>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "bos_token": null,
+   "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if message.content is string %}\n {%- set content = message.content %}\n {%- else %}\n {%- set content = '' %}\n {%- endif %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is string %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in content %}\n {%- set reasoning_content = content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- set content = content.split('</think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "model_max_length": 131072,
+   "pad_token": "<|endoftext|>",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
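
The tokenizer config ships Qwen's full ChatML chat template, including the `<think>` handling. Assuming the pipeline encodes prompts through this template (an assumption on my part; the MLX script may format prompts differently), the resulting wrapping looks like this via `transformers`:

```python
# Render a prompt through the chat template defined above (no model needed).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("tokenizer")  # the local tokenizer/ folder
text = tok.apply_chat_template(
    [{"role": "user", "content": "A futuristic city with flying cars"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(text)
# <|im_start|>user
# A futuristic city with flying cars<|im_end|>
# <|im_start|>assistant
```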
tokenizer/vocab.json ADDED
The diff for this file is too large to render.
transformer/config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_class_name": "ZImageTransformer2DModel",
+   "_diffusers_version": "0.36.0.dev0",
+   "all_f_patch_size": [
+     1
+   ],
+   "all_patch_size": [
+     2
+   ],
+   "axes_dims": [
+     32,
+     48,
+     48
+   ],
+   "axes_lens": [
+     1536,
+     512,
+     512
+   ],
+   "cap_feat_dim": 2560,
+   "dim": 3840,
+   "in_channels": 16,
+   "n_heads": 30,
+   "n_kv_heads": 30,
+   "n_layers": 30,
+   "n_refiner_layers": 2,
+   "norm_eps": 1e-05,
+   "qk_norm": true,
+   "rope_theta": 256.0,
+   "t_scale": 1000.0,
+   "nheads": 30
+ }
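
A quick consistency check on the geometry above (my reading of the fields, not project documentation): the three RoPE axes partition each 128-dim attention head, plausibly over (text/time, height, width), and the positional ranges cover a 1024x1024 generation:

```python
# Verify head_dim and estimate the image token count for 1024x1024 output.
dim, n_heads = 3840, 30
axes_dims = [32, 48, 48]      # per-head dims consumed by each RoPE axis
axes_lens = [1536, 512, 512]  # max positions along each axis

head_dim = dim // n_heads
assert head_dim == sum(axes_dims) == 128

# The VAE config in this commit downsamples 8x (4 encoder stages), and
# all_patch_size = 2 patchifies the 16-channel latent, so 1024x1024 pixels
# -> 128x128 latent -> 64x64 = 4096 image tokens, well within 512x512.
latent_side = 1024 // 8
tokens_per_side = latent_side // 2
print(head_dim, tokens_per_side**2, "image tokens")
```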
transformer/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c574bbc33780e5a0805469d2825cd8991741fd880199273a4df38e960004c7b4
+ size 3848371009
vae/config.json ADDED
@@ -0,0 +1,38 @@
+ {
+   "_class_name": "AutoencoderKL",
+   "_diffusers_version": "0.36.0.dev0",
+   "_name_or_path": "flux-dev",
+   "act_fn": "silu",
+   "block_out_channels": [
+     128,
+     256,
+     512,
+     512
+   ],
+   "down_block_types": [
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D"
+   ],
+   "force_upcast": true,
+   "in_channels": 3,
+   "latent_channels": 16,
+   "latents_mean": null,
+   "latents_std": null,
+   "layers_per_block": 2,
+   "mid_block_add_attention": true,
+   "norm_num_groups": 32,
+   "out_channels": 3,
+   "sample_size": 1024,
+   "scaling_factor": 0.3611,
+   "shift_factor": 0.1159,
+   "up_block_types": [
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D"
+   ],
+   "use_post_quant_conv": false,
+   "use_quant_conv": false
+ }
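
The `scaling_factor`/`shift_factor` pair follows the usual diffusers latent-normalization convention (the values match FLUX's VAE, consistent with `_name_or_path: "flux-dev"`). A tiny sketch of the scaling it implies:

```python
# Latents are shifted and scaled after encoding, and inverted before decoding.
scaling_factor, shift_factor = 0.3611, 0.1159

def normalize(z):    # applied to vae.encode(...) output
    return (z - shift_factor) * scaling_factor

def denormalize(z):  # applied before vae.decode(...)
    return z / scaling_factor + shift_factor

z = 0.5
assert abs(denormalize(normalize(z)) - z) < 1e-9
```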
vae/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f5b59a26851551b67ae1fe58d32e76486e1e812def4696a4bea97f16604d40a3
+ size 167666902