Files changed (1)
  1. README.md +198 -184
README.md CHANGED
@@ -1,184 +1,198 @@
1
- ---
2
- license: apache-2.0
3
- library_name: videox_fun
4
- ---
5
-
6
- # Z-Image-Fun-Lora-Distill
7
-
8
- [![Github](https://img.shields.io/badge/🎬%20Code-VideoX_Fun-blue)](https://github.com/aigc-apps/VideoX-Fun)
9
-
10
-
11
- ## Model Card
12
-
13
- | Name | Description |
14
- |--|--|
15
- | Z-Image-Fun-Lora-Distill-8-Steps.safetensors | This is a Distill LoRA for Z-Image that distills both steps and CFG. This model does not require CFG and uses 8 steps for inference. |
16
-
17
- ## Model Features
18
- - This is a Distill LoRA for Z-Image that distills both steps and CFG. It does not use any Z-Image-Turbo related weights and is trained from scratch. It is compatible with other Z-Image LoRAs and [Controls](https://huggingface.co/alibaba-pai/Z-Image-Fun-Controlnet-Union-2.1).
19
- - This model will slightly reduce the output quality and change the output composition of the model. For specific comparisons, please refer to the Results section. In most cases, the Distill LoRA performs well; currently, the biggest issue is that it may make the generated results brighter.
20
- - The purpose of this model is to provide fast generation compatibility for Z-Image derivative models, not to replace Z-Image-Turbo.
21
-
22
- ## TODO
23
- - Optimize the output brightness;
24
- - Train a 4-step LoRA.
25
-
26
- ## Results
27
- ### Work itself
28
-
29
- <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
30
- <tr>
31
- <td>Output 25 steps</td>
32
- <td>Output 8 steps</td>
33
- </tr>
34
- <tr>
35
- <td><img src="results/output1.png" width="100%" /></td>
36
- <td><img src="results/output1_8steps.png" width="100%" /></td>
37
- </tr>
38
- </table>
39
-
40
- <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
41
- <tr>
42
- <td>Output 25 steps</td>
43
- <td>Output 8 steps</td>
44
- </tr>
45
- <tr>
46
- <td><img src="results/output2.png" width="100%" /></td>
47
- <td><img src="results/output2_8steps.png" width="100%" /></td>
48
- </tr>
49
- </table>
50
-
51
- <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
52
- <tr>
53
- <td>Output 25 steps</td>
54
- <td>Output 8 steps</td>
55
- </tr>
56
- <tr>
57
- <td><img src="results/output3.png" width="100%" /></td>
58
- <td><img src="results/output3_8steps.png" width="100%" /></td>
59
- </tr>
60
- </table>
61
-
62
- <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
63
- <tr>
64
- <td>Output 25 steps</td>
65
- <td>Output 8 steps</td>
66
- </tr>
67
- <tr>
68
- <td><img src="results/output4.png" width="100%" /></td>
69
- <td><img src="results/output4_8steps.png" width="100%" /></td>
70
- </tr>
71
- </table>
72
-
73
- ### Work with Controlnet
74
-
75
- <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
76
- <tr>
77
- <td>Pose + Inpaint</td>
78
- <td>Output 25 steps</td>
79
- <td>Output 8 steps</td>
80
- </tr>
81
- <tr>
82
- <td><img src="asset/inpaint.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /></td>
83
- <td><img src="results/inpaint.png" width="100%" /></td>
84
- <td><img src="results/inpaint_8steps.png" width="100%" /></td>
85
- </tr>
86
- </table>
87
-
88
- <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
89
- <tr>
90
- <td>Pose + Inpaint</td>
91
- <td>Output 25 steps</td>
92
- <td>Output 8 steps</td>
93
- </tr>
94
- <tr>
95
- <td><img src="asset/inpaint.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /><img src="asset/pose.jpg" width="100%" /></td>
96
- <td><img src="results/pose_inpaint.png" width="100%" /></td>
97
- <td><img src="results/pose_inpaint_8steps.png" width="100%" /></td>
98
- </tr>
99
- </table>
100
-
101
- <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
102
- <tr>
103
- <td>Pose</td>
104
- <td>Output 25 steps</td>
105
- <td>Output 8 steps</td>
106
- </tr>
107
- <tr>
108
- <td><img src="asset/pose2.jpg" width="100%" /></td>
109
- <td><img src="results/pose2.png" width="100%" /></td>
110
- <td><img src="results/pose2_8steps.png" width="100%" /></td>
111
- </tr>
112
- </table>
113
-
114
- <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
115
- <tr>
116
- <td>Pose</td>
117
- <td>Output 25 steps</td>
118
- <td>Output 8 steps</td>
119
- </tr>
120
- <tr>
121
- <td><img src="asset/pose4.jpg" width="100%" /></td>
122
- <td><img src="results/pose4.png" width="100%" /></td>
123
- <td><img src="results/pose4_8steps.png" width="100%" /></td>
124
- </tr>
125
- </table>
126
-
127
- <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
128
- <tr>
129
- <td>Canny</td>
130
- <td>Output</td>
131
- <td>Output 8 steps</td>
132
- </tr>
133
- <tr>
134
- <td><img src="asset/canny.jpg" width="100%" /></td>
135
- <td><img src="results/canny.png" width="100%" /></td>
136
- <td><img src="results/canny_8steps.png" width="100%" /></td>
137
- </tr>
138
- </table>
139
-
140
- <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
141
- <tr>
142
- <td>Depth</td>
143
- <td>Output</td>
144
- <td>Output 8 steps</td>
145
- </tr>
146
- <tr>
147
- <td><img src="asset/gray.jpg" width="100%" /></td>
148
- <td><img src="results/gray.png" width="100%" /></td>
149
- <td><img src="results/gray_8steps.png" width="100%" /></td>
150
- </tr>
151
- </table>
152
-
153
- ## Inference
154
- Go to the VideoX-Fun repository for more details.
155
-
156
- Please clone the VideoX-Fun repository and create the required directories:
157
-
158
- ```sh
159
- # Clone the code
160
- git clone https://github.com/aigc-apps/VideoX-Fun.git
161
-
162
- # Enter VideoX-Fun's directory
163
- cd VideoX-Fun
164
-
165
- # Create model directories
166
- mkdir -p models/Diffusion_Transformer
167
- mkdir -p models/Personalized_Model
168
- ```
169
-
170
- Then download the weights into models/Diffusion_Transformer and models/Personalized_Model.
171
-
172
- ```
173
- 📦 models/
174
- ├── 📂 Diffusion_Transformer/
175
- │   └── 📂 Z-Image/
176
- ├── 📂 Personalized_Model/
177
- │   ├── 📦 Z-Image-Fun-Lora-Distill-8-Steps.safetensors
178
- │   ├── 📦 Z-Image-Fun-Controlnet-Union-2.1.safetensors
179
- │   └── 📦 Z-Image-Fun-Controlnet-Union-2.1-lite.safetensors
180
- ```
181
-
182
- Set the lora_path="Personalized_Model/Z-Image-Fun-Lora-Distill-8-Steps.safetensors" in `examples/z_image_fun/predict_t2i_control_2.1.py` and `examples/z_image_fun/predict_i2i_inpaint_2.1.py`
183
-
184
- Then run the file `examples/z_image_fun/predict_t2i_control_2.1.py` and `examples/z_image_fun/predict_i2i_inpaint_2.1.py`.
1
+ ---
2
+ license: apache-2.0
3
+ library_name: videox_fun
4
+ ---
5
+
6
+ # Z-Image-Fun-Lora-Distill
7
+
8
+ [![Github](https://img.shields.io/badge/🎬%20Code-VideoX_Fun-blue)](https://github.com/aigc-apps/VideoX-Fun)
9
+
10
+
11
+ ## Model Card
12
+
13
+ | Name | Description |
14
+ |--|--|
15
+ | Z-Image-Fun-Lora-Distill-8-Steps.safetensors | This is a Distill LoRA for Z-Image that distills both steps and CFG. This model does not require CFG and uses 8 steps for inference. |
16
+
17
+ ## Model Features
18
+ - This is a Distill LoRA for Z-Image that distills both steps and CFG. It does not use any Z-Image-Turbo related weights and is trained from scratch. It is compatible with other Z-Image LoRAs and [Controls](https://huggingface.co/alibaba-pai/Z-Image-Fun-Controlnet-Union-2.1).
19
+ - This LoRA slightly reduces output quality and changes the composition of the generated images; see the Results section for side-by-side comparisons. In most cases the Distill LoRA performs well. The main known issue is that it can make the generated results brighter.
20
+ - The purpose of this model is to provide fast generation compatibility for Z-Image derivative models, not to replace Z-Image-Turbo.
21
+
22
+ ## TODO
23
+ - Optimize the output brightness;
24
+ - Train a 4-step LoRA.
25
+
26
+ ## Results
27
+ ### Work itself
28
+
29
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
30
+ <tr>
31
+ <td>Output 25 steps</td>
32
+ <td>Output 8 steps</td>
33
+ </tr>
34
+ <tr>
35
+ <td><img src="results/output1.png" width="100%" /></td>
36
+ <td><img src="results/output1_8steps.png" width="100%" /></td>
37
+ </tr>
38
+ </table>
39
+
40
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
41
+ <tr>
42
+ <td>Output 25 steps</td>
43
+ <td>Output 8 steps</td>
44
+ </tr>
45
+ <tr>
46
+ <td><img src="results/output2.png" width="100%" /></td>
47
+ <td><img src="results/output2_8steps.png" width="100%" /></td>
48
+ </tr>
49
+ </table>
50
+
51
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
52
+ <tr>
53
+ <td>Output 25 steps</td>
54
+ <td>Output 8 steps</td>
55
+ </tr>
56
+ <tr>
57
+ <td><img src="results/output3.png" width="100%" /></td>
58
+ <td><img src="results/output3_8steps.png" width="100%" /></td>
59
+ </tr>
60
+ </table>
61
+
62
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
63
+ <tr>
64
+ <td>Output 25 steps</td>
65
+ <td>Output 8 steps</td>
66
+ </tr>
67
+ <tr>
68
+ <td><img src="results/output4.png" width="100%" /></td>
69
+ <td><img src="results/output4_8steps.png" width="100%" /></td>
70
+ </tr>
71
+ </table>
72
+
73
+ ### Work with Controlnet
74
+
75
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
76
+ <tr>
77
+ <td>Inpaint</td>
78
+ <td>Output 25 steps</td>
79
+ <td>Output 8 steps</td>
80
+ </tr>
81
+ <tr>
82
+ <td><img src="asset/inpaint.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /></td>
83
+ <td><img src="results/inpaint.png" width="100%" /></td>
84
+ <td><img src="results/inpaint_8steps.png" width="100%" /></td>
85
+ </tr>
86
+ </table>
87
+
88
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
89
+ <tr>
90
+ <td>Pose + Inpaint</td>
91
+ <td>Output 25 steps</td>
92
+ <td>Output 8 steps</td>
93
+ </tr>
94
+ <tr>
95
+ <td><img src="asset/inpaint.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /><img src="asset/pose.jpg" width="100%" /></td>
96
+ <td><img src="results/pose_inpaint.png" width="100%" /></td>
97
+ <td><img src="results/pose_inpaint_8steps.png" width="100%" /></td>
98
+ </tr>
99
+ </table>
100
+
101
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
102
+ <tr>
103
+ <td>Pose</td>
104
+ <td>Output 25 steps</td>
105
+ <td>Output 8 steps</td>
106
+ </tr>
107
+ <tr>
108
+ <td><img src="asset/pose2.jpg" width="100%" /></td>
109
+ <td><img src="results/pose2.png" width="100%" /></td>
110
+ <td><img src="results/pose2_8steps.png" width="100%" /></td>
111
+ </tr>
112
+ </table>
113
+
114
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
115
+ <tr>
116
+ <td>Pose</td>
117
+ <td>Output 25 steps</td>
118
+ <td>Output 8 steps</td>
119
+ </tr>
120
+ <tr>
121
+ <td><img src="asset/pose4.jpg" width="100%" /></td>
122
+ <td><img src="results/pose4.png" width="100%" /></td>
123
+ <td><img src="results/pose4_8steps.png" width="100%" /></td>
124
+ </tr>
125
+ </table>
126
+
127
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
128
+ <tr>
129
+ <td>Canny</td>
130
+ <td>Output</td>
131
+ <td>Output 8 steps</td>
132
+ </tr>
133
+ <tr>
134
+ <td><img src="asset/canny.jpg" width="100%" /></td>
135
+ <td><img src="results/canny.png" width="100%" /></td>
136
+ <td><img src="results/canny_8steps.png" width="100%" /></td>
137
+ </tr>
138
+ </table>
139
+
140
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
141
+ <tr>
142
+ <td>Depth</td>
143
+ <td>Output</td>
144
+ <td>Output 8 steps</td>
145
+ </tr>
146
+ <tr>
147
+ <td><img src="asset/gray.jpg" width="100%" /></td>
148
+ <td><img src="results/gray.png" width="100%" /></td>
149
+ <td><img src="results/gray_8steps.png" width="100%" /></td>
150
+ </tr>
151
+ </table>
152
+
153
+ ## Inference
154
+ See the [VideoX-Fun repository](https://github.com/aigc-apps/VideoX-Fun) for more details.
155
+
156
+ Please clone the VideoX-Fun repository and create the required directories:
157
+
158
+ ```sh
159
+ # Clone the code
160
+ git clone https://github.com/aigc-apps/VideoX-Fun.git
161
+
162
+ # Enter VideoX-Fun's directory
163
+ cd VideoX-Fun
164
+
165
+ # Create model directories
166
+ mkdir -p models/Diffusion_Transformer
167
+ mkdir -p models/Personalized_Model
168
+ ```
169
+
170
+ Then download the weights into `models/Diffusion_Transformer` and `models/Personalized_Model`.
171
+
172
+ ```
173
+ 📦 models/
174
+ ├── 📂 Diffusion_Transformer/
175
+ │   └── 📂 Z-Image/
176
+ ├── 📂 Personalized_Model/
177
+ │   ├── 📦 Z-Image-Fun-Lora-Distill-8-Steps.safetensors
178
+ │   ├── 📦 Z-Image-Fun-Controlnet-Union-2.1.safetensors
179
+ │   └── 📦 Z-Image-Fun-Controlnet-Union-2.1-lite.safetensors
180
+ ```
181
+
182
+
183
+ To run the model, **first** set the lora_path in `examples/z_image/predict_t2i.py` to:
184
+ `Personalized_Model/Z-Image-Fun-Lora-Distill-8-Steps.safetensors`
185
+
186
+
187
+ **Then**, run the file:
188
+ `examples/z_image/predict_t2i.py`
189
+
190
+ The following scripts are also supported:
191
+ - examples/z_image_fun/predict_t2i_control_2.1.py
192
+ - examples/z_image_fun/predict_i2i_inpaint_2.1.py
193
+
194
+
195
+ **Recommended Settings**:
196
+ - cfg = 1.0
197
+ - steps = 8
198
+ - lora_weight = 0.8 (suggested range: 0.7 to 0.8)
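
The recommended settings can be gathered in one place and sanity-checked before editing a predict script. The following is a minimal sketch only: `DistillSettings`, its field names, and `check()` are hypothetical helpers written for illustration and are not part of VideoX-Fun. The actual variable names inside `examples/z_image/predict_t2i.py` may differ, so treat the values, not the names, as the part taken from this README.

```python
from dataclasses import dataclass
from typing import List

# Suggested LoRA weight range, taken from the recommended settings in this README.
LORA_WEIGHT_MIN = 0.7
LORA_WEIGHT_MAX = 0.8


@dataclass(frozen=True)
class DistillSettings:
    """Hypothetical container for the recommended values (names are illustrative)."""

    lora_path: str = "Personalized_Model/Z-Image-Fun-Lora-Distill-8-Steps.safetensors"
    cfg: float = 1.0          # recommended value; this LoRA does not require CFG
    steps: int = 8            # the LoRA is distilled for 8-step inference
    lora_weight: float = 0.8  # suggested range: 0.7 to 0.8

    def check(self) -> List[str]:
        """Return one message per value that deviates from the recommendations."""
        problems = []
        if self.cfg != 1.0:
            problems.append(f"cfg is {self.cfg}, recommended 1.0")
        if self.steps != 8:
            problems.append(f"steps is {self.steps}, recommended 8")
        if not (LORA_WEIGHT_MIN <= self.lora_weight <= LORA_WEIGHT_MAX):
            problems.append(
                f"lora_weight is {self.lora_weight}, suggested range "
                f"{LORA_WEIGHT_MIN} to {LORA_WEIGHT_MAX}"
            )
        return problems


if __name__ == "__main__":
    settings = DistillSettings()
    for message in settings.check():
        print("warning:", message)
    print(settings)
```

Values outside the suggested range are reported rather than rejected, since the range above is a suggestion and other values may still be worth trying.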