---
license: apache-2.0
library_name: videox_fun
pipeline_tag: text-to-image
tags:
- lora
---
# Z-Image-Fun-Lora-Distill
[VideoX-Fun on GitHub](https://github.com/aigc-apps/VideoX-Fun)
## Model Card
### a. 2603 Models
| Name | Description |
|--|--|
| Z-Image-Fun-Lora-Distill-2-Steps-2603.safetensors | A Distill LoRA for Z-Image that distills both steps and CFG. It requires only 2 steps instead of 8. Thanks to the random timestep strategy, it is better suited to sigmas below 0.500. The recommended sigma for the second step is between 0.800 and 0.500. A higher LoRA strength is recommended. |
| Z-Image-Fun-Lora-Distill-2-Steps-2603-ComfyUI.safetensors | ComfyUI version of Z-Image-Fun-Lora-Distill-2-Steps-2603.safetensors |
| Z-Image-Fun-Lora-Distill-4-Steps-2603.safetensors | A Distill LoRA for Z-Image that distills both steps and CFG. It requires only 4 steps instead of 8. Thanks to the added random timestep strategy, it is better suited to sigmas below 0.500. |
| Z-Image-Fun-Lora-Distill-4-Steps-2603-ComfyUI.safetensors | ComfyUI version of Z-Image-Fun-Lora-Distill-4-Steps-2603.safetensors |
| Z-Image-Fun-Lora-Distill-8-Steps-2603.safetensors | A Distill LoRA for Z-Image that distills both steps and CFG. Compared to Z-Image-Fun-Lora-Distill-8-Steps-2602.safetensors, the added random timestep strategy makes it better suited to sigmas below 0.500. |
| Z-Image-Fun-Lora-Distill-8-Steps-2603-ComfyUI.safetensors | ComfyUI version of Z-Image-Fun-Lora-Distill-8-Steps-2603.safetensors |
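For the 2-step LoRA, the recommendation above amounts to choosing a single intermediate sigma. The following minimal sketch assumes a flow-matching schedule that starts from pure noise at sigma 1.0 and ends at 0.0; the helper `two_step_sigmas` and its 0.65 default are hypothetical and not part of VideoX-Fun or ComfyUI.

```python
def two_step_sigmas(second_sigma: float = 0.65) -> list[float]:
    """Build a hypothetical 2-step sigma schedule.

    Assumes sampling starts at sigma = 1.0 (pure noise) and ends at 0.0,
    so two denoising steps need three sigma values.
    """
    # The card recommends a second-step sigma between 0.800 and 0.500.
    if not 0.5 <= second_sigma <= 0.8:
        raise ValueError(f"second_sigma={second_sigma} is outside [0.5, 0.8]")
    return [1.0, second_sigma, 0.0]


print(two_step_sigmas())  # [1.0, 0.65, 0.0]
```

How such a list reaches the sampler depends on the inference frontend you use.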
### b. 2602 Models and Earlier
| Name | Description |
|--|--|
| Z-Image-Fun-Lora-Distill-4-Steps-2602.safetensors | A Distill LoRA for Z-Image that distills both steps and CFG. Compared to Z-Image-Fun-Lora-Distill-8-Steps.safetensors, it requires only 4 steps instead of 8 steps, its colors are more consistent with the original model, and the skin texture is better. |
| Z-Image-Fun-Lora-Distill-4-Steps-2602-ComfyUI.safetensors | ComfyUI version of Z-Image-Fun-Lora-Distill-4-Steps-2602.safetensors |
| Z-Image-Fun-Lora-Distill-8-Steps-2602.safetensors | A Distill LoRA for Z-Image that distills both steps and CFG. Compared to Z-Image-Fun-Lora-Distill-8-Steps.safetensors, its colors are more consistent with the original model, and the skin texture is better. |
| Z-Image-Fun-Lora-Distill-8-Steps-2602-ComfyUI.safetensors | ComfyUI version of Z-Image-Fun-Lora-Distill-8-Steps-2602.safetensors |
| Z-Image-Fun-Lora-Distill-8-Steps.safetensors | A Distill LoRA for Z-Image that distills both steps and CFG. It does not require CFG and uses 8 steps for inference. |
## Model Features
- This is a Distill LoRA for Z-Image that distills both steps and CFG. It does not use any Z-Image-Turbo weights and is trained from scratch. It is compatible with other Z-Image LoRAs and [ControlNets](https://huggingface.co/alibaba-pai/Z-Image-Fun-Controlnet-Union-2.1).
- This LoRA slightly reduces output quality and changes the composition of the outputs. For detailed comparisons, see the Results section.
- Its purpose is to provide fast generation for Z-Image derivative models, not to replace Z-Image-Turbo.
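To illustrate what stacking this LoRA with other Z-Image LoRAs means at the weight level, here is a minimal NumPy sketch of one common formulation: each LoRA adds its own low-rank update `B @ A`, scaled by `alpha / rank` and by its strength, to the base weight. This is the textbook LoRA merge, not VideoX-Fun's loader; `apply_loras` and the tuple layout are hypothetical.

```python
import numpy as np


def apply_loras(weight, loras):
    """Return weight + sum(strength * (alpha / rank) * B @ A) over all LoRAs.

    `loras` is a list of (A, B, alpha, strength) tuples, where A has shape
    (rank, in_features) and B has shape (out_features, rank).
    """
    merged = weight.copy()  # leave the base weight untouched
    for A, B, alpha, strength in loras:
        rank = A.shape[0]
        # Each LoRA contributes an independent low-rank update scaled by its strength.
        merged += strength * (alpha / rank) * (B @ A)
    return merged


rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))  # stand-in for one base linear layer
distill = (rng.standard_normal((2, 6)), rng.standard_normal((4, 2)), 2.0, 0.8)
style = (rng.standard_normal((2, 6)), rng.standard_normal((4, 2)), 2.0, 0.5)
print(apply_loras(W, [distill, style]).shape)  # (4, 6)
```

Because the updates are additive, the order in which LoRAs are applied does not change the merged weight in this formulation.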
## Results
### Differences between the 2603 and 2602 models
The 2602 models tend to produce blurry images at sigmas below 0.500 because the distillation was not trained on certain timesteps. The 2603 models introduce a random timestep strategy, which makes them better suited to sigmas below 0.500.
As shown below, with the kl_optimal scheduler many sigmas fall below 0.500. The 2603 model handles these cases correctly, while the 2602 model does not. Although kl_optimal is used in the figure, we still recommend the simple scheduler for inference.
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Z-Image-Fun-Lora-Distill-8-Steps-2602</td>
<td>Z-Image-Fun-Lora-Distill-8-Steps-2603</td>
</tr>
<tr>
<td><img src="results/2602.png" width="100%" /></td>
<td><img src="results/2603.png" width="100%" /></td>
</tr>
</table>
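To see why kl_optimal places many sigmas below 0.500, the sketch below compares an arctan-interpolated KL-optimal schedule (modeled on ComfyUI's `kl_optimal` scheduler as we understand it) with evenly spaced flow-matching times passed through the SD3-style time shift, which stands in here for "simple" on a flow model. The values `shift=3.0` and `sigma_min=0.002` are hypothetical, not Z-Image's confirmed configuration.

```python
import math


def kl_optimal_sigmas(n, sigma_min, sigma_max):
    """KL-optimal schedule: interpolate linearly in arctan(sigma) space."""
    if n < 2:
        raise ValueError("need at least 2 steps")
    sigmas = []
    for i in range(n):
        x = i / (n - 1)
        angle = (1 - x) * math.atan(sigma_max) + x * math.atan(sigma_min)
        sigmas.append(math.tan(angle))
    return sigmas + [0.0]  # final sigma 0.0 marks the fully denoised end point


def shifted_linear_sigmas(n, shift):
    """Evenly spaced times t from 1 to 0, then sigma = shift*t / (1 + (shift-1)*t)."""
    sigmas = []
    for i in range(n + 1):
        t = 1 - i / n
        sigmas.append(shift * t / (1 + (shift - 1) * t))
    return sigmas


def count_below(sigmas, threshold=0.5):
    # The model is evaluated at the first n sigmas; the last value is only the end point.
    return sum(1 for s in sigmas[:-1] if s < threshold)


kl = kl_optimal_sigmas(8, sigma_min=0.002, sigma_max=1.0)
simple = shifted_linear_sigmas(8, shift=3.0)
print("kl_optimal:", [round(s, 3) for s in kl], count_below(kl))
print("shifted linear:", [round(s, 3) for s in simple], count_below(simple))
```

With these assumed parameters, the KL-optimal list evaluates 5 of its 8 sigmas below 0.5, while the shifted linear list evaluates only 1, which matches the kind of behavior described above.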
### Differences between the 2602 models and the previous model
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Z-Image-Fun-Lora-Distill-8-Steps-2602</td>
<td>Z-Image-Fun-Lora-Distill-4-Steps-2602</td>
<td>Z-Image-Fun-Lora-Distill-8-Steps</td>
</tr>
<tr>
<td><img src="results/2602_1_8steps.png" width="100%" /><img src="results/2602_2_8steps.png" width="100%" /><img src="results/2602_3_8steps.png" width="100%" /><img src="results/2602_4_8steps.png" width="100%" /><img src="results/2602_5_8steps.png" width="100%" /></td>
<td><img src="results/2602_1_4steps.png" width="100%" /><img src="results/2602_2_4steps.png" width="100%" /><img src="results/2602_3_4steps.png" width="100%" /><img src="results/2602_4_4steps.png" width="100%" /><img src="results/2602_5_4steps.png" width="100%" /></td>
<td><img src="results/old_1_8steps.png" width="100%" /><img src="results/old_2_8steps.png" width="100%" /><img src="results/old_3_8steps.png" width="100%" /><img src="results/old_4_8steps.png" width="100%" /><img src="results/old_5_8steps.png" width="100%" /></td>
</tr>
</table>
### Without ControlNet
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Output 25 steps</td>
<td>Output 8-Steps-2602</td>
<td>Output 4-Steps-2602</td>
</tr>
<tr>
<td><img src="results/output4.png" width="100%" /></td>
<td><img src="results/output4_2602_8steps.png" width="100%" /></td>
<td><img src="results/output4_2602_4steps.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Output 25 steps</td>
<td>Output 8-Steps-2602</td>
<td>Output 4-Steps-2602</td>
</tr>
<tr>
<td><img src="results/output1.png" width="100%" /></td>
<td><img src="results/output1_2602_8steps.png" width="100%" /></td>
<td><img src="results/output1_2602_4steps.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Output 25 steps</td>
<td>Output 8-Steps-2602</td>
<td>Output 4-Steps-2602</td>
</tr>
<tr>
<td><img src="results/output2.png" width="100%" /></td>
<td><img src="results/output2_2602_8steps.png" width="100%" /></td>
<td><img src="results/output2_2602_4steps.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Output 25 steps</td>
<td>Output 8-Steps-2602</td>
<td>Output 4-Steps-2602</td>
</tr>
<tr>
<td><img src="results/output3.png" width="100%" /></td>
<td><img src="results/output3_2602_8steps.png" width="100%" /></td>
<td><img src="results/output3_2602_4steps.png" width="100%" /></td>
</tr>
</table>
### With ControlNet
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Inpaint</td>
<td>Output 25 steps</td>
<td>Output 8-Steps-2602</td>
<td>Output 4-Steps-2602</td>
</tr>
<tr>
<td><img src="asset/inpaint.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /></td>
<td><img src="results/inpaint.png" width="100%" /></td>
<td><img src="results/inpaint_2602_8steps.png" width="100%" /></td>
<td><img src="results/inpaint_2602_4steps.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Pose + Inpaint</td>
<td>Output 25 steps</td>
<td>Output 8-Steps-2602</td>
<td>Output 4-Steps-2602</td>
</tr>
<tr>
<td><img src="asset/inpaint.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /><img src="asset/pose.jpg" width="100%" /></td>
<td><img src="results/pose_inpaint.png" width="100%" /></td>
<td><img src="results/pose_inpaint_2602_8steps.png" width="100%" /></td>
<td><img src="results/pose_inpaint_2602_4steps.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Pose</td>
<td>Output 25 steps</td>
<td>Output 8-Steps-2602</td>
<td>Output 4-Steps-2602</td>
</tr>
<tr>
<td><img src="asset/pose2.jpg" width="100%" /></td>
<td><img src="results/pose2.png" width="100%" /></td>
<td><img src="results/pose2_2602_8steps.png" width="100%" /></td>
<td><img src="results/pose2_2602_4steps.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Canny</td>
<td>Output</td>
<td>Output 8-Steps-2602</td>
<td>Output 4-Steps-2602</td>
</tr>
<tr>
<td><img src="asset/canny.jpg" width="100%" /></td>
<td><img src="results/canny.png" width="100%" /></td>
<td><img src="results/canny_2602_8steps.png" width="100%" /></td>
<td><img src="results/canny_2602_4steps.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Depth</td>
<td>Output</td>
<td>Output 8-Steps-2602</td>
<td>Output 4-Steps-2602</td>
</tr>
<tr>
<td><img src="asset/gray.jpg" width="100%" /></td>
<td><img src="results/gray.png" width="100%" /></td>
<td><img src="results/gray_2602_8steps.png" width="100%" /></td>
<td><img src="results/gray_2602_4steps.png" width="100%" /></td>
</tr>
</table>
## Inference
See the [VideoX-Fun repository](https://github.com/aigc-apps/VideoX-Fun) for more details.
Clone the repository and create the required directories:
```sh
# Clone the code
git clone https://github.com/aigc-apps/VideoX-Fun.git
# Enter VideoX-Fun's directory
cd VideoX-Fun
# Create model directories
mkdir -p models/Diffusion_Transformer
mkdir -p models/Personalized_Model
```
Then download the weights into `models/Diffusion_Transformer` and `models/Personalized_Model`:
```
📦 models/
├── 📂 Diffusion_Transformer/
│   └── 📂 Z-Image/
├── 📂 Personalized_Model/
│   ├── 📦 Z-Image-Fun-Lora-Distill-4-Steps-2602.safetensors
│   ├── 📦 Z-Image-Fun-Lora-Distill-8-Steps-2602.safetensors
│   ├── 📦 Z-Image-Fun-Controlnet-Union-2.1.safetensors
│   └── 📦 Z-Image-Fun-Controlnet-Union-2.1-lite.safetensors
```
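The download step can be scripted with `huggingface_hub`. This is a minimal sketch under stated assumptions: the repo IDs `Tongyi-MAI/Z-Image`, `alibaba-pai/Z-Image-Fun-Lora-Distill`, and `alibaba-pai/Z-Image-Fun-Controlnet-Union-2.1`, and the assumption that each file sits at the root of its repo, are unverified and should be checked on Hugging Face before use. The helper names are hypothetical.

```python
from pathlib import Path

# Hypothetical repo IDs; confirm them on Hugging Face before downloading.
BASE_REPO = "Tongyi-MAI/Z-Image"
LORA_REPO = "alibaba-pai/Z-Image-Fun-Lora-Distill"
CONTROL_REPO = "alibaba-pai/Z-Image-Fun-Controlnet-Union-2.1"


def base_model_dir(root="models"):
    """Directory for the base Z-Image weights."""
    return Path(root) / "Diffusion_Transformer" / "Z-Image"


def personalized_model_targets(root="models"):
    """List (repo_id, filename, local_dir) for each file in Personalized_Model/."""
    local_dir = Path(root) / "Personalized_Model"
    files = [
        (LORA_REPO, "Z-Image-Fun-Lora-Distill-4-Steps-2602.safetensors"),
        (LORA_REPO, "Z-Image-Fun-Lora-Distill-8-Steps-2602.safetensors"),
        (CONTROL_REPO, "Z-Image-Fun-Controlnet-Union-2.1.safetensors"),
        (CONTROL_REPO, "Z-Image-Fun-Controlnet-Union-2.1-lite.safetensors"),
    ]
    return [(repo, name, local_dir) for repo, name in files]


def download_all(root="models"):
    """Fetch everything into the layout shown above (needs network access)."""
    # Imported here so the path helpers work without huggingface_hub installed.
    from huggingface_hub import hf_hub_download, snapshot_download

    snapshot_download(repo_id=BASE_REPO, local_dir=base_model_dir(root))
    for repo, name, local_dir in personalized_model_targets(root):
        local_dir.mkdir(parents=True, exist_ok=True)
        hf_hub_download(repo_id=repo, filename=name, local_dir=local_dir)
```

Run `download_all()` from the VideoX-Fun root; swap in the 2603 filenames if you want those models instead.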
To run the model, **first** set `lora_path` in `examples/z_image/predict_t2i.py` to the LoRA you downloaded, for example:
`Personalized_Model/Z-Image-Fun-Lora-Distill-8-Steps.safetensors`
**Then** run `examples/z_image/predict_t2i.py`.
The following scripts are also supported:
- `examples/z_image_fun/predict_t2i_control_2.1.py`
- `examples/z_image_fun/predict_i2i_inpaint_2.1.py`
**Recommended Settings**:
- cfg = 1.0
- steps = 8
- lora_weight = 0.8 (suggested range: 0.7 ~ 0.9)
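The recommended settings can be captured in a small config object that flags values outside the card's guidance. This is a minimal sketch: the `DistillSettings` class and its `check` method are hypothetical, and the field names mirror the list above but are not guaranteed to match the variables in `predict_t2i.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistillSettings:
    """Recommended settings from this card; class and field names are hypothetical."""

    lora_path: str = "Personalized_Model/Z-Image-Fun-Lora-Distill-8-Steps.safetensors"
    cfg: float = 1.0          # CFG is distilled into the LoRA; 1.0 typically disables guidance
    steps: int = 8
    lora_weight: float = 0.8  # suggested range 0.7 ~ 0.9

    def check(self) -> list[str]:
        """Return warnings for values outside the card's recommendations."""
        warnings = []
        if self.cfg != 1.0:
            warnings.append(f"cfg={self.cfg}: the card recommends 1.0")
        if self.steps != 8:
            warnings.append(f"steps={self.steps}: the card recommends 8")
        if not 0.7 <= self.lora_weight <= 0.9:
            warnings.append(f"lora_weight={self.lora_weight} is outside 0.7 ~ 0.9")
        return warnings


print(DistillSettings().check())                 # []
print(DistillSettings(lora_weight=1.0).check())  # one warning about lora_weight
```

A frozen dataclass keeps the defaults in one place and makes accidental edits to a shared settings object raise an error.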