---
license: apache-2.0
library_name: videox_fun
pipeline_tag: text-to-image
tags:
- lora
---

# Z-Image-Fun-Lora-Distill

[VideoX-Fun on GitHub](https://github.com/aigc-apps/VideoX-Fun)

## Model Card

### a. 2603 Models

| Name | Description |
|--|--|
| Z-Image-Fun-Lora-Distill-2-Steps-2603.safetensors | A Distill LoRA for Z-Image that distills both steps and CFG. It requires only 2 steps instead of 8. Because of the random timesteps strategy, it is better adapted to sigmas below 0.500. The recommended sigma for the second step is between 0.800 and 0.500. A larger LoRA strength is recommended. |
| Z-Image-Fun-Lora-Distill-2-Steps-2603-ComfyUI.safetensors | ComfyUI version of Z-Image-Fun-Lora-Distill-2-Steps-2603.safetensors |
| Z-Image-Fun-Lora-Distill-4-Steps-2603.safetensors | A Distill LoRA for Z-Image that distills both steps and CFG. It requires only 4 steps instead of 8. Because of the random timesteps strategy, it is better adapted to sigmas below 0.500. |
| Z-Image-Fun-Lora-Distill-4-Steps-2603-ComfyUI.safetensors | ComfyUI version of Z-Image-Fun-Lora-Distill-4-Steps-2603.safetensors |
| Z-Image-Fun-Lora-Distill-8-Steps-2603.safetensors | A Distill LoRA for Z-Image that distills both steps and CFG. Compared to Z-Image-Fun-Lora-Distill-8-Steps-2602.safetensors, it adds a random timesteps strategy and is therefore better adapted to sigmas below 0.500. |
| Z-Image-Fun-Lora-Distill-8-Steps-2603-ComfyUI.safetensors | ComfyUI version of Z-Image-Fun-Lora-Distill-8-Steps-2603.safetensors |
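The second-step recommendation for the 2-step LoRA can be made concrete with a small sketch. It assumes the sampler accepts an explicit list of sigmas running from 1.0 down to 0.0, which may not match every inference path. The helper name `two_step_sigmas` and its default value are hypothetical and are not part of VideoX-Fun or ComfyUI.

```python
def two_step_sigmas(second_sigma: float = 0.65) -> list[float]:
    """Return [start, second, end] sigmas for a 2-step run.

    Assumption: the schedule starts at sigma 1.0 and ends at 0.0, and only
    the sigma of the second step is tunable. The 0.500 to 0.800 bounds come
    from the model table; the 0.65 default is just the midpoint of that
    range, not a tested value.
    """
    if not 0.500 <= second_sigma <= 0.800:
        raise ValueError(
            f"second-step sigma {second_sigma} is outside the recommended 0.500 to 0.800 range"
        )
    return [1.0, second_sigma, 0.0]
```

Rejecting out-of-range values early is a design choice for this sketch; a real workflow might only warn instead.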

### b. 2602 Models and Earlier Models

| Name | Description |
|--|--|
| Z-Image-Fun-Lora-Distill-4-Steps-2602.safetensors | A Distill LoRA for Z-Image that distills both steps and CFG. Compared to Z-Image-Fun-Lora-Distill-8-Steps.safetensors, it requires only 4 steps instead of 8, its colors are more consistent with the original model, and the skin texture is better. |
| Z-Image-Fun-Lora-Distill-4-Steps-2602-ComfyUI.safetensors | ComfyUI version of Z-Image-Fun-Lora-Distill-4-Steps-2602.safetensors |
| Z-Image-Fun-Lora-Distill-8-Steps-2602.safetensors | A Distill LoRA for Z-Image that distills both steps and CFG. Compared to Z-Image-Fun-Lora-Distill-8-Steps.safetensors, its colors are more consistent with the original model, and the skin texture is better. |
| Z-Image-Fun-Lora-Distill-8-Steps-2602-ComfyUI.safetensors | ComfyUI version of Z-Image-Fun-Lora-Distill-8-Steps-2602.safetensors |
| Z-Image-Fun-Lora-Distill-8-Steps.safetensors | A Distill LoRA for Z-Image that distills both steps and CFG. It does not require CFG and uses 8 steps for inference. |

## Model Features
- This is a Distill LoRA for Z-Image that distills both steps and CFG. It does not use any Z-Image-Turbo weights and is trained from scratch. It is compatible with other Z-Image LoRAs and [Controls](https://huggingface.co/alibaba-pai/Z-Image-Fun-Controlnet-Union-2.1).
- This model slightly reduces output quality and changes the composition of the output. For specific comparisons, see the Results section.
- The purpose of this model is to provide fast generation for Z-Image derivative models, not to replace Z-Image-Turbo.

## Results
### Difference between the 2603 models and the 2602 models

The 2602 model tends to produce blurry images when sigmas fall below 0.500, because the distillation model was not trained on certain steps. The 2603 model introduces a random timesteps strategy, making it better adapted to sigmas below 0.500.

As the comparison figure in this section shows, many sigmas fall below 0.500 when kl_optimal is used. The 2603 model handles these cases correctly, while the 2602 model does not. Although kl_optimal is used in the figure, we still recommend the simple scheduler for inference.
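To illustrate what "many sigmas fall below 0.500" means, the sketch below counts how many steps of a schedule start below that threshold. It is only an illustration: the evenly spaced schedule and the arctangent-interpolated schedule (used here as a stand-in for kl_optimal) are assumptions, as are the `sigma_min` and `sigma_max` endpoints and all function names. None of it is taken from the scheduler code used to produce the figure.

```python
import math


def linear_sigmas(steps: int) -> list[float]:
    """Evenly spaced sigmas from 1.0 down to 0.0 (steps + 1 values)."""
    return [1.0 - i / steps for i in range(steps + 1)]


def arctan_sigmas(steps: int, sigma_max: float = 1.0, sigma_min: float = 0.01) -> list[float]:
    """Arctangent-interpolated sigmas, a stand-in for kl_optimal.

    Assumption: the formula and both endpoints are illustrative only.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    sigmas = []
    for i in range(steps):
        t = i / (steps - 1)
        # Interpolate linearly in angle space, then map back with tan().
        angle = t * math.atan(sigma_min) + (1.0 - t) * math.atan(sigma_max)
        sigmas.append(math.tan(angle))
    return sigmas + [0.0]  # the final value is the fully denoised endpoint


def steps_below(sigmas: list[float], threshold: float = 0.5) -> int:
    """Count denoising steps whose starting sigma is below the threshold."""
    return sum(1 for s in sigmas[:-1] if s < threshold)


for name, schedule in [("linear", linear_sigmas(8)), ("arctan", arctan_sigmas(8))]:
    print(name, steps_below(schedule), "of 8 steps start below 0.500")
```

Under these assumptions, the arctangent schedule places more of its eight steps below 0.500 than the evenly spaced one, which is the regime where the 2603 models are described as better adapted.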

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Z-Image-Fun-Lora-Distill-8-Steps-2602</td>
<td>Z-Image-Fun-Lora-Distill-8-Steps-2603</td>
</tr>
<tr>
<td><img src="results/2602.png" width="100%" /></td>
<td><img src="results/2603.png" width="100%" /></td>
</tr>
</table>

### Difference between the 2602 models and the previous model

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Z-Image-Fun-Lora-Distill-8-Steps-2602</td>
<td>Z-Image-Fun-Lora-Distill-4-Steps-2602</td>
<td>Z-Image-Fun-Lora-Distill-8-Steps</td>
</tr>
<tr>
<td><img src="results/2602_1_8steps.png" width="100%" /><img src="results/2602_2_8steps.png" width="100%" /><img src="results/2602_3_8steps.png" width="100%" /><img src="results/2602_4_8steps.png" width="100%" /><img src="results/2602_5_8steps.png" width="100%" /></td>
<td><img src="results/2602_1_4steps.png" width="100%" /><img src="results/2602_2_4steps.png" width="100%" /><img src="results/2602_3_4steps.png" width="100%" /><img src="results/2602_4_4steps.png" width="100%" /><img src="results/2602_5_4steps.png" width="100%" /></td>
<td><img src="results/old_1_8steps.png" width="100%" /><img src="results/old_2_8steps.png" width="100%" /><img src="results/old_3_8steps.png" width="100%" /><img src="results/old_4_8steps.png" width="100%" /><img src="results/old_5_8steps.png" width="100%" /></td>
</tr>
</table>

### Working on its own

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Output 25 steps</td>
<td>Output 8-Steps-2602</td>
<td>Output 4-Steps-2602</td>
</tr>
<tr>
<td><img src="results/output4.png" width="100%" /></td>
<td><img src="results/output4_2602_8steps.png" width="100%" /></td>
<td><img src="results/output4_2602_4steps.png" width="100%" /></td>
</tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Output 25 steps</td>
<td>Output 8-Steps-2602</td>
<td>Output 4-Steps-2602</td>
</tr>
<tr>
<td><img src="results/output1.png" width="100%" /></td>
<td><img src="results/output1_2602_8steps.png" width="100%" /></td>
<td><img src="results/output1_2602_4steps.png" width="100%" /></td>
</tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Output 25 steps</td>
<td>Output 8-Steps-2602</td>
<td>Output 4-Steps-2602</td>
</tr>
<tr>
<td><img src="results/output2.png" width="100%" /></td>
<td><img src="results/output2_2602_8steps.png" width="100%" /></td>
<td><img src="results/output2_2602_4steps.png" width="100%" /></td>
</tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Output 25 steps</td>
<td>Output 8-Steps-2602</td>
<td>Output 4-Steps-2602</td>
</tr>
<tr>
<td><img src="results/output3.png" width="100%" /></td>
<td><img src="results/output3_2602_8steps.png" width="100%" /></td>
<td><img src="results/output3_2602_4steps.png" width="100%" /></td>
</tr>
</table>

### Working with ControlNet

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Inpaint</td>
<td>Output 25 steps</td>
<td>Output 8-Steps-2602</td>
<td>Output 4-Steps-2602</td>
</tr>
<tr>
<td><img src="asset/inpaint.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /></td>
<td><img src="results/inpaint.png" width="100%" /></td>
<td><img src="results/inpaint_2602_8steps.png" width="100%" /></td>
<td><img src="results/inpaint_2602_4steps.png" width="100%" /></td>
</tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Pose + Inpaint</td>
<td>Output 25 steps</td>
<td>Output 8-Steps-2602</td>
<td>Output 4-Steps-2602</td>
</tr>
<tr>
<td><img src="asset/inpaint.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /><img src="asset/pose.jpg" width="100%" /></td>
<td><img src="results/pose_inpaint.png" width="100%" /></td>
<td><img src="results/pose_inpaint_2602_8steps.png" width="100%" /></td>
<td><img src="results/pose_inpaint_2602_4steps.png" width="100%" /></td>
</tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Pose</td>
<td>Output 25 steps</td>
<td>Output 8-Steps-2602</td>
<td>Output 4-Steps-2602</td>
</tr>
<tr>
<td><img src="asset/pose2.jpg" width="100%" /></td>
<td><img src="results/pose2.png" width="100%" /></td>
<td><img src="results/pose2_2602_8steps.png" width="100%" /></td>
<td><img src="results/pose2_2602_4steps.png" width="100%" /></td>
</tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Canny</td>
<td>Output</td>
<td>Output 8-Steps-2602</td>
<td>Output 4-Steps-2602</td>
</tr>
<tr>
<td><img src="asset/canny.jpg" width="100%" /></td>
<td><img src="results/canny.png" width="100%" /></td>
<td><img src="results/canny_2602_8steps.png" width="100%" /></td>
<td><img src="results/canny_2602_4steps.png" width="100%" /></td>
</tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Depth</td>
<td>Output</td>
<td>Output 8-Steps-2602</td>
<td>Output 4-Steps-2602</td>
</tr>
<tr>
<td><img src="asset/gray.jpg" width="100%" /></td>
<td><img src="results/gray.png" width="100%" /></td>
<td><img src="results/gray_2602_8steps.png" width="100%" /></td>
<td><img src="results/gray_2602_4steps.png" width="100%" /></td>
</tr>
</table>

## Inference
Go to the VideoX-Fun repository for more details.

Please clone the VideoX-Fun repository and create the required directories:

```sh
# Clone the code
git clone https://github.com/aigc-apps/VideoX-Fun.git

# Enter VideoX-Fun's directory
cd VideoX-Fun

# Create model directories
mkdir -p models/Diffusion_Transformer
mkdir -p models/Personalized_Model
```

Then download the weights into `models/Diffusion_Transformer` and `models/Personalized_Model`.

```
📦 models/
├── 📂 Diffusion_Transformer/
│   ├── 📂 Z-Image/
├── 📂 Personalized_Model/
│   ├── 📦 Z-Image-Fun-Lora-Distill-4-Steps-2602.safetensors
│   ├── 📦 Z-Image-Fun-Lora-Distill-8-Steps-2602.safetensors
│   ├── 📦 Z-Image-Fun-Controlnet-Union-2.1.safetensors
│   ├── 📦 Z-Image-Fun-Controlnet-Union-2.1-lite.safetensors
```
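Before launching a script, it can help to confirm that the weights landed where the layout above expects them. The following is a minimal sketch, not part of VideoX-Fun: the function name `missing_weights` and the `EXPECTED` list are hypothetical, and the list should be adjusted to the files you actually downloaded.

```python
from pathlib import Path

# Relative paths taken from the directory tree above (assumption: you
# downloaded the base model and the 8-step 2602 LoRA; edit as needed).
EXPECTED = [
    "Diffusion_Transformer/Z-Image",
    "Personalized_Model/Z-Image-Fun-Lora-Distill-8-Steps-2602.safetensors",
]


def missing_weights(models_root: str, expected: list[str] = EXPECTED) -> list[str]:
    """Return the expected entries that do not exist under models_root."""
    root = Path(models_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the VideoX-Fun directory so that "models" resolves correctly.
    missing = missing_weights("models")
    print("all weights found" if not missing else f"missing: {missing}")
```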

To run the model, **first** set `lora_path` in `examples/z_image/predict_t2i.py` to:
`Personalized_Model/Z-Image-Fun-Lora-Distill-8-Steps.safetensors`

**Then**, run the file:
`examples/z_image/predict_t2i.py`

The following scripts are also supported:
- examples/z_image_fun/predict_t2i_control_2.1.py
- examples/z_image_fun/predict_i2i_inpaint_2.1.py

**Recommended Settings**:
- cfg = 1.0
- steps = 8
- lora_weight = 0.8 (suggested range: 0.7 ~ 0.9)

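The recommended settings can be kept in one place and sanity-checked before a run. This is a hedged sketch only: `DistillSettings` and its field names are hypothetical and do not correspond to variables in the VideoX-Fun example scripts, so the values still need to be copied into the script you run.

```python
from dataclasses import dataclass


@dataclass
class DistillSettings:
    """Hypothetical container for the recommended values listed above."""

    cfg: float = 1.0          # the LoRA distills CFG, so guidance stays at 1.0
    steps: int = 8            # matches the 8-step LoRA referenced above
    lora_weight: float = 0.8  # suggested range: 0.7 to 0.9

    def warnings(self) -> list[str]:
        """Return notes for values that fall outside the recommendations."""
        notes = []
        if self.cfg != 1.0:
            notes.append("cfg should be 1.0 because the LoRA already distills CFG")
        if not 0.7 <= self.lora_weight <= 0.9:
            notes.append("lora_weight is outside the suggested 0.7 to 0.9 range")
        return notes


print(DistillSettings().warnings())
print(DistillSettings(cfg=4.0, lora_weight=1.2).warnings())
```

Returning warnings instead of raising keeps the sketch permissive, since the table above notes that some variants may benefit from a larger LoRA strength.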