Update README.md
#1
by
kelseye - opened
README.md
CHANGED
|
@@ -1,184 +1,198 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
library_name: videox_fun
|
| 4 |
-
---
|
| 5 |
-
|
| 6 |
-
# Z-Image-Fun-Lora-Distill
|
| 7 |
-
|
| 8 |
-
[](https://github.com/aigc-apps/VideoX-Fun)
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
## Model Card
|
| 12 |
-
|
| 13 |
-
| Name | Description |
|
| 14 |
-
|--|--|
|
| 15 |
-
| Z-Image-Fun-Lora-Distill-8-Steps.safetensors | This is a Distill LoRA for Z-Image that distills both steps and CFG. This model does not require CFG and uses 8 steps for inference. |
|
| 16 |
-
|
| 17 |
-
## Model Features
|
| 18 |
-
- This is a Distill LoRA for Z-Image that distills both steps and CFG. It does not use any Z-Image-Turbo related weights and is trained from scratch. It is compatible with other Z-Image LoRAs and [Controls](https://huggingface.co/alibaba-pai/Z-Image-Fun-Controlnet-Union-2.1).
|
| 19 |
-
- This model will slightly reduce the output quality and change the output composition of the model. For specific comparisons, please refer to the Results section. In most cases, the Distill LoRA performs well; currently, the biggest issue is that it may make the generated results brighter.
|
| 20 |
-
- The purpose of this model is to provide fast generation compatibility for Z-Image derivative models, not to replace Z-Image-Turbo.
|
| 21 |
-
|
| 22 |
-
## TODO
|
| 23 |
-
- Optimize the output brightness;
|
| 24 |
-
- Train a 4-step LoRA.
|
| 25 |
-
|
| 26 |
-
## Results
|
| 27 |
-
### Work itself
|
| 28 |
-
|
| 29 |
-
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 30 |
-
<tr>
|
| 31 |
-
<td>Output 25 steps</td>
|
| 32 |
-
<td>Output 8 steps</td>
|
| 33 |
-
</tr>
|
| 34 |
-
<tr>
|
| 35 |
-
<td><img src="results/output1.png" width="100%" /></td>
|
| 36 |
-
<td><img src="results/output1_8steps.png" width="100%" /></td>
|
| 37 |
-
</tr>
|
| 38 |
-
</table>
|
| 39 |
-
|
| 40 |
-
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 41 |
-
<tr>
|
| 42 |
-
<td>Output 25 steps</td>
|
| 43 |
-
<td>Output 8 steps</td>
|
| 44 |
-
</tr>
|
| 45 |
-
<tr>
|
| 46 |
-
<td><img src="results/output2.png" width="100%" /></td>
|
| 47 |
-
<td><img src="results/output2_8steps.png" width="100%" /></td>
|
| 48 |
-
</tr>
|
| 49 |
-
</table>
|
| 50 |
-
|
| 51 |
-
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 52 |
-
<tr>
|
| 53 |
-
<td>Output 25 steps</td>
|
| 54 |
-
<td>Output 8 steps</td>
|
| 55 |
-
</tr>
|
| 56 |
-
<tr>
|
| 57 |
-
<td><img src="results/output3.png" width="100%" /></td>
|
| 58 |
-
<td><img src="results/output3_8steps.png" width="100%" /></td>
|
| 59 |
-
</tr>
|
| 60 |
-
</table>
|
| 61 |
-
|
| 62 |
-
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 63 |
-
<tr>
|
| 64 |
-
<td>Output 25 steps</td>
|
| 65 |
-
<td>Output 8 steps</td>
|
| 66 |
-
</tr>
|
| 67 |
-
<tr>
|
| 68 |
-
<td><img src="results/output4.png" width="100%" /></td>
|
| 69 |
-
<td><img src="results/output4_8steps.png" width="100%" /></td>
|
| 70 |
-
</tr>
|
| 71 |
-
</table>
|
| 72 |
-
|
| 73 |
-
### Work with Controlnet
|
| 74 |
-
|
| 75 |
-
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 76 |
-
<tr>
|
| 77 |
-
<td>Pose + Inpaint</td>
|
| 78 |
-
<td>Output 25 steps</td>
|
| 79 |
-
<td>Output 8 steps</td>
|
| 80 |
-
</tr>
|
| 81 |
-
<tr>
|
| 82 |
-
<td><img src="asset/inpaint.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /></td>
|
| 83 |
-
<td><img src="results/inpaint.png" width="100%" /></td>
|
| 84 |
-
<td><img src="results/inpaint_8steps.png" width="100%" /></td>
|
| 85 |
-
</tr>
|
| 86 |
-
</table>
|
| 87 |
-
|
| 88 |
-
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 89 |
-
<tr>
|
| 90 |
-
<td>Pose + Inpaint</td>
|
| 91 |
-
<td>Output 25 steps</td>
|
| 92 |
-
<td>Output 8 steps</td>
|
| 93 |
-
</tr>
|
| 94 |
-
<tr>
|
| 95 |
-
<td><img src="asset/inpaint.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /><img src="asset/pose.jpg" width="100%" /></td>
|
| 96 |
-
<td><img src="results/pose_inpaint.png" width="100%" /></td>
|
| 97 |
-
<td><img src="results/pose_inpaint_8steps.png" width="100%" /></td>
|
| 98 |
-
</tr>
|
| 99 |
-
</table>
|
| 100 |
-
|
| 101 |
-
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 102 |
-
<tr>
|
| 103 |
-
<td>Pose</td>
|
| 104 |
-
<td>Output 25 steps</td>
|
| 105 |
-
<td>Output 8 steps</td>
|
| 106 |
-
</tr>
|
| 107 |
-
<tr>
|
| 108 |
-
<td><img src="asset/pose2.jpg" width="100%" /></td>
|
| 109 |
-
<td><img src="results/pose2.png" width="100%" /></td>
|
| 110 |
-
<td><img src="results/pose2_8steps.png" width="100%" /></td>
|
| 111 |
-
</tr>
|
| 112 |
-
</table>
|
| 113 |
-
|
| 114 |
-
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 115 |
-
<tr>
|
| 116 |
-
<td>Pose</td>
|
| 117 |
-
<td>Output 25 steps</td>
|
| 118 |
-
<td>Output 8 steps</td>
|
| 119 |
-
</tr>
|
| 120 |
-
<tr>
|
| 121 |
-
<td><img src="asset/pose4.jpg" width="100%" /></td>
|
| 122 |
-
<td><img src="results/pose4.png" width="100%" /></td>
|
| 123 |
-
<td><img src="results/pose4_8steps.png" width="100%" /></td>
|
| 124 |
-
</tr>
|
| 125 |
-
</table>
|
| 126 |
-
|
| 127 |
-
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 128 |
-
<tr>
|
| 129 |
-
<td>Canny</td>
|
| 130 |
-
<td>Output</td>
|
| 131 |
-
<td>Output 8 steps</td>
|
| 132 |
-
</tr>
|
| 133 |
-
<tr>
|
| 134 |
-
<td><img src="asset/canny.jpg" width="100%" /></td>
|
| 135 |
-
<td><img src="results/canny.png" width="100%" /></td>
|
| 136 |
-
<td><img src="results/canny_8steps.png" width="100%" /></td>
|
| 137 |
-
</tr>
|
| 138 |
-
</table>
|
| 139 |
-
|
| 140 |
-
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 141 |
-
<tr>
|
| 142 |
-
<td>Depth</td>
|
| 143 |
-
<td>Output</td>
|
| 144 |
-
<td>Output 8 steps</td>
|
| 145 |
-
</tr>
|
| 146 |
-
<tr>
|
| 147 |
-
<td><img src="asset/gray.jpg" width="100%" /></td>
|
| 148 |
-
<td><img src="results/gray.png" width="100%" /></td>
|
| 149 |
-
<td><img src="results/gray_8steps.png" width="100%" /></td>
|
| 150 |
-
</tr>
|
| 151 |
-
</table>
|
| 152 |
-
|
| 153 |
-
## Inference
|
| 154 |
-
Go to the VideoX-Fun repository for more details.
|
| 155 |
-
|
| 156 |
-
Please clone the VideoX-Fun repository and create the required directories:
|
| 157 |
-
|
| 158 |
-
```sh
|
| 159 |
-
# Clone the code
|
| 160 |
-
git clone https://github.com/aigc-apps/VideoX-Fun.git
|
| 161 |
-
|
| 162 |
-
# Enter VideoX-Fun's directory
|
| 163 |
-
cd VideoX-Fun
|
| 164 |
-
|
| 165 |
-
# Create model directories
|
| 166 |
-
mkdir -p models/Diffusion_Transformer
|
| 167 |
-
mkdir -p models/Personalized_Model
|
| 168 |
-
```
|
| 169 |
-
|
| 170 |
-
Then download the weights into models/Diffusion_Transformer and models/Personalized_Model.
|
| 171 |
-
|
| 172 |
-
```
|
| 173 |
-
📦 models/
|
| 174 |
-
├── 📂 Diffusion_Transformer/
|
| 175 |
-
│   └── 📂 Z-Image/
|
| 176 |
-
└── 📂 Personalized_Model/
|
| 177 |
-
    ├── 📦 Z-Image-Fun-Lora-Distill-8-Steps.safetensors
|
| 178 |
-
    ├── 📦 Z-Image-Fun-Controlnet-Union-2.1.safetensors
|
| 179 |
-
    └── 📦 Z-Image-Fun-Controlnet-Union-2.1-lite.safetensors
|
| 180 |
-
```
|
| 181 |
-
|
| 182 |
-
|
| 183 |
-
|
| 184 |
-
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
library_name: videox_fun
|
| 4 |
+
---
|
| 5 |
+
|
| 6 |
+
# Z-Image-Fun-Lora-Distill
|
| 7 |
+
|
| 8 |
+
[](https://github.com/aigc-apps/VideoX-Fun)
|
| 9 |
+
|
| 10 |
+
|
| 11 |
+
## Model Card
|
| 12 |
+
|
| 13 |
+
| Name | Description |
|
| 14 |
+
|--|--|
|
| 15 |
+
| Z-Image-Fun-Lora-Distill-8-Steps.safetensors | This is a Distill LoRA for Z-Image that distills both steps and CFG. This model does not require CFG and uses 8 steps for inference. |
|
| 16 |
+
|
| 17 |
+
## Model Features
|
| 18 |
+
- This is a Distill LoRA for Z-Image that distills both steps and CFG. It does not use any Z-Image-Turbo related weights and is trained from scratch. It is compatible with other Z-Image LoRAs and [Controls](https://huggingface.co/alibaba-pai/Z-Image-Fun-Controlnet-Union-2.1).
|
| 19 |
+
- This model will slightly reduce the output quality and change the output composition of the model. For specific comparisons, please refer to the Results section. In most cases, the Distill LoRA performs well; currently, the biggest issue is that it may make the generated results brighter.
|
| 20 |
+
- The purpose of this model is to provide fast generation compatibility for Z-Image derivative models, not to replace Z-Image-Turbo.
|
| 21 |
+
|
| 22 |
+
## TODO
|
| 23 |
+
- Optimize the output brightness;
|
| 24 |
+
- Train a 4-step LoRA.
|
| 25 |
+
|
| 26 |
+
## Results
|
| 27 |
+
### Work itself
|
| 28 |
+
|
| 29 |
+
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 30 |
+
<tr>
|
| 31 |
+
<td>Output 25 steps</td>
|
| 32 |
+
<td>Output 8 steps</td>
|
| 33 |
+
</tr>
|
| 34 |
+
<tr>
|
| 35 |
+
<td><img src="results/output1.png" width="100%" /></td>
|
| 36 |
+
<td><img src="results/output1_8steps.png" width="100%" /></td>
|
| 37 |
+
</tr>
|
| 38 |
+
</table>
|
| 39 |
+
|
| 40 |
+
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 41 |
+
<tr>
|
| 42 |
+
<td>Output 25 steps</td>
|
| 43 |
+
<td>Output 8 steps</td>
|
| 44 |
+
</tr>
|
| 45 |
+
<tr>
|
| 46 |
+
<td><img src="results/output2.png" width="100%" /></td>
|
| 47 |
+
<td><img src="results/output2_8steps.png" width="100%" /></td>
|
| 48 |
+
</tr>
|
| 49 |
+
</table>
|
| 50 |
+
|
| 51 |
+
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 52 |
+
<tr>
|
| 53 |
+
<td>Output 25 steps</td>
|
| 54 |
+
<td>Output 8 steps</td>
|
| 55 |
+
</tr>
|
| 56 |
+
<tr>
|
| 57 |
+
<td><img src="results/output3.png" width="100%" /></td>
|
| 58 |
+
<td><img src="results/output3_8steps.png" width="100%" /></td>
|
| 59 |
+
</tr>
|
| 60 |
+
</table>
|
| 61 |
+
|
| 62 |
+
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 63 |
+
<tr>
|
| 64 |
+
<td>Output 25 steps</td>
|
| 65 |
+
<td>Output 8 steps</td>
|
| 66 |
+
</tr>
|
| 67 |
+
<tr>
|
| 68 |
+
<td><img src="results/output4.png" width="100%" /></td>
|
| 69 |
+
<td><img src="results/output4_8steps.png" width="100%" /></td>
|
| 70 |
+
</tr>
|
| 71 |
+
</table>
|
| 72 |
+
|
| 73 |
+
### Work with Controlnet
|
| 74 |
+
|
| 75 |
+
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 76 |
+
<tr>
|
| 77 |
+
<td>Pose + Inpaint</td>
|
| 78 |
+
<td>Output 25 steps</td>
|
| 79 |
+
<td>Output 8 steps</td>
|
| 80 |
+
</tr>
|
| 81 |
+
<tr>
|
| 82 |
+
<td><img src="asset/inpaint.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /></td>
|
| 83 |
+
<td><img src="results/inpaint.png" width="100%" /></td>
|
| 84 |
+
<td><img src="results/inpaint_8steps.png" width="100%" /></td>
|
| 85 |
+
</tr>
|
| 86 |
+
</table>
|
| 87 |
+
|
| 88 |
+
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 89 |
+
<tr>
|
| 90 |
+
<td>Pose + Inpaint</td>
|
| 91 |
+
<td>Output 25 steps</td>
|
| 92 |
+
<td>Output 8 steps</td>
|
| 93 |
+
</tr>
|
| 94 |
+
<tr>
|
| 95 |
+
<td><img src="asset/inpaint.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /><img src="asset/pose.jpg" width="100%" /></td>
|
| 96 |
+
<td><img src="results/pose_inpaint.png" width="100%" /></td>
|
| 97 |
+
<td><img src="results/pose_inpaint_8steps.png" width="100%" /></td>
|
| 98 |
+
</tr>
|
| 99 |
+
</table>
|
| 100 |
+
|
| 101 |
+
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 102 |
+
<tr>
|
| 103 |
+
<td>Pose</td>
|
| 104 |
+
<td>Output 25 steps</td>
|
| 105 |
+
<td>Output 8 steps</td>
|
| 106 |
+
</tr>
|
| 107 |
+
<tr>
|
| 108 |
+
<td><img src="asset/pose2.jpg" width="100%" /></td>
|
| 109 |
+
<td><img src="results/pose2.png" width="100%" /></td>
|
| 110 |
+
<td><img src="results/pose2_8steps.png" width="100%" /></td>
|
| 111 |
+
</tr>
|
| 112 |
+
</table>
|
| 113 |
+
|
| 114 |
+
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 115 |
+
<tr>
|
| 116 |
+
<td>Pose</td>
|
| 117 |
+
<td>Output 25 steps</td>
|
| 118 |
+
<td>Output 8 steps</td>
|
| 119 |
+
</tr>
|
| 120 |
+
<tr>
|
| 121 |
+
<td><img src="asset/pose4.jpg" width="100%" /></td>
|
| 122 |
+
<td><img src="results/pose4.png" width="100%" /></td>
|
| 123 |
+
<td><img src="results/pose4_8steps.png" width="100%" /></td>
|
| 124 |
+
</tr>
|
| 125 |
+
</table>
|
| 126 |
+
|
| 127 |
+
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 128 |
+
<tr>
|
| 129 |
+
<td>Canny</td>
|
| 130 |
+
<td>Output</td>
|
| 131 |
+
<td>Output 8 steps</td>
|
| 132 |
+
</tr>
|
| 133 |
+
<tr>
|
| 134 |
+
<td><img src="asset/canny.jpg" width="100%" /></td>
|
| 135 |
+
<td><img src="results/canny.png" width="100%" /></td>
|
| 136 |
+
<td><img src="results/canny_8steps.png" width="100%" /></td>
|
| 137 |
+
</tr>
|
| 138 |
+
</table>
|
| 139 |
+
|
| 140 |
+
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 141 |
+
<tr>
|
| 142 |
+
<td>Depth</td>
|
| 143 |
+
<td>Output</td>
|
| 144 |
+
<td>Output 8 steps</td>
|
| 145 |
+
</tr>
|
| 146 |
+
<tr>
|
| 147 |
+
<td><img src="asset/gray.jpg" width="100%" /></td>
|
| 148 |
+
<td><img src="results/gray.png" width="100%" /></td>
|
| 149 |
+
<td><img src="results/gray_8steps.png" width="100%" /></td>
|
| 150 |
+
</tr>
|
| 151 |
+
</table>
|
| 152 |
+
|
| 153 |
+
## Inference
|
| 154 |
+
Go to the VideoX-Fun repository for more details.
|
| 155 |
+
|
| 156 |
+
Please clone the VideoX-Fun repository and create the required directories:
|
| 157 |
+
|
| 158 |
+
```sh
|
| 159 |
+
# Clone the code
|
| 160 |
+
git clone https://github.com/aigc-apps/VideoX-Fun.git
|
| 161 |
+
|
| 162 |
+
# Enter VideoX-Fun's directory
|
| 163 |
+
cd VideoX-Fun
|
| 164 |
+
|
| 165 |
+
# Create model directories
|
| 166 |
+
mkdir -p models/Diffusion_Transformer
|
| 167 |
+
mkdir -p models/Personalized_Model
|
| 168 |
+
```
|
| 169 |
+
|
| 170 |
+
Then download the weights into models/Diffusion_Transformer and models/Personalized_Model.
|
| 171 |
+
|
| 172 |
+
```
|
| 173 |
+
📦 models/
|
| 174 |
+
├── 📂 Diffusion_Transformer/
|
| 175 |
+
│   └── 📂 Z-Image/
|
| 176 |
+
└── 📂 Personalized_Model/
|
| 177 |
+
    ├── 📦 Z-Image-Fun-Lora-Distill-8-Steps.safetensors
|
| 178 |
+
    ├── 📦 Z-Image-Fun-Controlnet-Union-2.1.safetensors
|
| 179 |
+
    └── 📦 Z-Image-Fun-Controlnet-Union-2.1-lite.safetensors
|
| 180 |
+
```
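
Before running inference, it can help to confirm that the downloaded weights match the tree above. The following is a minimal sketch, not part of VideoX-Fun: the helper name `missing_weights` is hypothetical, and the paths are copied from the directory listing.

```python
from pathlib import Path

# Entries copied from the directory tree in this README, relative to models/.
EXPECTED = [
    "Diffusion_Transformer/Z-Image",
    "Personalized_Model/Z-Image-Fun-Lora-Distill-8-Steps.safetensors",
    "Personalized_Model/Z-Image-Fun-Controlnet-Union-2.1.safetensors",
    "Personalized_Model/Z-Image-Fun-Controlnet-Union-2.1-lite.safetensors",
]


def missing_weights(models_dir):
    """Return the expected entries that do not exist under models_dir."""
    root = Path(models_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the VideoX-Fun directory, where models/ was created.
    for rel in missing_weights("models"):
        print(f"missing: models/{rel}")
```

An empty result means every entry from the tree is present; anything printed is a file or folder that still needs to be downloaded.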
|
| 181 |
+
|
| 182 |
+
|
| 183 |
+
To run the model, **first** set `lora_path` in `examples/z_image/predict_t2i.py` to:
|
| 184 |
+
`Personalized_Model/Z-Image-Fun-Lora-Distill-8-Steps.safetensors`
|
| 185 |
+
|
| 186 |
+
|
| 187 |
+
**Then**, run the file:
|
| 188 |
+
`examples/z_image/predict_t2i.py`
|
| 189 |
+
|
| 190 |
+
The following scripts are also supported:
|
| 191 |
+
- `examples/z_image_fun/predict_t2i_control_2.1.py`
|
| 192 |
+
- `examples/z_image_fun/predict_i2i_inpaint_2.1.py`
|
| 193 |
+
|
| 194 |
+
|
| 195 |
+
**Recommended Settings**:
|
| 196 |
+
- cfg = 1.0
|
| 197 |
+
- steps = 8
|
| 198 |
+
- lora_weight = 0.8 (suggested range: 0.7 to 0.8)
|
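
The recommended settings can be kept in one place and checked before a run. This is a hedged sketch: the `DistillSettings` class and `check_settings` function are hypothetical, and the field names follow this README's wording (`cfg`, `steps`, `lora_weight`, `lora_path`); the variable names inside the VideoX-Fun scripts may differ.

```python
from dataclasses import dataclass


@dataclass
class DistillSettings:
    # Defaults are the values recommended in this README.
    lora_path: str = "Personalized_Model/Z-Image-Fun-Lora-Distill-8-Steps.safetensors"
    cfg: float = 1.0
    steps: int = 8
    lora_weight: float = 0.8


def check_settings(s):
    """Return warnings for values that depart from the recommended settings."""
    warnings = []
    if s.cfg != 1.0:
        warnings.append("cfg should be 1.0: this LoRA does not require CFG")
    if s.steps != 8:
        warnings.append("steps should be 8 for the 8-step LoRA")
    if not 0.7 <= s.lora_weight <= 0.8:
        warnings.append("lora_weight is outside the suggested 0.7 to 0.8 range")
    return warnings


if __name__ == "__main__":
    for message in check_settings(DistillSettings()):
        print(message)
```

With the defaults, `check_settings` returns an empty list; copy the values into the script being run by hand, since this sketch does not modify any VideoX-Fun file.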