bubbliiiing committed on
Commit
1fc23a9
·
1 Parent(s): ab8e966

Update Weights

README.md CHANGED
@@ -3,28 +3,54 @@ license: apache-2.0
3
  library_name: videox_fun
4
  ---
5
 
6
- # Z-Image-Turbo-Fun-Controlnet-Union-2.0
7
 
8
 [![Github](https://img.shields.io/badge/🎬%20Code-VideoX_Fun-blue)](https://github.com/aigc-apps/VideoX-Fun)
9
 
10
-
11
  ## Update
12
- - Due to a typo in the code, `control_layers` was used instead of `control_noise_refiner` to process refiner latents during training. Although the model converged normally, inference was slow because the `control_layers` forward pass was performed twice. In version 2.1 we made an urgent fix, and speed has returned to normal. [2025.12.17]
13
 
14
  ## Model Features
15
- - This ControlNet is attached to 15 layer blocks and 2 refiner layer blocks.
16
- - The model was trained from scratch for 70,000 steps on a dataset of 1 million high-quality images covering both general and human-centric content. Training was performed at 1328 resolution using BFloat16 precision, with a batch size of 64, a learning rate of 2e-5, and a text dropout ratio of 0.10.
17
- - It supports multiple control conditions, including Canny, HED, Depth, Pose, and MLSD, and can be used like a standard ControlNet.
18
- - We found that under different strength levels, using different step numbers has a certain impact on the realism and clarity of the results. For strength and step testing, please refer to [Scale Test Results](#scale-test-results).
19
- - You can adjust control_context_scale for stronger control and better detail preservation. For better stability, we highly recommend using a detailed prompt. The optimal range for control_context_scale is from 0.65 to 0.90.
20
- - **Note on Steps: As you increase the control strength (higher control_context_scale values), it's recommended to appropriately increase the number of inference steps to achieve better results and maintain generation quality. This is likely because the control model has not been distilled.**
21
 - Inpainting mode is also supported.
22
 
23
  ## TODO
24
  - [ ] Train on better data.
25
 
26
  ## Results
 
27
28
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
29
  <tr>
30
  <td>Pose + Inpaint</td>
@@ -102,6 +128,17 @@ library_name: videox_fun
102
  </tr>
103
  </table>
104
105
  ## Inference
106
  Go to the VideoX-Fun repository for more details.
107
 
@@ -126,10 +163,15 @@ Then download the weights into models/Diffusion_Transformer and models/Personali
126
 ├── 📂 Diffusion_Transformer/
127
 │   └── 📂 Z-Image-Turbo/
128
 ├── 📂 Personalized_Model/
129
- │   └── 📦 Z-Image-Turbo-Fun-Controlnet-Union-2.0.safetensors
130
 ```
131
 
132
- Then run the files `examples/z_image_fun/predict_t2i_control_2.0.py` and `examples/z_image_fun/predict_i2i_inpaint_2.0.py`.
133
 
134
  ## Scale Test Results
135
 
@@ -147,3 +189,4 @@ Parameter Description:
147
 
148
  Diffusion Steps: Number of iteration steps for the diffusion model (9, 10, 20, 30, 40)
149
  Control Scale: Control strength coefficient (0.65 - 1.0)
 
 
3
  library_name: videox_fun
4
  ---
5
 
6
+ # Z-Image-Turbo-Fun-Controlnet-Union-2.1
7
 
8
 [![Github](https://img.shields.io/badge/🎬%20Code-VideoX_Fun-blue)](https://github.com/aigc-apps/VideoX-Fun)
9
 
 
10
  ## Update
11
+ - During testing, we found that applying ControlNet to Z-Image-Turbo caused the model to lose its acceleration capability and produce blurry outputs. We performed 8-step distillation on the version 2.1 model, and the distilled model performs better when using 8-step prediction. Additionally, we have uploaded a tile model that can be used for super-resolution generation. [2025.12.22]
12
+ - Due to a typo in version 2.0, `control_layers` was used instead of `control_noise_refiner` to process refiner latents during training. Although the model converged normally, inference was slow because the `control_layers` forward pass was performed twice. In version 2.1 we made an urgent fix, and speed has returned to normal. [2025.12.17]
13
+
14
+ ## Model Card
15
+ | Name | Description |
16
+ |--|--|
17
+ | Z-Image-Turbo-Fun-Controlnet-Union-2.1-8steps.safetensors | Based on version 2.1, the model was distilled using an 8-step distillation algorithm. 8-step prediction is recommended. Compared to version 2.1, when using 8-step prediction, the images are clearer and the composition is more reasonable. |
18
+ | Z-Image-Turbo-Fun-Controlnet-Tile-2.1-8steps.safetensors | A Tile model trained on high-definition datasets that can be used for super-resolution, with a maximum training resolution of 2048x2048. The model was distilled using an 8-step distillation algorithm, and 8-step prediction is recommended. |
19
+ | Z-Image-Turbo-Fun-Controlnet-Union-2.1.safetensors | A retrained model after fixing the typo in version 2.0, with faster single-step speed. Similar to version 2.0, the model lost some of its acceleration capability after training, thus requiring more steps. |
20
+ | Z-Image-Turbo-Fun-Controlnet-Union-2.0.safetensors | ControlNet weights for Z-Image-Turbo. Compared to version 1.0, it adds modifications to more layers and was trained for a longer time. However, due to a typo in the code, the layer blocks were forwarded twice, resulting in slower speed. The model supports multiple control conditions such as Canny, Depth, Pose, MLSD, etc. Additionally, the model lost some of its acceleration capability after training, thus requiring more steps. |
21
 
22
  ## Model Features
23
+ - This ControlNet is attached to 15 layer blocks and 2 refiner layer blocks. It supports multiple control conditions, including Canny, HED, Depth, Pose, and MLSD, and can be used like a standard ControlNet.
24
  - Inpainting mode is also supported.
25
+ - Training Process:
26
+ - 2.0: The model was trained from scratch for 70,000 steps on a dataset of 1 million high-quality images covering both general and human-centric content. Training was performed at 1328 resolution using BFloat16 precision, with a batch size of 64, a learning rate of 2e-5, and a text dropout ratio of 0.10.
27
+ - 2.1: Version 2.1 is based on the version 2.0 weights and continued training for an additional 11,000 steps after the typo fix, using the same parameters and dataset.
28
+ - 2.1-8-steps: Version 2.1-8-steps was obtained by training for 5,500 steps using an 8-step distillation algorithm based on version 2.1.
29
+ - Note on Steps:
30
+ - 2.0 and 2.1: As you increase the control strength (higher control_context_scale values), it's recommended to appropriately increase the number of inference steps to achieve better results and maintain generation quality. This is likely because the control model has not been distilled.
31
+ - 2.1-8-steps: Simply use 8 inference steps.
32
+ - You can adjust control_context_scale for stronger control and better detail preservation. For better stability, we highly recommend using a detailed prompt. The optimal range for control_context_scale is from 0.65 to 0.90.
33
+ - During testing of versions 2.0 and 2.1, we found that applying ControlNet to Z-Image-Turbo caused the model to lose its acceleration capability and produce blurry images. For detailed strength and step-count testing, please refer to [Scale Test Results](#scale-test-results); these results were generated with version 2.0.
34
 
35
  ## TODO
36
  - [ ] Train on better data.
37
 
38
  ## Results
39
+ ### Differences between 2.1 and 2.1-8steps
40
 
41
+ 8-step results:
42
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
43
+ <tr>
44
+ <td>Z-Image-Turbo-Fun-Controlnet-Union-2.1-8steps</td>
45
+ <td>Z-Image-Turbo-Fun-Controlnet-Union-2.1</td>
46
+ </tr>
47
+ <tr>
48
+ <td><img src="results/8steps.png" width="100%" /></td>
49
+ <td><img src="results/nsteps.png" width="100%" /></td>
50
+ </tr>
51
+ </table>
52
+
53
+ ### Generation Results
54
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
55
  <tr>
56
  <td>Pose + Inpaint</td>
 
128
  </tr>
129
  </table>
130
 
131
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
132
+ <tr>
133
+ <td>Low Resolution</td>
134
+ <td>High Resolution</td>
135
+ </tr>
136
+ <tr>
137
+ <td><img src="asset/low_res.jpg" width="100%" /></td>
138
+ <td><img src="results/high_res.png" width="100%" /></td>
139
+ </tr>
140
+ </table>
141
+
142
  ## Inference
143
  Go to the VideoX-Fun repository for more details.
144
 
 
163
 ├── 📂 Diffusion_Transformer/
164
 │   └── 📂 Z-Image-Turbo/
165
 ├── 📂 Personalized_Model/
166
+ │   ├── 📦 Z-Image-Turbo-Fun-Controlnet-Union-2.1.safetensors
167
+ │   ├── 📦 Z-Image-Turbo-Fun-Controlnet-Union-2.1-8steps.safetensors
168
+ │   └── 📦 Z-Image-Turbo-Fun-Controlnet-Tile-2.1-8steps.safetensors
169
  ```
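As a quick sanity check after downloading, the layout above can be verified with a short script (paths follow the tree shown; adjust them to your checkout):

```python
from pathlib import Path

# Expected weight files, following the directory layout shown above.
expected = [
    Path("models/Personalized_Model/Z-Image-Turbo-Fun-Controlnet-Union-2.1.safetensors"),
    Path("models/Personalized_Model/Z-Image-Turbo-Fun-Controlnet-Union-2.1-8steps.safetensors"),
    Path("models/Personalized_Model/Z-Image-Turbo-Fun-Controlnet-Tile-2.1-8steps.safetensors"),
]

# List anything that has not been downloaded yet.
missing = [p.name for p in expected if not p.exists()]
```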
170
 
171
+ Then run the files `examples/z_image_fun/predict_t2i_control_2.1.py` and `examples/z_image_fun/predict_i2i_inpaint_2.1.py`.
172
+
173
+ <details>
174
+ <summary>(Obsolete) Scale Test Results:</summary>
175
 
176
  ## Scale Test Results
177
 
 
189
 
190
  Diffusion Steps: Number of iteration steps for the diffusion model (9, 10, 20, 30, 40)
191
  Control Scale: Control strength coefficient (0.65 - 1.0)
192
+ </details>
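The sweep in the (obsolete) scale test above can be enumerated programmatically. A sketch assuming a 0.05 increment for the control scale (the exact grid spacing used is not stated in this README):

```python
from itertools import product

diffusion_steps = [9, 10, 20, 30, 40]
# Control scale from 0.65 to 1.0; the 0.05 step size is an assumption.
control_scales = [round(0.65 + 0.05 * i, 2) for i in range(8)]

# Every (steps, scale) combination in the sweep.
grid = list(product(diffusion_steps, control_scales))
```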
Z-Image-Turbo-Fun-Controlnet-Tile-2.1-8steps.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d34935adc2a71b2d74ab51ea6d48a5e494a5a2236498486ca005ace7da4c8054
3
+ size 6712485600
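The three `+` lines above are a standard Git LFS pointer file, not the weights themselves. For illustration, such a pointer can be parsed in a few lines (a sketch; real tooling should go through `git lfs`):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# Pointer content as shown above for the tile model.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:d34935adc2a71b2d74ab51ea6d48a5e494a5a2236498486ca005ace7da4c8054
size 6712485600"""

info = parse_lfs_pointer(pointer)
```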
Z-Image-Turbo-Fun-Controlnet-Union-2.1-8steps.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:41a0947636dc69fff97926df50eb87b406fa8835348a37df65354e9f3a0fbe18
3
+ size 6712485600
asset/low_res.jpg ADDED

Git LFS Details

  • SHA256: 0203f9d2bf8f65fc05a9be290409e37eaa00073e4a73355939f144bdf9439ec9
  • Pointer size: 130 Bytes
  • Size of remote file: 44.4 kB
results/8steps.png ADDED

Git LFS Details

  • SHA256: b2c076eece5deac254ed7973b49904645174827a01ea90a714363a8d4ee79173
  • Pointer size: 132 Bytes
  • Size of remote file: 2.32 MB
results/high_res.png ADDED

Git LFS Details

  • SHA256: b77998bb0d60dc5cc758decbb51638242b12e64c4a9ec14133c28de98599c7ff
  • Pointer size: 132 Bytes
  • Size of remote file: 4.78 MB
results/nsteps.png ADDED

Git LFS Details

  • SHA256: 9e9161f2ba09a46ffdd380d9e3c81b695aba4da3e2d41a2500cdd7eb12627842
  • Pointer size: 132 Bytes
  • Size of remote file: 2.17 MB