---
license: apache-2.0
library_name: videox_fun
---

# Z-Image-Fun-Controlnet-Union-2.1

[![Github](https://img.shields.io/badge/🎬%20Code-VideoX_Fun-blue)](https://github.com/aigc-apps/VideoX-Fun)

## Model Card

| Name | Description |
|--|--|
| Z-Image-Fun-Controlnet-Union-2.1.safetensors | ControlNet weights for Z-Image. The model supports multiple control conditions such as Canny, Depth, Pose, MLSD, Scribble, HED, and Gray. Control is applied to 15 layer blocks and 2 refiner layer blocks. |
| Z-Image-Fun-Controlnet-Union-2.1-lite.safetensors | Compared to the full model, control is applied to fewer layers, resulting in weaker conditioning. This makes it suitable for larger control_context_scale values, produces more natural-looking results, and runs on lower-spec machines. |
| Z-Image-Fun-Controlnet-Tile-2.1.safetensors | A Tile model trained on high-definition datasets (up to 2048×2048) for super-resolution. |
| Z-Image-Fun-Controlnet-Tile-2.1-lite.safetensors | Control latents are applied to fewer layers, resulting in weaker control. This allows larger control_context_scale values with more natural results, and is better suited to lower-spec machines. |

## Model Features
- Control is applied to 15 layer blocks and 2 refiner layer blocks (lite models: 3 layer blocks and 2 refiner blocks). The model supports multiple control conditions (Canny, Depth, Pose, MLSD, Scribble, HED, and Gray) and can be used like a standard ControlNet.
- Inpainting mode is also supported. When using inpaint mode, use a larger control_context_scale for better image continuity.
- You can increase control_context_scale for stronger control and better detail preservation. For better stability, we highly recommend using a detailed prompt. The optimal range for control_context_scale is 0.65 to 1.00.
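The recommended range can be enforced with a small helper. This is a hypothetical sketch, not part of the VideoX-Fun API; the function name is ours:

```python
def clamp_control_context_scale(value: float, lo: float = 0.65, hi: float = 1.00) -> float:
    """Clamp control_context_scale into the recommended 0.65-1.00 range.

    Hypothetical helper, not part of VideoX-Fun. Larger values within
    the range give stronger control and better detail preservation.
    """
    return min(max(value, lo), hi)

print(clamp_control_context_scale(0.5))   # below range -> 0.65
print(clamp_control_context_scale(0.85))  # in range -> unchanged
```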

## Results

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
  <tr>
    <td>Inpaint</td>
    <td>Output</td>
  </tr>
  <tr>
    <td><img src="asset/inpaint.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /></td>
    <td><img src="results/inpaint.png" width="100%" /></td>
  </tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
  <tr>
    <td>Pose + Inpaint</td>
    <td>Output</td>
  </tr>
  <tr>
    <td><img src="asset/inpaint.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /><img src="asset/pose.jpg" width="100%" /></td>
    <td><img src="results/pose_inpaint.png" width="100%" /></td>
  </tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
  <tr>
    <td>Pose</td>
    <td>Output</td>
  </tr>
  <tr>
    <td><img src="asset/pose2.jpg" width="100%" /></td>
    <td><img src="results/pose2.png" width="100%" /></td>
  </tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
  <tr>
    <td>Pose</td>
    <td>Output</td>
  </tr>
  <tr>
    <td><img src="asset/pose.jpg" width="100%" /></td>
    <td><img src="results/pose.png" width="100%" /></td>
  </tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
  <tr>
    <td>Pose</td>
    <td>Output</td>
  </tr>
  <tr>
    <td><img src="asset/pose3.jpg" width="100%" /></td>
    <td><img src="results/pose3.png" width="100%" /></td>
  </tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
  <tr>
    <td>Canny</td>
    <td>Output</td>
  </tr>
  <tr>
    <td><img src="asset/canny.jpg" width="100%" /></td>
    <td><img src="results/canny.png" width="100%" /></td>
  </tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
  <tr>
    <td>HED</td>
    <td>Output</td>
  </tr>
  <tr>
    <td><img src="asset/hed.jpg" width="100%" /></td>
    <td><img src="results/hed.png" width="100%" /></td>
  </tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
  <tr>
    <td>Depth</td>
    <td>Output</td>
  </tr>
  <tr>
    <td><img src="asset/depth.jpg" width="100%" /></td>
    <td><img src="results/depth.png" width="100%" /></td>
  </tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
  <tr>
    <td>Gray</td>
    <td>Output</td>
  </tr>
  <tr>
    <td><img src="asset/gray.jpg" width="100%" /></td>
    <td><img src="results/gray.png" width="100%" /></td>
  </tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
  <tr>
    <td>Low Resolution</td>
    <td>High Resolution</td>
  </tr>
  <tr>
    <td><img src="asset/low_res.jpg" width="100%" /></td>
    <td><img src="results/high_res.png" width="100%" /></td>
  </tr>
</table>

## Inference
See the VideoX-Fun repository for more details.

Clone the repository and create the required model directories:

```sh
# Clone the code
git clone https://github.com/aigc-apps/VideoX-Fun.git

# Enter VideoX-Fun's directory
cd VideoX-Fun

# Create model directories
mkdir -p models/Diffusion_Transformer
mkdir -p models/Personalized_Model
```

Then download the weights into `models/Diffusion_Transformer` and `models/Personalized_Model`, matching the layout below:

```
πŸ“¦ models/
β”œβ”€β”€ πŸ“‚ Diffusion_Transformer/
β”‚   └── πŸ“‚ Z-Image/
β”œβ”€β”€ πŸ“‚ Personalized_Model/
β”‚   β”œβ”€β”€ πŸ“¦ Z-Image-Fun-Controlnet-Union-2.1.safetensors
β”‚   └── πŸ“¦ Z-Image-Fun-Controlnet-Union-2.1-lite.safetensors
```
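As one option, the weights can be fetched with Python's standard library via Hugging Face's `resolve` URL scheme. The repo id below is a placeholder assumption; substitute the repository that actually hosts these weights:

```python
import os
import urllib.request

def weight_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the Hugging Face direct-download URL for one file."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

def fetch_weight(repo_id: str, filename: str,
                 dest_dir: str = "models/Personalized_Model") -> str:
    """Download one safetensors file into the VideoX-Fun model layout."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, filename)
    urllib.request.urlretrieve(weight_url(repo_id, filename), dest)
    return dest

# Example (placeholder repo id -- replace with the real one):
# fetch_weight("<org>/Z-Image-Fun", "Z-Image-Fun-Controlnet-Union-2.1.safetensors")
```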

Then run `examples/z_image_fun/predict_t2i_control_2.1.py` for controlled generation, or `examples/z_image_fun/predict_i2i_inpaint_2.1.py` for inpainting.