---
license: apache-2.0
library_name: videox_fun
---
# Z-Image-Fun-Controlnet-Union-2.1
[VideoX-Fun on GitHub](https://github.com/aigc-apps/VideoX-Fun)
## Model Card
| Name | Description |
|--|--|
| Z-Image-Fun-Controlnet-Union-2.1.safetensors | ControlNet weights for Z-Image. The model supports multiple control conditions such as Canny, Depth, Pose, MLSD, Scribble, HED, and Gray. Control is injected into 15 layer blocks and 2 refiner layer blocks. |
| Z-Image-Fun-Controlnet-Union-2.1-lite.safetensors | Compared with the full model, control is injected into fewer layers, so the control signal is weaker. This makes it suitable for larger control_context_scale values, produces more natural-looking results, and fits lower-spec machines. |
| Z-Image-Fun-Controlnet-Tile-2.1.safetensors | A Tile model trained on high-definition datasets (up to 2048×2048) for super-resolution. |
| Z-Image-Fun-Controlnet-Tile-2.1-lite.safetensors | Control latents are applied to fewer layers, resulting in weaker control. This allows larger control_context_scale values with more natural results, and it is also better suited to lower-spec machines. |
## Model Features
- Control is injected into 15 layer blocks and 2 refiner layer blocks (the lite models use 3 layer blocks and 2 refiner blocks). The model supports multiple control conditions, including Canny, Depth, Pose, MLSD, Scribble, HED, and Gray, and can be used like a standard ControlNet; a usage sketch follows after this list.
- Inpainting mode is also supported. When inpainting, use a larger control_context_scale, as this gives better image continuity.
- You can adjust control_context_scale for stronger control and better detail preservation; the optimal range is 0.65 to 1.00. For better stability, we highly recommend using a detailed prompt.
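Below is a minimal sketch of what a control run might look like. The pipeline class, import path, and argument names (`ZImageFunControlPipeline`, `controlnet_path`, `control_image`) are assumptions made for illustration only; the authoritative usage is `examples/z_image_fun/predict_t2i_control_2.1.py` in the VideoX-Fun repository.

```python
# Hypothetical sketch -- class, import path, and argument names are assumptions;
# see examples/z_image_fun/predict_t2i_control_2.1.py for the real usage.
from PIL import Image
from videox_fun.pipeline import ZImageFunControlPipeline  # hypothetical import

pipe = ZImageFunControlPipeline.from_pretrained(
    "models/Diffusion_Transformer/Z-Image",
    controlnet_path="models/Personalized_Model/Z-Image-Fun-Controlnet-Union-2.1.safetensors",
).to("cuda")

# Any supported condition image works here: Canny, Depth, Pose, MLSD, Scribble, HED, or Gray.
control_image = Image.open("asset/canny.jpg")

image = pipe(
    prompt="a detailed, specific prompt improves stability",
    control_image=control_image,
    control_context_scale=0.85,  # recommended range: 0.65 to 1.00
).images[0]
image.save("results/canny.png")
```

A lower control_context_scale trades control strength for more natural results; the lite weights shift that trade-off further toward naturalness.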
## Results
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Inpaint</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/inpaint.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /></td>
<td><img src="results/inpaint.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Pose + Inpaint</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/inpaint.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /><img src="asset/pose.jpg" width="100%" /></td>
<td><img src="results/pose_inpaint.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Pose</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/pose2.jpg" width="100%" /></td>
<td><img src="results/pose2.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Pose</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/pose.jpg" width="100%" /></td>
<td><img src="results/pose.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Pose</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/pose3.jpg" width="100%" /></td>
<td><img src="results/pose3.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Canny</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/canny.jpg" width="100%" /></td>
<td><img src="results/canny.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>HED</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/hed.jpg" width="100%" /></td>
<td><img src="results/hed.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Depth</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/depth.jpg" width="100%" /></td>
<td><img src="results/depth.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Gray</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/gray.jpg" width="100%" /></td>
<td><img src="results/gray.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Low Resolution</td>
<td>High Resolution</td>
</tr>
<tr>
<td><img src="asset/low_res.jpg" width="100%" /></td>
<td><img src="results/high_res.png" width="100%" /></td>
</tr>
</table>
## Inference
See the VideoX-Fun repository for full details.
First, clone the repository and create the required model directories:
```sh
# Clone the code
git clone https://github.com/aigc-apps/VideoX-Fun.git
# Enter VideoX-Fun's directory
cd VideoX-Fun
# Create model directories
mkdir -p models/Diffusion_Transformer
mkdir -p models/Personalized_Model
```
Then download the weights into `models/Diffusion_Transformer` and `models/Personalized_Model` so the tree looks like this:
```
📦 models/
├── 📂 Diffusion_Transformer/
│   └── 📂 Z-Image/
└── 📂 Personalized_Model/
    ├── 📦 Z-Image-Fun-Controlnet-Union-2.1.safetensors
    └── 📦 Z-Image-Fun-Controlnet-Union-2.1-lite.safetensors
```
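If you prefer a scripted download, the sketch below uses `huggingface_hub`. The `repo_id` is an assumption; substitute the actual Hub repository that hosts these weights.

```python
# Scripted download sketch -- the repo_id below is an assumption;
# replace it with the actual Hugging Face repo hosting the weights.
from huggingface_hub import hf_hub_download

for filename in (
    "Z-Image-Fun-Controlnet-Union-2.1.safetensors",
    "Z-Image-Fun-Controlnet-Union-2.1-lite.safetensors",
):
    hf_hub_download(
        repo_id="alibaba-pai/Z-Image-Fun-Controlnet-Union-2.1",  # assumed repo id
        filename=filename,
        local_dir="models/Personalized_Model",
    )
```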
Then run `examples/z_image_fun/predict_t2i_control_2.1.py` for controlled text-to-image generation or `examples/z_image_fun/predict_i2i_inpaint_2.1.py` for inpainting.
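For inpaint mode, a comparable sketch (argument names again assumed; `examples/z_image_fun/predict_i2i_inpaint_2.1.py` is authoritative) passes the source image and mask and, per the recommendation above, a larger control_context_scale:

```python
# Hypothetical inpaint sketch -- argument names are assumptions;
# see examples/z_image_fun/predict_i2i_inpaint_2.1.py for the real usage.
from PIL import Image

image = pipe(  # `pipe` as constructed in the control sketch above
    prompt="describe the whole scene, not only the masked region",
    image=Image.open("asset/inpaint.jpg"),
    mask_image=Image.open("asset/mask.jpg"),
    control_context_scale=1.0,  # larger values improve continuity when inpainting
).images[0]
image.save("results/inpaint.png")
```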