---
license: cc-by-nc-sa-4.0
---
# MagicTryOn: Harnessing Diffusion Transformer for Garment-Preserving Video Virtual Try-on


<a href="https://arxiv.org/abs/2505.21325v2"><img src='https://img.shields.io/badge/arXiv-2505.21325-red?style=flat&logo=arXiv&logoColor=red' alt='arxiv'></a>&nbsp;
<a href="https://huggingface.co/LuckyLiGY/MagicTryOn"><img src='https://img.shields.io/badge/Hugging Face-ckpts-orange?style=flat&logo=HuggingFace&logoColor=orange' alt='huggingface'></a>&nbsp;
<a href="https://vivocameraresearch.github.io/magictryon/"><img src='https://img.shields.io/badge/Project-Page-Green' alt='GitHub'></a>&nbsp;
<a href="https://github.com/vivoCameraResearch/Magic-TryOn/"><img src='https://img.shields.io/badge/GitHub-Repo-blue?style=flat&logo=GitHub' alt='GitHub'></a>&nbsp;
<a href="https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en"><img src='https://img.shields.io/badge/License-CC BY--NC--SA--4.0-lightgreen?style=flat&logo=creativecommons' alt='License'></a>&nbsp;


**MagicTryOn** is a video virtual try-on framework based on a large-scale video diffusion Transformer. ***1) It adopts the Wan2.1 diffusion Transformer as its backbone*** and ***2) employs full self-attention to model spatiotemporal consistency***. ***3) It introduces a coarse-to-fine garment preservation strategy, together with a mask-aware loss that enhances garment-region fidelity***.
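
The mask-aware loss is only named here, not specified. As a rough illustration of the general idea, and explicitly not the paper's formulation, the sketch below up-weights squared errors that fall inside a garment mask. The names `mask_weighted_mse` and `garment_weight` are hypothetical; in a latent diffusion model the mask would first be resized to the latent resolution and applied to the noise-prediction error.

```python
import numpy as np


def mask_weighted_mse(pred, target, garment_mask, garment_weight=2.0):
    """Generic mask-aware MSE (illustrative sketch, not MagicTryOn's exact loss).

    pred, target: arrays of the same shape, e.g. (C, H, W).
    garment_mask: values in [0, 1], 1 inside the garment region; must broadcast to pred.
    garment_weight: hypothetical weight for garment pixels (1.0 recovers plain MSE).
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mask = np.asarray(garment_mask, dtype=np.float64)

    # Weight is 1 outside the garment and garment_weight inside it.
    weights = np.broadcast_to(1.0 + (garment_weight - 1.0) * mask, pred.shape)
    squared_error = (pred - target) ** 2

    # Normalize by the total weight so the loss scale stays comparable to plain MSE.
    return float((weights * squared_error).sum() / weights.sum())
```

With `garment_weight=1.0` this reduces to ordinary mean squared error, which makes the effect of the extra garment weighting easy to compare.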

## 📣 News 
- **`2025/06/09`**: 🎉 We are excited to announce that the ***code*** of [**MagicTryOn**](https://github.com/vivoCameraResearch/Magic-TryOn/) has been released! Check it out! ***The weights have been released too!*** You can download them from 🤗[**HuggingFace**](https://huggingface.co/LuckyLiGY/MagicTryOn).
- **`2025/05/27`**: Our [**Paper on ArXiv**](https://arxiv.org/abs/2505.21325v2) is available 🥳!

## ✅ To-Do List for MagicTryOn Release
- ✅ Release the source code
- ✅ Release the inference demo and pretrained weights
- ✅ Release the customized try-on utilities
- [  ] Release the testing scripts
- [  ] Release the training scripts
- [  ] Release the second version of the pretrained model weights 
- [  ] Update Gradio App. 

## 😍 Installation

Create a conda environment and install the requirements:
```shell
# python==3.12.9 cuda==12.3 torch==2.2
conda create -n magictryon python==3.12.9
conda activate magictryon
pip install -r requirements.txt
# or
conda env create -f environment.yaml
```
If you encounter an error while installing Flash Attention, [**manually download**](https://github.com/Dao-AILab/flash-attention/releases) the prebuilt wheel that matches your Python, CUDA, and PyTorch versions and install it with `pip`, e.g. `pip install flash_attn-2.7.3+cu12torch2.2cxx11abiFALSE-cp312-cp312-linux_x86_64.whl`.
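
The wheel filename encodes the versions it was built for. The helper below assembles a name following the pattern of the example above, so you can look for the matching asset on the releases page. It is a sketch: `flash_attn_wheel_name` is a hypothetical helper, the pattern is inferred from that single example, and not every version combination is actually published.

```python
import sys


def flash_attn_wheel_name(flash_version, cuda_major, torch_version, cxx11_abi=False,
                          python_version=None, platform_tag="linux_x86_64"):
    """Build a flash-attn wheel filename following the README example's pattern."""
    # Default to the running interpreter, e.g. (3, 12) -> "cp312".
    major, minor = python_version or sys.version_info[:2]
    py_tag = f"cp{major}{minor}"
    # Keep only major.minor of the torch version, e.g. "2.2.0+cu121" -> "2.2".
    torch_mm = ".".join(str(torch_version).split(".")[:2])
    abi = "TRUE" if cxx11_abi else "FALSE"
    return (f"flash_attn-{flash_version}+cu{cuda_major}torch{torch_mm}"
            f"cxx11abi{abi}-{py_tag}-{py_tag}-{platform_tag}.whl")


# The README's setup (Python 3.12, CUDA 12, torch 2.2):
print(flash_attn_wheel_name("2.7.3", 12, "2.2.0", python_version=(3, 12)))
```

If PyTorch is already installed, `torch.__version__`, `torch.version.cuda`, and `torch._C._GLIBCXX_USE_CXX11_ABI` report the torch version, CUDA version, and C++11 ABI flag to pass in.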

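Use the following commands to download the weights:
```shell
cd Magic-TryOn
# HF_ENDPOINT routes the download through the hf-mirror.com mirror; drop it to download from huggingface.co directly
HF_ENDPOINT=https://hf-mirror.com huggingface-cli download LuckyLiGY/MagicTryOn --local-dir ./weights/MagicTryOn_14B_V1
```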
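
If you prefer Python over the CLI, `huggingface_hub.snapshot_download` performs the same download (the `huggingface_hub` package is what provides `huggingface-cli`). A minimal sketch; the helper names are hypothetical, and passing `endpoint` plays the role of the `HF_ENDPOINT` prefix above.

```python
MIRROR_ENDPOINT = "https://hf-mirror.com"


def weights_download_kwargs(local_dir="./weights/MagicTryOn_14B_V1", use_mirror=False):
    """Arguments for snapshot_download matching the CLI command above."""
    kwargs = {"repo_id": "LuckyLiGY/MagicTryOn", "local_dir": local_dir}
    if use_mirror:
        # Same effect as prefixing the CLI call with HF_ENDPOINT=https://hf-mirror.com
        kwargs["endpoint"] = MIRROR_ENDPOINT
    return kwargs


def download_weights(**options):
    """Download the checkpoint; requires huggingface_hub and network access."""
    from huggingface_hub import snapshot_download  # imported lazily on purpose

    return snapshot_download(**weights_download_kwargs(**options))


# Usage, from the Magic-TryOn directory:
# download_weights(use_mirror=True)
```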

## 😉 Demo Inference
### 1. Image TryOn
Run the following commands to perform the image try-on demo. To modify inference parameters, edit `predict_image_tryon_up.py` or `predict_image_tryon_low.py`.
```shell
CUDA_VISIBLE_DEVICES=0 python predict_image_tryon_up.py

CUDA_VISIBLE_DEVICES=1 python predict_image_tryon_low.py
```

### 2. Video TryOn
Run the following commands to perform the video try-on demo. To modify inference parameters, edit `predict_video_tryon_up.py` or `predict_video_tryon_low.py`.
```shell
CUDA_VISIBLE_DEVICES=0 python predict_video_tryon_up.py

CUDA_VISIBLE_DEVICES=1 python predict_video_tryon_low.py
```
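
Each demo command pins its script to a single GPU through `CUDA_VISIBLE_DEVICES`; inside that process, the selected GPU is visible as `cuda:0`. Because the two commands use different GPUs, they can run at the same time. A small launcher sketch, assuming it is run from the repository root and that each GPU has enough memory for one script (`gpu_env` and `launch_parallel` are hypothetical helpers):

```python
import os
import subprocess
import sys


def gpu_env(gpu_index):
    """Copy the current environment and expose only one GPU to the child process."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch_parallel(jobs):
    """Start every (script, gpu_index) job concurrently, then wait for all of them.

    Returns the exit codes in the same order as `jobs`.
    """
    procs = [subprocess.Popen([sys.executable, script], env=gpu_env(gpu))
             for script, gpu in jobs]
    return [proc.wait() for proc in procs]


# Usage:
# launch_parallel([("predict_video_tryon_up.py", 0), ("predict_video_tryon_low.py", 1)])
```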

### 3. Customize TryOn
Before performing customized try-on, complete the following five steps to prepare the required inputs:

1. **Cloth Caption**  
   Generate a descriptive caption for the garment, which may be used for conditioning or multimodal control. We use [**Qwen/Qwen2.5-VL-7B-Instruct**](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) to obtain the caption. Before running, set the garment folder path in the script.
   ```shell
   python inference/customize/get_garment_caption.py
   ```

2. **Cloth Line Map**  
   Extract the structural lines or sketch of the garment using [**AniLines-Anime-Lineart-Extractor**](https://github.com/zhenglinpan/AniLines-Anime-Lineart-Extractor). Download the pre-trained models from this [**link**](https://drive.google.com/file/d/1oazs4_X1Hppj-k9uqPD0HXWHEQLb9tNR/view?usp=sharing) and put them in the `inference/customize/AniLines/weights` folder.
   ```shell
   cd inference/customize/AniLines
   python infer.py --dir_in datasets/garment/vivo/vivo_garment --dir_out datasets/garment/vivo/vivo_garment_anilines --mode detail --binarize -1 --fp16 True --device cuda:1
   ```

3. **Mask**  
   Generate the agnostic mask of the garment, which is essential for region control during try-on. Please [**download**](https://drive.google.com/file/d/1E2JC_650g69AYrN2ZCwc8oz8qYRo5t5s/view?usp=sharing) the required checkpoint for obtaining the agnostic mask. The checkpoint needs to be placed in the `inference/customize/gen_mask/ckpt` folder.

   (1) Rename your video to `video.mp4` and arrange the folders according to the following directory structure:
   ```
   ├── datasets
   │   ├── person
   │   │   ├── customize
   │   │   │   ├── video
   │   │   │   │   ├── 00001
   │   │   │   │   │   ├── video.mp4
   │   │   │   │   ├── 00002 ...
   │   │   │   ├── image
   │   │   │   │   ├── 00001
   │   │   │   │   │   ├── images
   │   │   │   │   │   │   ├── 0000.png
   │   │   │   │   ├── 00002 ...
   ```

   (2) Use `video2image.py` to convert the video into image frames and save them to `datasets/person/customize/video/00001/images`.

   (3) Run the following commands to obtain the agnostic mask:
   ```shell
   cd inference/customize/gen_mask
   python app_mask.py
   # To extract the mask for lower_body or dresses, modify line 65 of app_mask.py:
   # for lower_body:
   # mask, _ = get_mask_location('dc', "lower_body", model_parse, keypoints)
   # for dresses:
   # mask, _ = get_mask_location('dc', "dresses", model_parse, keypoints)
   ```

   After completing the above steps, you will find the agnostic masks for all video frames in the `datasets/person/customize/video/00001/masks` folder.

4. **Agnostic Representation**  
   Construct an agnostic representation of the person by removing garment-specific features. Run `get_masked_person.py` to obtain it, setting the `--image_folder` and `--mask_folder` parameters to your frame and mask folders. The resulting frames are stored in `datasets/person/customize/video/00001/agnostic`.

5. **DensePose**  
   Use DensePose to obtain UV-mapped dense human body coordinates for better spatial alignment.

   (1) Install [**detectron2**](https://github.com/facebookresearch/detectron2).

   (2) Run the following commands:
   ```shell
   cd inference/customize/detectron2/projects/DensePose
   bash run.sh
   ```

   (3) The generated results will be stored in the `datasets/person/customize/video/00001/image-densepose` folder.
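
Step 4 hides the garment region of each frame using its agnostic mask. The idea can be sketched as below. This is an illustration, not the contents of `get_masked_person.py`: the `make_agnostic` name, the mid-gray fill value (128), and the convention that nonzero mask pixels mark the hidden region are all assumptions.

```python
import numpy as np


def make_agnostic(person, mask, fill_value=128):
    """Return a copy of a person frame with the masked (garment) region filled in.

    person: uint8 array of shape (H, W, 3).
    mask: array of shape (H, W); nonzero pixels mark the region to hide (assumption).
    fill_value: hypothetical fill color, mid-gray by default.
    """
    person = np.asarray(person)
    region = np.asarray(mask) > 0
    agnostic = person.copy()           # leave the input frame untouched
    agnostic[region] = fill_value      # boolean (H, W) index selects all 3 channels
    return agnostic
```

With Pillow, `np.array(Image.open(path).convert("RGB"))` loads a frame and `Image.fromarray(result).save(out_path)` writes the result.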

After completing the above steps, run `image2video.py` to generate the required customized videos: `mask.mp4`, `agnostic.mp4`, and `densepose.mp4`. Then run the following command:
```shell
CUDA_VISIBLE_DEVICES=0 python predict_video_tryon_customize.py
```
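
Since the customized inputs come from several separate steps, it helps to confirm that every per-frame folder of a sample holds the same number of frames before running `image2video.py`. A stdlib-only sketch: the folder names come from the steps above, while the helper names, the accepted image extensions, and the assumption of one image file per frame are hypothetical.

```python
from pathlib import Path

FRAME_DIRS = ("images", "masks", "agnostic", "image-densepose")
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def frame_counts(sample_dir):
    """Count image files in each per-frame folder of one sample (e.g. video/00001)."""
    root = Path(sample_dir)
    counts = {}
    for name in FRAME_DIRS:
        folder = root / name
        if folder.is_dir():
            counts[name] = sum(1 for p in folder.iterdir()
                               if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
        else:
            counts[name] = None  # folder has not been generated yet
    return counts


def check_sample(sample_dir):
    """Return a list of problems; an empty list means the folders look consistent."""
    counts = frame_counts(sample_dir)
    problems = [f"missing folder: {name}" for name, n in counts.items() if n is None]
    if len({n for n in counts.values() if n is not None}) > 1:
        problems.append(f"frame counts differ: {counts}")
    return problems


# Usage:
# print(check_sample("datasets/person/customize/video/00001"))
```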

## 😘 Acknowledgement
Our code is built on [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun/tree/main). We adopt [Wan2.1-I2V-14B](https://github.com/Wan-Video/Wan2.1) as the base model. We use [SCHP](https://github.com/GoGoDuck912/Self-Correction-Human-Parsing/tree/master), [openpose](https://github.com/CMU-Perceptual-Computing-Lab/openpose), and [DensePose](https://github.com/facebookresearch/DensePose) to generate masks, and [detectron2](https://github.com/facebookresearch/detectron2) to generate DensePose maps. Thanks to all the contributors!

## 😊 License
All the materials, including code, checkpoints, and demo, are made available under the [Creative Commons BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. You are free to copy, redistribute, remix, transform, and build upon the project for non-commercial purposes, as long as you give appropriate credit and distribute your contributions under the same license.

## 🤩 Citation

```bibtex
@misc{li2025magictryon,
      title={MagicTryOn: Harnessing Diffusion Transformer for Garment-Preserving Video Virtual Try-on}, 
      author={Guangyuan Li and Siming Zheng and Hao Zhang and Jinwei Chen and Junsheng Luan and Binkai Ou and Lei Zhao and Bo Li and Peng-Tao Jiang},
      year={2025},
      eprint={2505.21325},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.21325}, 
}
```