---
license: apache-2.0
language:
- en
pipeline_tag: image-to-image
---
# Person-to-Person Try-On with Additional Unpaired Visual Reference
[🤗 Model](https://huggingface.co/qihoo360/RefVTON) [📄 Paper](https://arxiv.org/abs/2511.00956)

We propose **REFVTON**, an end-to-end virtual try-on model with additional visual reference. It directly fits the target garment onto the person image while incorporating reference images, enhancing the model's ability to preserve and accurately depict clothing details.
## 💡 GitHub
[REFVTON](https://github.com/360CVGroup/REFVTON)
## 💡 Pretrained Models
We provide pretrained backbone networks and LoRA weights for testing and deployment. Please download the `.safetensors` files from [here] and place them in the `checkpoints` directory.
- `512_384_pytorch_lora_weights.safetensors`: 512×384 resolution high-quality virtual fitting model ✅ **Available**
- `1024_768_pytorch_lora_weights.safetensors`: 1024×768 resolution high-quality virtual fitting model ✅ **Available**
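Once both files are downloaded, the expected layout is simply the following (a sketch; the Flux-Kontext backbone itself can live anywhere, since its path is passed explicitly at inference time):
```
checkpoints/
├── 512_384_pytorch_lora_weights.safetensors
└── 1024_768_pytorch_lora_weights.safetensors
```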
## 💡 Update
- [x] [2025.10.11] Release the virtual try-on inference code and LoRA weights.
- [x] [2025.10.13] Release the technical report on arXiv.
## 💪 Highlight Feature
- **An End-to-End Virtual Try-on Model:** REFVTON can function either as an inpainting model that places the target clothing into masked areas, or as a direct garment transfer onto the human body.
- **Using a Reference Image To Enhance Try-on Performance:** To emulate how online shoppers attend to the overall wearing effect rather than the garment alone, our model can take images of a model wearing the target clothing as input, thereby better preserving its material texture and design details.
- **Improved Performance:** Our model achieves state-of-the-art performance on public benchmarks and demonstrates strong generalization to in-the-wild inputs.
## 🧩 Environment Setup
```
conda create -n REFVTON python=3.12 -y
conda activate REFVTON
pip install -r requirements.txt
```
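As a quick sanity check of the environment (a minimal sketch, assuming `requirements.txt` pulls in PyTorch and Diffusers, which the pipeline is built on):
```
python -c "import torch, diffusers; print(torch.__version__, diffusers.__version__, torch.cuda.is_available())"
```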
## 📂 Preparation of Dataset and Pretrained Models
### Dataset
Currently, we provide a small test set with additional reference images (a different person wearing the target clothes) for trying our model. We plan to release the reference data generation code, along with our proposed full dataset containing model reference images, in the future.
Nevertheless, inference can still be performed in a reference-free setting on public benchmarks, including [VITON-HD](https://github.com/shadow2496/VITON-HD) and [DressCode](https://github.com/aimagelab/dress-code).
### Reference Data Preparation
One key feature of our method is the use of _reference data_, where an image of a different person wearing the target garment is provided to help the model imagine how the target person would look in that garment. In most online shopping applications, such additional reference images are commonly used by customers to better visualize the clothing. However, publicly available datasets such as VITON-HD and DressCode do not include such reference data, so we generate them ourselves.
Please prepare the pretrained weights of the Flux-Kontext and Qwen2.5-VL-32B models. You can then generate the additional reference images with the following command:
```
accelerate launch --num_processes 8 --main_process_port 29500 generate_reference.py \
--instance_data_dir "path_to_your_datasets" \
--inference_batch_size 1 \
--split "train" \
--desc_path "desc.json"
```
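If the two backbones are not available locally, they can be fetched from the Hub first (a sketch; the repo IDs `black-forest-labs/FLUX.1-Kontext-dev` and `Qwen/Qwen2.5-VL-32B-Instruct` and the target directories are our assumptions, not prescribed by the repository):
```
# Hypothetical repo IDs and target directories
huggingface-cli download black-forest-labs/FLUX.1-Kontext-dev --local-dir checkpoints/FLUX.1-Kontext-dev
huggingface-cli download Qwen/Qwen2.5-VL-32B-Instruct --local-dir checkpoints/Qwen2.5-VL-32B-Instruct
```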
### Pretrained Models
As described under *Pretrained Models* above, download the `.safetensors` LoRA weights from [here] and place them in the `checkpoints` directory.
## ⏳ Inference Pipeline
Here we provide the inference command for REFVTON.
```
accelerate launch --num_processes 8 --main_process_port 29500 inference.py \
--pretrained_model_name_or_path="[path_to_your_Flux_model]" \
--instance_data_dir="[your_data_directory]" \
--output_dir="[Path_to_LoRA_weights]" \
--mixed_precision="bf16" \
--split="test" \
--height=1024 \
--width=768 \
--inference_batch_size=1 \
--cond_scale=2 \
--seed="0" \
--use_reference \
--use_different \
--use_person
```
- `pretrained_model_name_or_path`: Path to the downloaded Flux-Kontext model weights.
- `instance_data_dir`: Path to your dataset. For inference on VITON-HD or DressCode, ensure that the words "viton" or "DressCode" appear in the path.
- `output_dir`: Path to the downloaded or trained LoRA weights.
- `cond_scale`: Resize scale of the reference image, matching the value used during training: `1.0` for $512\times384$ and `2.0` for $1024\times768$ resolution (see the example after this list).
- `use_reference`: Whether to use an additional reference image as input.
- `use_different`: **Only applicable for VITON/DressCode inference.** Whether to use different cloth-person pairs.
- `use_person`: **Only applicable for VITON/DressCode inference.** Whether to use the unmasked person image instead of the agnostic masked image as input for the virtual try-on task.
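For instance, a single-GPU run at $512\times384$ with the matching LoRA weights and their default `cond_scale` might look like this (a sketch; all paths are hypothetical):
```
# "viton" in the data path selects the VITON-HD loading logic (see notes above);
# --output_dir points at the folder holding 512_384_pytorch_lora_weights.safetensors
accelerate launch --num_processes 1 --main_process_port 29500 inference.py \
--pretrained_model_name_or_path="checkpoints/FLUX.1-Kontext-dev" \
--instance_data_dir="data/viton_hd" \
--output_dir="checkpoints" \
--mixed_precision="bf16" \
--split="test" \
--height=512 \
--width=384 \
--inference_batch_size=1 \
--cond_scale=1 \
--seed="0" \
--use_reference
```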
## 📊 Evaluation
We quantitatively evaluate the quality of virtual try-on results using FID, KID, SSIM, and LPIPS. Below is the evaluation code for the VITON-HD and DressCode datasets.
```
# Evaluation on VITON-HD dataset
CUDA_VISIBLE_DEVICES=0 python eval.py \
--gt_folder_base [path_to_your_ground_truth_image_folder] \
--pred_folder_base [path_to_your_generated_image_folder] \
--paired
```
```
# Evaluation on DressCode dataset
CUDA_VISIBLE_DEVICES=0 python eval_dresscode.py \
--gt_folder_base [path_to_your_ground_truth_image_folder] \
--pred_folder_base [path_to_your_generated_image_folder]
```
- `paired`: Enable this flag for paired evaluation, where the original garment is fitted back onto the person and the full-reference metrics (SSIM, LPIPS) can be computed against the ground truth; omit it for unpaired generation, where different garments are fitted onto the target person. Both settings are shown below.
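For example, scoring one paired and one unpaired VITON-HD run (a sketch; the folder names are hypothetical):
```
# Paired: the original garment is reconstructed, so SSIM/LPIPS apply
CUDA_VISIBLE_DEVICES=0 python eval.py \
--gt_folder_base data/viton_hd/test/image \
--pred_folder_base results/viton_hd/paired \
--paired

# Unpaired: different garments, so only distribution metrics (FID/KID) are meaningful
CUDA_VISIBLE_DEVICES=0 python eval.py \
--gt_folder_base data/viton_hd/test/image \
--pred_folder_base results/viton_hd/unpaired
```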
Evaluation results on the VITON-HD dataset:

Evaluation results on the DressCode dataset:

## 🌸 Acknowledgement
This code is mainly built upon [Diffusers](https://github.com/huggingface/diffusers/tree/main), [Flux](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/flux), and [CatVTON](https://github.com/Zheng-Chong/CatVTON/) repositories. Thanks so much for their solid work!
## 💖 Citation
If you find this repository useful, please consider citing our paper:
```
@misc{li2025evtarendtoendtryadditional,
      title={EVTAR: End-to-End Try on with Additional Unpaired Visual Reference},
      author={Liuzhuozheng Li and Yue Gong and Shanyuan Liu and Bo Cheng and Yuhang Ma and Liebucha Wu and Dengyang Jiang and Zanyi Wang and Dawei Leng and Yuhui Yin},
      year={2025},
      eprint={2511.00956},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.00956},
}
```