nielsr (HF Staff) committed · Commit 2fe8101 · verified · 1 Parent(s): ee60a3c

Add `library_name` metadata tag


This PR enhances the model card by adding the `library_name: diffusers` metadata tag. This will enable the automatic "How to use" widget on the Hugging Face Hub, providing users with a ready-to-run code snippet for the model, as it is compatible with the `diffusers` library.
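
With the tag in place, the widget should surface a `diffusers` snippet roughly along these lines. This is a hedged sketch only: it assumes the released LoRA loads on top of the Flux-Kontext base pipeline, and the base-model repo id and prompt are illustrative placeholders (the LoRA file name comes from the card's checkpoint list):

```
# Sketch only: assumes EVTAR's LoRA applies on top of FluxKontextPipeline;
# the base repo id and the prompt below are illustrative assumptions.
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("qihoo360/EVTAR", weight_name="1024_768_pytorch_lora_weights.safetensors")

person = load_image("person.jpg")  # image of the target person
result = pipe(image=person, prompt="Dress the person in the target garment.").images[0]
result.save("tryon.png")
```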

Files changed (1): README.md (+13 -204)
README.md CHANGED
@@ -1,37 +1,22 @@
 ---
-license: apache-2.0
 language:
 - en
+license: apache-2.0
 pipeline_tag: image-to-image
+library_name: diffusers
 ---

# End2End Virtual Tryon with Visual Reference

[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-EVTAR-ff9900?style=flat)](https://huggingface.co/qihoo360/EVTAR) [![arXiv](https://img.shields.io/badge/arXiv-2511.00956-B31B1B?style=flat)](https://arxiv.org/abs/2511.00956)

![examples](examples.png)

We propose **EVTAR**, an End-to-End Virtual Try-on model with Additional Visual Reference that directly fits the target garment onto the person image while incorporating reference images to enhance the model's ability to preserve and accurately depict clothing details.

## 💡 Github

[EVTAR](https://github.com/360CVGroup/EVTAR)

## 💡 Pretrained Models

We provide pretrained backbone networks and LoRA weights for testing and deployment. Please download the `.safetensors` files from [here] and place them in the `checkpoints` directory.

- `1024_768_pytorch_lora_weights.safetensors`: 1024x768-resolution, high-quality virtual fitting model. ✅ **Available**

## 💡 Update

- [x] [2025.10.11] Release the virtual try-on inference code and LoRA weights.
- [x] [2025.10.13] Release the technical report on arXiv.

## 💪 Highlight Feature

- **An End-To-End Virtual Try-on Model:** EVTAR can function either as an inpainting model that places the target clothing into masked areas, or as a direct garment-transfer model that fits the clothing onto the unmasked human body.

- **Using Reference Images To Enhance Try-on Performance:** To emulate how shoppers attend to the overall wearing effect rather than the garment itself, our model accepts images of a model wearing the target clothing as input, thereby better preserving its material texture and design details.

- **Improved Performance:** Our model achieves state-of-the-art results on public benchmarks and generalizes well to in-the-wild inputs.

## 🧩 Environment Setup

```
conda create -n EVTAR python=3.12 -y
conda activate EVTAR
pip install -r requirements.txt
```
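
A quick way to confirm the environment resolved correctly (assuming `requirements.txt` pins `torch` and `diffusers`; adjust the imports to whatever it actually installs):

```
python -c "import torch, diffusers; print(torch.__version__, diffusers.__version__, torch.cuda.is_available())"
```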

## 📂 Preparation of Dataset and Pretrained Models

### Dataset

Currently, we provide a small test set with additional reference images (a different person wearing the target cloth) for trying our model. We plan to release the reference data generation code, along with our proposed full dataset containing model reference images, in the future.

Nevertheless, inference can still be performed in a reference-free setting on public benchmarks, including [VITON-HD](https://github.com/shadow2496/VITON-HD) and [DressCode](https://github.com/aimagelab/dress-code).

### Reference Data Preparation

One key feature of our method is the use of _reference data_, where an image of a different person wearing the target garment is provided to help the model imagine how the target person would look in that garment. In most online shopping applications, such additional reference images are commonly used by customers to better visualize the clothing. However, publicly available datasets such as VITON-HD and DressCode do not include such reference data, so we generate it ourselves.

Please prepare the pretrained weights of the Flux-Kontext model and the Qwen2.5-VL-32B model. You can then generate the additional reference images with the following command:

```
accelerate launch --num_processes 8 --main_process_port 29500 generate_reference.py \
  --instance_data_dir "path_to_your_datasets" \
  ...
  --desc_path "desc.json"
```
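
Note that `--num_processes 8` assumes eight GPUs; the standard `accelerate launch --num_processes 1 ...` form runs the same script on a single GPU.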

### Pretrained Models

We provide pretrained backbone networks and LoRA weights for testing and deployment. Please download the `.safetensors` files from [here] and place them in the `checkpoints` directory.

## ⏳ Inference Pipeline

Here we provide the inference code for our EVTAR.

```
accelerate launch --num_processes 8 --main_process_port 29500 inference.py \
  ...
  --use_person
```

- `pretrained_model_name_or_path`: Path to the downloaded Flux-Kontext model weights.
- `instance_data_dir`: Path to your dataset. For inference on VITON-HD or DressCode, ensure that the words "viton" or "DressCode" appear in the path.
- `output_dir`: Path to the downloaded or trained LoRA weights.
- `cond_scale`: Resize scale of the reference image during training. Defaults to `1.0` at $512\times384$ and `2.0` at $1024\times768$ resolution.
- `use_reference`: Whether to use an additional reference image as input.
- `use_different`: **Only applicable for VITON/DressCode inference.** Whether to use different cloth-person pairs.
- `use_person`: **Only applicable for VITON/DressCode inference.** Whether to use the unmasked person image instead of the agnostic masked image as input for the virtual try-on task.

A filled-in example combining these flags is sketched below.
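
This sketch assumes a VITON-HD run at 1024x768 with reference images enabled; the paths are placeholders, only the flags documented above are shown, and the script may require additional arguments elided from this diff:

```
accelerate launch --num_processes 8 --main_process_port 29500 inference.py \
  --pretrained_model_name_or_path "path_to_flux_kontext_weights" \
  --instance_data_dir "path_to_viton_dataset" \
  --output_dir "checkpoints" \
  --cond_scale 2.0 \
  --use_reference \
  --use_person
```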

## 📊 Evaluation

We quantitatively evaluate the quality of virtual try-on results using FID, KID, SSIM, and LPIPS. Here, we provide the evaluation code for the VITON-HD and DressCode datasets.

```
# Evaluation on VITON-HD dataset
CUDA_VISIBLE_DEVICES=0 python eval_dresscode.py \
  ...
  --paired
```

```
# Evaluation on DressCode dataset
CUDA_VISIBLE_DEVICES=0 python eval.py \
  ...
  --pred_folder_base [[path_to_your_generated_image_folder]]
```

- `paired`: If you perform unpaired generation, where different garments are fitted onto the target person, you should enable this flag during evaluation.
 
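As a sanity check outside the provided scripts, the same metrics can be computed generically with `torchmetrics`. This is only a sketch of the metrics themselves, not the repo's `eval.py` or `eval_dresscode.py`; the tensors below are random stand-ins for loaded image batches:

```
# Generic FID / KID / SSIM / LPIPS sketch with torchmetrics (not the repo's eval code).
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance
from torchmetrics.image import StructuralSimilarityIndexMeasure
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity

# Stand-in batches: uint8 tensors of shape (N, 3, H, W) with values in [0, 255].
pred_u8 = torch.randint(0, 256, (4, 3, 256, 192), dtype=torch.uint8)
gt_u8 = torch.randint(0, 256, (4, 3, 256, 192), dtype=torch.uint8)

# FID and KID compare feature distributions, so they also apply to unpaired generation.
fid = FrechetInceptionDistance(feature=2048)
fid.update(gt_u8, real=True)
fid.update(pred_u8, real=False)
print("FID:", fid.compute().item())

kid = KernelInceptionDistance(subset_size=2)  # subset_size must not exceed N
kid.update(gt_u8, real=True)
kid.update(pred_u8, real=False)
kid_mean, kid_std = kid.compute()
print("KID:", kid_mean.item())

# SSIM and LPIPS compare each prediction against its paired ground truth.
pred = pred_u8.float() / 255.0
gt = gt_u8.float() / 255.0
print("SSIM:", StructuralSimilarityIndexMeasure(data_range=1.0)(pred, gt).item())
lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex")  # expects inputs in [-1, 1]
print("LPIPS:", lpips(pred * 2 - 1, gt * 2 - 1).item())
```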

Evaluation results on the VITON-HD dataset:

![examples](VITON_results.png)

Evaluation results on the DressCode dataset:

![examples](DressCode_results.png)

## 🌸 Acknowledgement

This code is mainly built upon the [Diffusers](https://github.com/huggingface/diffusers/tree/main), [Flux](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/flux), and [CatVTON](https://github.com/Zheng-Chong/CatVTON/) repositories. Thanks so much for their solid work!

## 💖 Citation

If you find this repository useful, please consider citing our paper:

```
@misc{li2025evtarendtoendtryadditional,