---
license: cc-by-nc-4.0
tags:
- virtual-try-on
- diffusers
- stable-diffusion
- image-to-image
datasets:
- VITON-HD
- DressCode
base_model:
- stable-diffusion-v1-5/stable-diffusion-inpainting
pipeline_tag: image-to-image
language:
- en
library_name: diffusers
---

# Re-CatVTON

Official model weights for **"Rethinking Garment Conditioning in Diffusion-based Virtual Try-On"**.

📄 **Paper**: [Re-CatVTON](https://arxiv.org/abs/2511.18775)  
💻 **Code**: [GitHub](https://github.com/Levinna/Re-CatVTON)

## Available Checkpoints

| Dataset | Subfolder | Resolution |
|---------|-----------|------------|
| VITON-HD | `VITON-HD/checkpoint-16000/unet` | 512×384 |
| DressCode | `DressCode/checkpoint-32000/unet` | 512×384 |
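
The table above can be mirrored in a small lookup so that the subfolder is selected by dataset name rather than typed by hand. This is a minimal sketch: `CHECKPOINT_SUBFOLDERS` and `checkpoint_subfolder` are hypothetical names introduced here for illustration and are not part of the Re-CatVTON code; only the subfolder strings come from the table.

```python
# Hypothetical helper: maps a dataset name to the UNet subfolder listed in the table.
CHECKPOINT_SUBFOLDERS = {
    "VITON-HD": "VITON-HD/checkpoint-16000/unet",
    "DressCode": "DressCode/checkpoint-32000/unet",
}


def checkpoint_subfolder(dataset: str) -> str:
    """Return the UNet subfolder for `dataset`, or raise ValueError if it is unknown."""
    try:
        return CHECKPOINT_SUBFOLDERS[dataset]
    except KeyError:
        known = ", ".join(sorted(CHECKPOINT_SUBFOLDERS))
        raise ValueError(f"Unknown dataset {dataset!r}; expected one of: {known}") from None
```

The returned string can be passed as the `subfolder` argument of `UNet2DConditionModel.from_pretrained`.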

## Usage
```python
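import torch
from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler

# The `model` package is not part of diffusers; it comes from the Re-CatVTON
# GitHub repository, so run this from a checkout of that code.
from model.pipeline import RECATVTONPipeline
from model.attn_processor import SkipAttnProcessor
from model.utils import init_adapter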

device = "cuda"
dtype = torch.bfloat16

# Load components
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").to(device, dtype)

# Load one UNet checkpoint; pick the subfolder for the dataset you need.
unet = UNet2DConditionModel.from_pretrained(
    "levinna/Re-CatVTON",
    subfolder="VITON-HD/checkpoint-16000/unet",  # or "DressCode/checkpoint-32000/unet"
).to(device, dtype)

scheduler = DDPMScheduler.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-inpainting",  # the Re-CatVTON scheduler config can be used instead
    subfolder="scheduler",
)

# Initialize attention processors (disable cross-attention)
init_adapter(unet, cross_attn_cls=SkipAttnProcessor)

# Create pipeline
pipeline = RECATVTONPipeline(vae=vae, unet=unet, scheduler=scheduler)
```
For more detailed instructions, see the official [GitHub repository](https://github.com/Levinna/Re-CatVTON).
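
Because the pipeline encodes images with a Stable Diffusion VAE, it can help to check that an input size is a multiple of the VAE's spatial downsampling factor before running inference. The sketch below performs that check and computes the latent size. It assumes a factor of 8 (the usual value for Stable Diffusion v1 VAEs) and assumes that 512×384 in the checkpoint table means height × width; `latent_size` is a hypothetical helper, not part of the Re-CatVTON code.

```python
VAE_SCALE_FACTOR = 8  # assumption: 8x spatial downsampling, as in Stable Diffusion v1 VAEs


def latent_size(height: int, width: int, factor: int = VAE_SCALE_FACTOR) -> tuple[int, int]:
    """Return (latent_height, latent_width); raise if the size is not a multiple of `factor`."""
    if height % factor or width % factor:
        raise ValueError(f"{height}x{width} is not a multiple of {factor}")
    return height // factor, width // factor


print(latent_size(512, 384))  # (64, 48)
```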

## License
This model is licensed under CC BY-NC 4.0 because it relies on datasets that are restricted to non-commercial use (VITON-HD, DressCode).
- **Model Weights**: [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)
- **Code**: [CC-BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)

## Citation
```bibtex
@article{na2025rethinking,
  title={Rethinking Garment Conditioning in Diffusion-based Virtual Try-On},
  author={Na, Kihyun and Choi, Jinyoung and Kim, Injung},
  journal={arXiv preprint arXiv:2511.18775},
  year={2025}
}
```