
	---
	license: cc-by-nc-4.0
	tags:
	- virtual-try-on
	- diffusers
	- stable-diffusion
	- image-to-image
	datasets:
	- VITON-HD
	- DressCode
	base_model:
	- stable-diffusion-v1-5/stable-diffusion-inpainting
	pipeline_tag: image-to-image
	language:
	- en
	library_name: diffusers
	---

	# Re-CatVTON

	Official model weights for "Rethinking Garment Conditioning in Diffusion-based Virtual Try-On".

	📄 Paper: [Re-CatVTON](https://arxiv.org/abs/2511.18775)
	💻 Code: [GitHub](https://github.com/Levinna/Re-CatVTON)

	## Available Checkpoints

	| Dataset | Subfolder | Resolution |
	|---------|-----------|------------|
	| VITON-HD | `VITON-HD/checkpoint-16000/unet` | 512×384 |
	| DressCode | `DressCode/checkpoint-32000/unet` | 512×384 |
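The table can be turned into a small lookup so that the checkpoint is chosen by dataset name rather than by a hand-typed path. This is a minimal sketch: the dictionary values are copied from the table, while `CHECKPOINT_SUBFOLDERS` and `checkpoint_subfolder` are hypothetical names used only for illustration and are not part of the Re-CatVTON code.

```python
# Subfolder paths copied from the checkpoint table.
# The names below are illustrative and do not come from the Re-CatVTON repository.
CHECKPOINT_SUBFOLDERS = {
    "VITON-HD": "VITON-HD/checkpoint-16000/unet",
    "DressCode": "DressCode/checkpoint-32000/unet",
}


def checkpoint_subfolder(dataset: str) -> str:
    """Return the UNet subfolder for a dataset name listed in the table."""
    try:
        return CHECKPOINT_SUBFOLDERS[dataset]
    except KeyError:
        known = ", ".join(sorted(CHECKPOINT_SUBFOLDERS))
        raise ValueError(f"Unknown dataset {dataset!r}; expected one of: {known}") from None


# The returned string can be passed as `subfolder=` to `from_pretrained`.
print(checkpoint_subfolder("VITON-HD"))
```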

	## Usage
	```python
	import torch
	from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler
	from model.pipeline import RECATVTONPipeline
	from model.attn_processor import SkipAttnProcessor
	from model.utils import init_adapter

	device = "cuda"
	dtype = torch.bfloat16

	# Load components
	vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").to(device, dtype)

	# Choose one:
	unet = UNet2DConditionModel.from_pretrained(
	    "levinna/Re-CatVTON",
	    subfolder="VITON-HD/checkpoint-16000/unet",  # or "DressCode/checkpoint-32000/unet"
	).to(device, dtype)

	scheduler = DDPMScheduler.from_pretrained(
	    "stable-diffusion-v1-5/stable-diffusion-inpainting",  # or use the Re-CatVTON scheduler config
	    subfolder="scheduler",
	)

	# Initialize attention processors (disable cross-attention)
	init_adapter(unet, cross_attn_cls=SkipAttnProcessor)

	# Create pipeline
	pipeline = RECATVTONPipeline(vae=vae, unet=unet, scheduler=scheduler)
	```
	See the official [GitHub repository](https://github.com/Levinna/Re-CatVTON) for more detailed instructions.
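If the weights need to be available offline, one option is to fetch only the chosen checkpoint folder ahead of time. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function with `allow_patterns`; the helper names `allow_patterns_for` and `download_checkpoint` are hypothetical and not part of the Re-CatVTON code.

```python
REPO_ID = "levinna/Re-CatVTON"


def allow_patterns_for(subfolder: str) -> list[str]:
    """Build a glob pattern that matches only the files inside one checkpoint subfolder."""
    return [f"{subfolder.rstrip('/')}/*"]


def download_checkpoint(subfolder: str) -> str:
    """Download a single checkpoint subfolder and return the local snapshot directory.

    Illustrative helper: requires network access and the `huggingface_hub` package.
    """
    # Imported lazily so the pattern helper above works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, allow_patterns=allow_patterns_for(subfolder))
```

The directory returned by `download_checkpoint("VITON-HD/checkpoint-16000/unet")` can then be passed to `UNet2DConditionModel.from_pretrained` in place of the repository ID, keeping the same `subfolder` argument.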

	## License
	This model is licensed under CC BY-NC 4.0 because it relies on non-commercial datasets (VITON-HD, DressCode).
	- Model Weights: [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)
	- Code: [CC-BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)

	## Citation
	```bibtex
	@article{na2025rethinking,
	title={Rethinking Garment Conditioning in Diffusion-based Virtual Try-On},
	author={Na, Kihyun and Choi, Jinyoung and Kim, Injung},
	journal={arXiv preprint arXiv:2511.18775},
	year={2025}
	}
	```