Overview

JCo-MVTON is a mask-free virtual try-on framework built on MM-DiT (Multi-Modal Diffusion Transformer). It targets three limitations of existing systems: rigid dependence on human body masks, limited fine-grained control over garment attributes, and poor generalization to in-the-wild scenarios.

Basic Usage

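# Imports used by this snippet. FluxTransformer2DModel accepts `extra_branch_num`
# here, so it is presumably the modified class shipped with this project; the
# pipeline object `pipe` and the transform_* helpers are assumed to be set up elsewhere.
import torch
import torchvision.utils as vutils
from PIL import Image

# Load transformer with additional branches
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    subfolder="transformer",
    extra_branch_num=extra_branch_num,
    low_cpu_mem_usage=False,
).to(device)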

# Load and preprocess images
# PIL's resize() takes (width, height), so the garment image becomes a height x height square.
person = Image.open('assets/ref.jpg').convert("RGB").resize((width, height))
cloth = Image.open('assets/upper.jpg').convert("RGB").resize((height, height))

person_tensor = transform_person(person)
cloth_tensor = transform_cloth(cloth)

prompt = "A fashion model wearing stylish clothing, high-resolution 8k, detailed textures, realistic lighting, fashion photography style."

# Generate image
with torch.inference_mode():
    generated_image = pipe(
        generator=torch.Generator(device="cpu").manual_seed(seed),
        prompt=prompt,
        num_inference_steps=n_steps,
        guidance_scale=guidance_scale,
        height=height,
        width=width,
        cloth_img=cloth_tensor,
        person_img=person_tensor,
        extra_branch_num=extra_branch_num,
        mode=mode,
        max_sequence_length=77,
    ).images[0]

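# Save result
# Convert the original PIL images and the generated image for display;
# this overwrites the model-input tensors created earlier.
person_tensor = transform_output(person)
cloth_tensor = transform_output(cloth)
generated_tensor = transform_output(generated_image)

# Concatenate along dim=2 (the width axis for C x H x W tensors) so that
# garment, person, and result appear side by side.
concatenated_tensor = torch.cat((cloth_tensor, person_tensor, generated_tensor), dim=2)
vutils.save_image(concatenated_tensor, 'output.png')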
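
The layout of output.png follows from the tensor shapes involved. The sketch below works through that arithmetic under two assumptions that the snippet above does not confirm: that transform_output returns channel-first tensors of shape (C, H, W), as torchvision's ToTensor does, and that the pipeline returns an image of size width x height. The helper names pil_size_to_chw and concat_width_shape are hypothetical, and plain tuples stand in for tensors so the example runs without PyTorch.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def pil_size_to_chw(size: Tuple[int, int], channels: int = 3) -> Shape:
    """Convert a PIL-style (width, height) size into a (C, H, W) shape."""
    width, height = size
    return (channels, height, width)


def concat_width_shape(shapes: Sequence[Shape]) -> Shape:
    """Shape obtained by concatenating (C, H, W) arrays along dim=2 (width).

    Concatenation along the width axis requires every input to share the
    same channel count and height; the widths are summed.
    """
    if not shapes:
        raise ValueError("need at least one shape")
    channels, height, _ = shapes[0]
    for c, h, _ in shapes:
        if (c, h) != (channels, height):
            raise ValueError(
                f"channel/height mismatch: {(c, h)} vs {(channels, height)}"
            )
    return (channels, height, sum(w for _, _, w in shapes))


# Placeholder dimensions for illustration only, not the project's defaults.
width, height = 768, 1024

cloth = pil_size_to_chw((height, height))      # garment resized to a square
person = pil_size_to_chw((width, height))
generated = pil_size_to_chw((width, height))   # assumed pipeline output size

print(concat_width_shape([cloth, person, generated]))  # (3, 1024, 2560)
```

Because all three images share the same height, the square garment image can be joined with the person and result images even though its width differs.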

Results

JCo-MVTON obtains the best score on five of the six reported metrics; on paired LPIPS, OOTDiffusion is marginally better (0.0876 vs. 0.0891):

Methods	SSIM ↑ (paired)	FID ↓ (paired)	KID ↓ (paired)	LPIPS ↓ (paired)	FID ↓ (unpaired)	KID ↓ (unpaired)
MV-VTON (Wang et al., 2025b)	0.8083	15.442	7.501	0.1171	17.900	3.861
OOTDiffusion (Xu et al., 2024)	0.8187	9.305	4.086	0.0876	12.408	4.689
JCo-MVTON (Ours)	0.8601	8.103	2.003	0.0891	9.561	2.700
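
The table mixes metrics where higher is better (SSIM) with metrics where lower is better (FID, KID, LPIPS), so a direction-aware comparison can help when reading it. The following is a minimal sketch: the numbers are copied from the table above, while the dictionary layout and the function name best_per_metric are illustrative choices and not part of the project.

```python
from typing import Dict, Tuple

MetricKey = Tuple[str, str]  # (setting, metric name)

# Only SSIM is marked with an up arrow in the table; all others are lower-is-better.
HIGHER_IS_BETTER = {"SSIM"}

# Scores copied from the results table.
SCORES: Dict[str, Dict[MetricKey, float]] = {
    "MV-VTON": {
        ("paired", "SSIM"): 0.8083, ("paired", "FID"): 15.442,
        ("paired", "KID"): 7.501, ("paired", "LPIPS"): 0.1171,
        ("unpaired", "FID"): 17.900, ("unpaired", "KID"): 3.861,
    },
    "OOTDiffusion": {
        ("paired", "SSIM"): 0.8187, ("paired", "FID"): 9.305,
        ("paired", "KID"): 4.086, ("paired", "LPIPS"): 0.0876,
        ("unpaired", "FID"): 12.408, ("unpaired", "KID"): 4.689,
    },
    "JCo-MVTON": {
        ("paired", "SSIM"): 0.8601, ("paired", "FID"): 8.103,
        ("paired", "KID"): 2.003, ("paired", "LPIPS"): 0.0891,
        ("unpaired", "FID"): 9.561, ("unpaired", "KID"): 2.700,
    },
}


def best_per_metric(
    scores: Dict[str, Dict[MetricKey, float]]
) -> Dict[MetricKey, str]:
    """Return the best-scoring method for each metric, respecting direction."""
    metric_keys = next(iter(scores.values())).keys()
    best: Dict[MetricKey, str] = {}
    for key in metric_keys:
        pick = max if key[1] in HIGHER_IS_BETTER else min
        best[key] = pick(scores, key=lambda method: scores[method][key])
    return best


for (setting, metric), method in best_per_metric(SCORES).items():
    print(f"{setting:9s} {metric:6s} -> {method}")
```

Running this lists JCo-MVTON for every metric except paired LPIPS, where OOTDiffusion has the lower value.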

Citation

If you find our work useful, please cite:

@article{wang2024jco,
  title={JCo-MVTON: Jointly Controllable Multi-Modal Diffusion Transformer for Mask-Free Virtual Try-on},
  author={Wang, Aowen and Li, Wei and Luo, Hao and Ao, Mengxing and Wang, Fan},
  journal={arXiv preprint arXiv:xxxx.xxxxx},
  year={2024}
}
