Overview

JCo-MVTON is a mask-free virtual try-on framework built on MM-DiT, a multi-modal diffusion transformer. It targets three limitations of existing systems: rigid dependence on human body masks, limited fine-grained control over garment attributes, and poor generalization to in-the-wild scenarios.

Basic Usage

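import torch
import torchvision.utils as vutils
from PIL import Image

# Load transformer with additional branches.
# FluxTransformer2DModel must accept extra_branch_num, so it is presumably the
# repository's modified class rather than the stock diffusers one.
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    subfolder="transformer",
    extra_branch_num=extra_branch_num,
    low_cpu_mem_usage=False,
).to(device)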

# Load and preprocess images (PIL's resize takes (width, height))
person = Image.open('assets/ref.jpg').convert("RGB").resize((width, height))
# The garment image is resized to a square of side `height`
cloth = Image.open('assets/upper.jpg').convert("RGB").resize((height, height))

person_tensor = transform_person(person)
cloth_tensor = transform_cloth(cloth)

prompt = "A fashion model wearing stylish clothing, high-resolution 8k, detailed textures, realistic lighting, fashion photography style."

# Generate image
# `pipe` is the try-on pipeline wrapping the transformer above; its construction is not shown here.
with torch.inference_mode():
    generated_image = pipe(
        generator=torch.Generator(device="cpu").manual_seed(seed),
        prompt=prompt,
        num_inference_steps=n_steps,
        guidance_scale=guidance_scale,
        height=height,
        width=width,
        cloth_img=cloth_tensor,
        person_img=person_tensor,
        extra_branch_num=extra_branch_num,
        mode=mode,
        max_sequence_length=77,
    ).images[0]

# Save result: garment, person, and generated image joined into one picture
person_tensor = transform_output(person)
cloth_tensor = transform_output(cloth)
generated_tensor = transform_output(generated_image)

concatenated_tensor = torch.cat((cloth_tensor, person_tensor, generated_tensor), dim=2)
vutils.save_image(concatenated_tensor, 'output.png')
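
The save step joins three images of different widths, which works only because `torch.cat` requires matching sizes on every axis except the one being concatenated. The sketch below walks through that shape arithmetic. It assumes `transform_output` returns `(channels, height, width)` tensors at each image's own size, which the snippet above does not show. The helper `concat_shape` is hypothetical and works on plain tuples, so it runs without PyTorch; the sizes are illustrative only.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def concat_shape(shapes: Sequence[Shape], dim: int) -> Shape:
    """Return the shape torch.cat would produce for tensors of these shapes.

    Mirrors the rule torch.cat enforces: every axis except `dim` must
    match, and the sizes along `dim` add up.
    """
    if not shapes:
        raise ValueError("need at least one shape")
    first = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(first):
            raise ValueError(f"rank mismatch: {shape} vs {first}")
        for axis, (a, b) in enumerate(zip(first, shape)):
            if axis != dim and a != b:
                raise ValueError(f"size mismatch on axis {axis}: {shape} vs {first}")
    out = list(first)
    out[dim] = sum(shape[dim] for shape in shapes)
    return tuple(out)


# Illustrative sizes only; the usage snippet leaves width and height unset.
width, height = 768, 1024

# PIL's resize takes (width, height); image tensors are laid out (C, H, W).
person_shape = (3, height, width)     # from resize((width, height))
cloth_shape = (3, height, height)     # from resize((height, height)), a square
generated_shape = (3, height, width)  # assumed to match the person image

grid_shape = concat_shape([cloth_shape, person_shape, generated_shape], dim=2)
print(grid_shape)  # width axis is height + 2 * width
```

Under these assumptions the garment can keep a different width from the person image, because only the channel and height axes have to agree when concatenating along `dim=2`.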

Results

JCo-MVTON achieves the best score on five of the six reported metrics; only paired LPIPS is marginally better for OOTDiffusion (0.0876 vs. 0.0891):

| Methods | Paired SSIM ↑ | Paired FID ↓ | Paired KID ↓ | Paired LPIPS ↓ | Unpaired FID ↓ | Unpaired KID ↓ |
| --- | --- | --- | --- | --- | --- | --- |
| MV-VTON (Wang et al., 2025b) | 0.8083 | 15.442 | 7.501 | 0.1171 | 17.900 | 3.861 |
| OOTDiffusion (Xu et al., 2024) | 0.8187 | 9.305 | 4.086 | 0.0876 | 12.408 | 4.689 |
| JCo-MVTON (Ours) | 0.8601 | 8.103 | 2.003 | 0.0891 | 9.561 | 2.700 |
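
The table mixes metrics where higher is better (SSIM) with metrics where lower is better (FID, KID, LPIPS). The sketch below picks the winner per metric using only the numbers reported above. The dictionary layout and the `best_per_metric` helper are illustrative and not part of the JCo-MVTON code.

```python
from typing import Dict

# Metric name -> True if higher is better (↑), False if lower is better (↓).
HIGHER_IS_BETTER: Dict[str, bool] = {
    "paired_ssim": True,
    "paired_fid": False,
    "paired_kid": False,
    "paired_lpips": False,
    "unpaired_fid": False,
    "unpaired_kid": False,
}

# Values copied from the results table, in the same column order.
RESULTS: Dict[str, Dict[str, float]] = {
    "MV-VTON": dict(zip(HIGHER_IS_BETTER, (0.8083, 15.442, 7.501, 0.1171, 17.900, 3.861))),
    "OOTDiffusion": dict(zip(HIGHER_IS_BETTER, (0.8187, 9.305, 4.086, 0.0876, 12.408, 4.689))),
    "JCo-MVTON": dict(zip(HIGHER_IS_BETTER, (0.8601, 8.103, 2.003, 0.0891, 9.561, 2.700))),
}


def best_per_metric(
    results: Dict[str, Dict[str, float]],
    higher_is_better: Dict[str, bool],
) -> Dict[str, str]:
    """Return the best-scoring method for each metric, respecting its direction."""
    best = {}
    for metric, higher in higher_is_better.items():
        pick = max if higher else min
        best[metric] = pick(results, key=lambda method: results[method][metric])
    return best


winners = best_per_metric(RESULTS, HIGHER_IS_BETTER)
for metric, method in winners.items():
    print(f"{metric:>13}: {method}")
```

Encoding each metric's direction explicitly avoids the common mistake of applying `max` to a lower-is-better column.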

Citation

If you find our work useful, please cite:

@article{wang2024jco,
  title={JCo-MVTON: Jointly Controllable Multi-Modal Diffusion Transformer for Mask-Free Virtual Try-on},
  author={Wang, Aowen and Li, Wei and Luo, Hao and Ao, Mengxing and Wang, Fan},
  journal={arXiv preprint arXiv:xxxx.xxxxx},
  year={2024}
}