Key highlights driving our model's exceptional speed and fidelity include:

Parallel DiT architecture with 16-channel VAE, enabling maximum garment fidelity and fine detail retention
CFG-augmented consistency distillation, allowing generation in as few as 8 inference steps without classifier-free guidance (CFG) at inference time
Enhanced dilated clothing-agnostic masking strategy, resulting in more accurate garment outlines
Trained on 1M+ proprietary garment-image pairs, spanning flatlays, model-to-model scenarios, and a wide diversity of garment types
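
To make the dilated clothing-agnostic masking idea more concrete, here is a minimal sketch of one plausible reading: the garment mask on the model image is grown by a few pixels before the region is blanked out, so the outline of the original garment does not leak into generation. The function names, the square structuring element, the radius, and the grey fill value are all assumptions for illustration, not the actual pipeline.

```python
import numpy as np


def dilate_mask(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element."""
    mask = mask.astype(bool)
    if radius <= 0:
        return mask.copy()
    h, w = mask.shape
    # Pad with False so shifted windows never wrap around the image border.
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    # OR together every shifted copy of the mask within the square window.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def clothing_agnostic_image(image: np.ndarray, garment_mask: np.ndarray,
                            radius: int = 2, fill: int = 127):
    """Blank out the dilated garment region (hypothetical helper).

    image: (H, W, 3) uint8 array; garment_mask: (H, W) boolean-like array.
    Returns the masked image and the dilated mask.
    """
    agnostic_mask = dilate_mask(garment_mask, radius)
    result = image.copy()
    result[agnostic_mask] = fill  # assumed neutral grey fill
    return result, agnostic_mask
```

A larger radius hides more of the original garment silhouette at the cost of removing more of the surrounding body and background, so the value would need tuning per dataset.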

*Please note that “1 second” refers to the processing time of the try-on generation itself. In practice, the model image also goes through pre-processing to compute intermediate representations, which may add to the overall latency.
However, this pre-processing overhead can be largely reduced with caching, delivering a near real-time experience in applications.
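
As a rough illustration of the caching idea, the sketch below stores the intermediate representations of a model image under a hash of its bytes, so repeated try-ons on the same model image skip pre-processing. `PreprocessCache` and `fake_preprocess` are hypothetical names, and the stand-in function does not reflect the real pre-processing steps.

```python
import hashlib
from typing import Callable, Dict


class PreprocessCache:
    """In-memory cache keyed by the SHA-256 digest of the model image bytes."""

    def __init__(self, compute: Callable[[bytes], dict]):
        self._compute = compute
        self._store: Dict[str, dict] = {}
        self.misses = 0  # number of times pre-processing actually ran

    def get(self, image_bytes: bytes) -> dict:
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._store:
            # Cache miss: run the expensive pre-processing once and keep it.
            self.misses += 1
            self._store[key] = self._compute(image_bytes)
        return self._store[key]


def fake_preprocess(image_bytes: bytes) -> dict:
    # Stand-in for the real (unspecified) intermediate representations.
    return {"num_bytes": len(image_bytes)}


cache = PreprocessCache(fake_preprocess)
first = cache.get(b"model-image")   # computed
second = cache.get(b"model-image")  # served from the cache
```

In a deployed service the dictionary would more likely be replaced by a shared store with eviction, but the lookup pattern stays the same.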