license: mit
pipeline_tag: image-to-image
library_name: diffusers

REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers

Xingjian Leng¹* · Jaskirat Singh¹* · Yunzhong Hou¹ · Zhenchang Xing² · Saining Xie³ · Liang Zheng¹

¹Australian National University ²Data61-CSIRO ³New York University
*Project Leads

🌐 Project Page 🤗 Models 📃 Paper

Overview

We address a fundamental question: can latent diffusion models and their VAE tokenizer be trained end-to-end? Training both components jointly with the standard diffusion loss is observed to be ineffective and often degrades final performance, but we show that this limitation can be overcome with a simple representation-alignment (REPA) loss. Our proposed method, REPA-E, enables stable and effective joint training of the VAE and the diffusion model.
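To make the idea of an alignment loss concrete, here is a minimal, framework-free sketch. It assumes the loss takes the form of a mean negative cosine similarity between projected diffusion features and features from a frozen pretrained encoder, computed per patch token. The function names (`alignment_loss`, `combined_objective`) and the loss weights are illustrative assumptions, not the repository's API or its actual hyperparameters.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Small epsilon avoids division by zero for all-zero vectors.
    return dot / (norm_u * norm_v + 1e-12)


def alignment_loss(projected_tokens, target_tokens):
    """Mean negative cosine similarity over patch tokens.

    projected_tokens: per-patch features from the diffusion model
        (assumed already projected to the target feature dimension).
    target_tokens: per-patch features from a frozen pretrained encoder.
    Lower is better; -1.0 means every token pair is perfectly aligned.
    """
    if len(projected_tokens) != len(target_tokens):
        raise ValueError("token counts must match")
    sims = [cosine_similarity(p, t) for p, t in zip(projected_tokens, target_tokens)]
    return -sum(sims) / len(sims)


def combined_objective(diffusion_loss, repa_loss, vae_reg_loss,
                       repa_weight=0.5, vae_weight=1.0):
    """Weighted sum of the loss terms; the weights here are hypothetical."""
    return diffusion_loss + repa_weight * repa_loss + vae_weight * vae_reg_loss


if __name__ == "__main__":
    projected = [[1.0, 0.0], [0.0, 2.0]]
    target = [[2.0, 0.0], [0.0, 1.0]]
    repa = alignment_loss(projected, target)
    print(combined_objective(diffusion_loss=1.0, repa_loss=repa, vae_reg_loss=0.2))
```

In a real training loop these terms would be computed on tensors by an autodiff framework, and which parameters each term updates (the VAE, the diffusion model, or both) is a design decision this sketch does not model.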

REPA-E significantly accelerates training, speeding it up by over 17× compared to REPA and 45× compared to the vanilla training recipe. Interestingly, end-to-end tuning also improves the VAE itself: the resulting E2E-VAE provides better latent structure and serves as a drop-in replacement for existing VAEs (e.g., SD-VAE), improving convergence and generation quality across diverse LDM architectures. Our method achieves state-of-the-art FID scores on ImageNet 256×256: 1.26 with CFG and 1.83 without CFG.

News and Updates

[2025-04-15] Initial Release with pre-trained models and codebase.