license: mit
library_name: diffusers
pipeline_tag: image-to-image
REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers
About
This model addresses the question of whether latent diffusion models and their VAE tokenizer can be trained end-to-end. Using a representation-alignment (REPA) loss, REPA-E enables stable and effective joint training of both components, leading to significant training acceleration and improved VAE performance. The resulting E2E-VAE serves as a drop-in replacement for existing VAEs, improving convergence and generation quality across diverse LDM architectures.
This model is based on the paper REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers. The official implementation is available on GitHub, and the project page is at https://end2end-diffusion.github.io.
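To make the representation-alignment idea concrete, here is a minimal sketch of an alignment term. It assumes the loss takes the form of a negative mean patch-wise cosine similarity between projected model features and features from a frozen pretrained encoder. The function names and the plain-list inputs are illustrative assumptions and are not taken from the official implementation, which may differ in detail.

```python
import math


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero vectors
    return dot / (norm_u * norm_v + eps)


def alignment_loss(projected_patches, target_patches):
    """Negative mean patch-wise cosine similarity (hypothetical helper).

    projected_patches: per-patch features from the model being trained,
        already projected to the target feature dimension.
    target_patches: per-patch features from a frozen pretrained encoder.
    """
    if len(projected_patches) != len(target_patches):
        raise ValueError("both inputs must contain the same number of patches")
    if not projected_patches:
        raise ValueError("at least one patch is required")
    sims = [
        cosine_similarity(p, t)
        for p, t in zip(projected_patches, target_patches)
    ]
    # Better-aligned features give a higher similarity and a lower loss
    return -sum(sims) / len(sims)


if __name__ == "__main__":
    projected = [[1.0, 0.0], [0.5, 0.5]]
    target = [[1.0, 0.0], [1.0, 1.0]]
    print(alignment_loss(projected, target))
```

In a joint-training setup, a term of this kind would be added to the training objective so that its gradient can reach both the diffusion model and the VAE. The weighting of the term and the choice of pretrained encoder are left open here.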
Usage
Load the REPA-E model through the Hugging Face DiffusionPipeline. The simplified example below runs inference with a pretrained REPA-E model; for training examples and further details, see the GitHub repository.
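from diffusers import DiffusionPipeline

# trust_remote_code=True permits running the custom pipeline code shipped with the model repository
pipeline = DiffusionPipeline.from_pretrained("REPA-E/sit-repae-sdvae", trust_remote_code=True)

# Generate with the pipeline's default arguments and keep the first image
image = pipeline().images[0]
image.save("generated_image.png")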