---
license: mit
library_name: diffusers
pipeline_tag: image-to-image
---
# REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers

## About
REPA-E addresses the question of whether a latent diffusion model and its VAE tokenizer can be trained end-to-end. By using a representation-alignment (REPA) loss, REPA-E trains both components jointly in a stable and effective way, which significantly accelerates training and improves the VAE itself. The resulting E2E-VAE serves as a drop-in replacement for existing VAEs, improving convergence and generation quality across diverse LDM architectures.

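As a rough illustration of what a representation-alignment term can look like, the sketch below computes the mean negative cosine similarity between per-patch features from the generative model (after projection) and features from a frozen pretrained encoder. It follows the general REPA idea only; the function names, the toy vectors, and the pure-Python implementation are assumptions made for illustration and are not the official REPA-E training code.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Small epsilon avoids division by zero for all-zero vectors.
    return dot / (norm_u * norm_v + 1e-8)


def alignment_loss(projected_hidden, target_features):
    """Mean negative cosine similarity over patches (hypothetical helper).

    projected_hidden: list of per-patch vectors from the generative model,
                      assumed to be already projected to the target dimension.
    target_features:  list of per-patch vectors from a frozen encoder.
    """
    if len(projected_hidden) != len(target_features):
        raise ValueError("both inputs must contain the same number of patches")
    sims = [cosine_similarity(h, y) for h, y in zip(projected_hidden, target_features)]
    # Higher similarity gives a lower loss, so minimizing it aligns the features.
    return -sum(sims) / len(sims)


# Toy data: two patches with two-dimensional features.
target = [[2.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 2.0]]      # same directions as the targets
orthogonal = [[0.0, 1.0], [1.0, 0.0]]   # perpendicular to the targets

print(alignment_loss(aligned, target))     # close to -1: fully aligned
print(alignment_loss(orthogonal, target))  # zero: no alignment
```

In an end-to-end setup, a term of this kind would be added to the training objective so that its gradient can reach both the diffusion model and the VAE; how the terms are weighted and combined is described in the paper and repository rather than here.
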
This model is based on the paper [REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers](https://huggingface.co/papers/2504.10483); the official implementation is available on [GitHub](https://github.com/REPA-E/REPA-E). The project page is at [https://end2end-diffusion.github.io](https://end2end-diffusion.github.io).

## Usage

To use the REPA-E model, load it with the Hugging Face `DiffusionPipeline`. The simplified example below runs inference with a pretrained REPA-E model. For training examples and further details, see the [GitHub repository](https://github.com/REPA-E/REPA-E).

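```python
from diffusers import DiffusionPipeline

# trust_remote_code=True allows diffusers to run custom pipeline code from the model repository.
pipeline = DiffusionPipeline.from_pretrained("REPA-E/sit-repae-sdvae", trust_remote_code=True)

# Call the pipeline with its default arguments and take the first generated image.
image = pipeline().images[0]

image.save("generated_image.png")
```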