---
title: README
emoji: ⚡
colorFrom: indigo
colorTo: red
sdk: static
pinned: false
---

<h1 align="center"> REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers </h1>

<p align="center">
<a href="https://scholar.google.com.au/citations?user=GQzvqS4AAAAJ" target="_blank">Xingjian Leng</a><sup>1*</sup>   <b>·</b>   
<a href="https://1jsingh.github.io/" target="_blank">Jaskirat Singh</a><sup>1*</sup>   <b>·</b>   
<a href="https://hou-yz.github.io/" target="_blank">Yunzhong Hou</a><sup>1</sup>   <b>·</b>   
<a href="https://people.csiro.au/X/Z/Zhenchang-Xing/" target="_blank">Zhenchang Xing</a><sup>2</sup>   <b>·</b>   
<a href="https://www.sainingxie.com/" target="_blank">Saining Xie</a><sup>3</sup>   <b>·</b>   
<a href="https://zheng-lab-anu.github.io/" target="_blank">Liang Zheng</a><sup>1</sup>
</p>

<p align="center">
<sup>1</sup> Australian National University   <sup>2</sup> Data61-CSIRO   <sup>3</sup> New York University <br>
<sub><sup>*</sup>Equal Contribution</sub>
</p>

<p align="center">
<a href="https://End2End-Diffusion.github.io">🌐 Project Page</a>   
<a href="https://huggingface.co/REPA-E">🤗 Models</a>   
<a href="https://arxiv.org/abs/2504.10483">📃 Paper</a>
<br>
<!-- <a href="https://paperswithcode.com/sota/image-generation-on-imagenet-256x256?p=repa-e-unlocking-vae-for-end-to-end-tuning-of"><img src="https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/repa-e-unlocking-vae-for-end-to-end-tuning-of/image-generation-on-imagenet-256x256" alt="PWC"></a> -->
</p>

<p align="center">
<img src="https://github.com/End2End-Diffusion/REPA-E/raw/main/assets/vis-examples.jpg" width="100%" alt="teaser">
</p>

---

We address a fundamental question: ***Can latent diffusion models and their VAE tokenizer be trained end-to-end?*** While jointly training both components with the standard diffusion loss alone is observed to be ineffective, often degrading final generation performance, we show that this limitation can be overcome using a simple representation-alignment (REPA) loss. Our proposed method, **REPA-E**, enables stable and effective joint training of both the VAE and the diffusion model.

<p align="center">
<img src="https://github.com/End2End-Diffusion/REPA-E/raw/main/assets/overview.jpg" width="100%" alt="overview">
</p>

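Concretely, the end-to-end objective pairs the standard diffusion loss with the REPA alignment loss, applying a stop-gradient so that only the alignment term updates the VAE. Below is a minimal PyTorch-style sketch of one training step under a flow-matching parameterization; the module names (`vae`, `dit`, `encoder`, `proj`) are illustrative placeholders rather than our repository's actual API, and the VAE's own regularization losses (reconstruction, KL, adversarial) are omitted for brevity:

```python
import torch
import torch.nn.functional as F

def repa_e_step(vae, dit, encoder, proj, images, opt):
    """One illustrative REPA-E training step (placeholder modules, not the repo API)."""
    z = vae.encode(images)                         # trainable VAE tokenizer
    noise = torch.randn_like(z)
    t = torch.rand(z.size(0), device=z.device).view(-1, 1, 1, 1)

    # Diffusion (flow-matching) loss on a stop-gradient path: detaching the
    # latents keeps this loss from updating the VAE; backpropagating it into
    # the VAE is what destabilizes naive joint training.
    z_sg = z.detach()
    pred, _ = dit((1 - t) * z_sg + t * noise, t.flatten())
    loss_diff = F.mse_loss(pred, noise - z_sg)

    # REPA alignment loss on the non-detached latents: DiT features are pulled
    # toward a frozen pretrained encoder (e.g., DINOv2), so its gradients
    # update the DiT, the projection head, and the VAE jointly.
    _, feats = dit((1 - t) * z + t * noise, t.flatten())
    with torch.no_grad():
        target = encoder(images)                   # frozen target features
    loss_align = -F.cosine_similarity(proj(feats), target, dim=-1).mean()

    opt.zero_grad()
    (loss_diff + loss_align).backward()
    opt.step()
    return loss_diff.item(), loss_align.item()
```

See the GitHub repo for the exact losses, schedules, and hyperparameters used in our experiments.
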
**REPA-E** significantly accelerates training, achieving over a **17×** speedup compared to REPA and **45×** over the vanilla training recipe. Interestingly, end-to-end tuning also improves the VAE itself: the resulting **E2E-VAE** provides better latent structure and serves as a **drop-in replacement** for existing VAEs (e.g., SD-VAE), improving convergence and generation quality across diverse LDM architectures. Our method achieves state-of-the-art FID scores on ImageNet 256×256: **1.12** with CFG and **1.69** without CFG.

## Usage and Training

Please refer to our [GitHub repo](https://github.com/End2End-Diffusion/REPA-E) for detailed notes on end-to-end training and inference using REPA-E.

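Since **E2E-VAE** acts as a drop-in replacement for SD-VAE, swapping it into an existing latent-diffusion pipeline can be as simple as the sketch below. The checkpoint id and the `diffusers` `AutoencoderKL` export format here are assumptions for illustration; see the model cards at [🤗 REPA-E](https://huggingface.co/REPA-E) for the actual checkpoint names and loading instructions.

```python
import torch
from diffusers import AutoencoderKL

# Hypothetical checkpoint id; check https://huggingface.co/REPA-E for the real ones.
# Assumes the E2E-VAE weights are exported in diffusers' AutoencoderKL format.
vae = AutoencoderKL.from_pretrained("REPA-E/e2e-vae").eval().requires_grad_(False)

images = torch.randn(1, 3, 256, 256)  # stand-in for a batch normalized to [-1, 1]

with torch.no_grad():
    latents = vae.encode(images).latent_dist.sample()  # same interface as SD-VAE
    recon = vae.decode(latents).sample

print(latents.shape, recon.shape)
```
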
## 📚 Citation

```bibtex
@article{leng2025repae,
  title={REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers},
  author={Xingjian Leng and Jaskirat Singh and Yunzhong Hou and Zhenchang Xing and Saining Xie and Liang Zheng},
  journal={arXiv preprint arXiv:2504.10483},
  year={2025},
}
```