xingjianleng committed on
Commit 4e03fd9 · verified · 1 parent: bf5f547

Update README.md

Files changed (1)
  1. README.md +101 -4
README.md CHANGED
@@ -1,4 +1,101 @@
- ---
- license: mit
- library_name: diffusers
- ---
+ ---
+ license: mit
+ pipeline_tag: image-to-image
+ library_name: diffusers
+ ---
+
+ <h1 align="center"> REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers </h1>
+
+ <p align="center">
+ <a href="https://scholar.google.com.au/citations?user=GQzvqS4AAAAJ" target="_blank">Xingjian&nbsp;Leng</a><sup>1*</sup> &ensp; <b>&middot;</b> &ensp;
+ <a href="https://1jsingh.github.io/" target="_blank">Jaskirat&nbsp;Singh</a><sup>1*</sup> &ensp; <b>&middot;</b> &ensp;
+ <a href="https://hou-yz.github.io/" target="_blank">Yunzhong&nbsp;Hou</a><sup>1</sup> &ensp; <b>&middot;</b> &ensp;
+ <a href="https://people.csiro.au/X/Z/Zhenchang-Xing/" target="_blank">Zhenchang&nbsp;Xing</a><sup>2</sup>&ensp; <b>&middot;</b> &ensp;
+ <a href="https://www.sainingxie.com/" target="_blank">Saining&nbsp;Xie</a><sup>3</sup>&ensp; <b>&middot;</b> &ensp;
+ <a href="https://zheng-lab-anu.github.io/" target="_blank">Liang&nbsp;Zheng</a><sup>1</sup>&ensp;
+ </p>
+
+ <p align="center">
+ <sup>1</sup> Australian National University &emsp; <sup>2</sup>Data61-CSIRO &emsp; <sup>3</sup>New York University &emsp; <br>
+ <sub><sup>*</sup>Project Leads&emsp;</sub>
+ </p>
+
+ <p align="center">
+ <a href="https://End2End-Diffusion.github.io">🌐 Project Page</a> &ensp;
+ <a href="https://huggingface.co/REPA-E">🤗 Models</a> &ensp;
+ <a href="https://arxiv.org/abs/2504.10483">📃 Paper</a> &ensp;
+ <br>
+ <!-- <a href="https://paperswithcode.com/sota/image-generation-on-imagenet-256x256?p=repa-e-unlocking-vae-for-end-to-end-tuning-of"><img src="https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/repa-e-unlocking-vae-for-end-to-end-tuning-of/image-generation-on-imagenet-256x256" alt="PWC"></a> -->
+ </p>
+
+
+ <!-- <p align="center">
+ <img src="https://github.com/End2End-Diffusion/REPA-E/raw/main/assets/vis-examples.jpg" width="100%" alt="teaser">
+ </p> -->
+
+ ---
+
+ We address a fundamental question: ***Can latent diffusion models and their VAE tokenizer be trained end-to-end?*** Training both components jointly with the standard diffusion loss is observed to be ineffective and often degrades final performance; we show that this limitation can be overcome with a simple representation-alignment (REPA) loss. Our proposed method, **REPA-E**, enables stable and effective joint training of both the VAE and the diffusion model.
+
+ <p align="center">
+ <img src="https://github.com/End2End-Diffusion/REPA-E/raw/main/assets/overview.jpg" width="100%" alt="teaser">
+ </p>
+
+ **REPA-E** significantly accelerates training, achieving a speedup of over **17×** compared to REPA and **45×** over the vanilla training recipe. Interestingly, end-to-end tuning also improves the VAE itself: the resulting **E2E-VAE** provides better latent structure and serves as a **drop-in replacement** for existing VAEs (e.g., SD-VAE), improving convergence and generation quality across diverse LDM architectures. Our method achieves state-of-the-art FID scores on ImageNet 256×256: **1.26** with CFG and **1.83** without CFG.
+
+
+ <h1 align="left" style="color:#ff000d">🆕 AutoencoderKL-Compatible Release</h1>
+
+ > **New in this release:** We are releasing the **REPA-E E2E-VAE** as a **Hugging Face `AutoencoderKL`** checkpoint, ready to use with `diffusers` out of the box.
+
+ We previously released the REPA-E VAE checkpoint, which had to be loaded through the model class in our REPA-E repository.
+ This new version provides a **Hugging Face–compatible `AutoencoderKL`** checkpoint that loads directly through the `diffusers` API, with no extra code or custom wrapper.
+
+ It offers **plug-and-play compatibility** with `diffusers` pipelines and can be used directly to build or train new diffusion models.
+
+ ## 📦 Requirements
+ ```bash
+ pip install "diffusers>=0.33.0"
+ pip install "torch>=2.3.1"
+ ```
+
+ ## 🚀 Example Usage
+ ```python
+ from io import BytesIO
+ import requests
+
+ from diffusers import AutoencoderKL
+ import numpy as np
+ import torch
+ from PIL import Image
+
+
+ # This presigned S3 link was only valid for one hour (X-Amz-Expires=3600);
+ # replace it with any image URL if the request fails.
+ response = requests.get("https://s3.amazonaws.com/masters.galleries.prod.dpreview.com/2935392.jpg?X-Amz-Expires=3600&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAUIXIAMA3N436PSEA/20251019/us-east-1/s3/aws4_request&X-Amz-Date=20251019T103721Z&X-Amz-SignedHeaders=host&X-Amz-Signature=219dc5f98e5c2e5f3b72587716f75889b8f45b0a01f1bd08dbbc44106e484144")
+ response.raise_for_status()
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ # Convert to RGB, resize to 512x512, and scale pixels from [0, 255] to [-1, 1] as a (1, 3, H, W) tensor.
+ image = torch.from_numpy(
+     np.array(
+         Image.open(BytesIO(response.content)).convert("RGB").resize((512, 512))
+     )
+ ).permute(2, 0, 1).unsqueeze(0).to(torch.float32) / 127.5 - 1
+ image = image.to(device)
+
+ vae = AutoencoderKL.from_pretrained("REPA-E/e2e-sdvae-hf").to(device)
+
+ with torch.no_grad():
+     # Sample latents from the encoder's posterior, then decode them back to pixel space.
+     latents = vae.encode(image).latent_dist.sample()
+     reconstructed = vae.decode(latents).sample
+
+ ```
+
+
+ ## 📚 Citation
+
+ ```bibtex
+ @article{leng2025repae,
+   title={REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers},
+   author={Xingjian Leng and Jaskirat Singh and Yunzhong Hou and Zhenchang Xing and Saining Xie and Liang Zheng},
+   year={2025},
+   journal={arXiv preprint arXiv:2504.10483},
+ }
+ ```
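The README credits REPA-E's stable joint training to a representation-alignment (REPA) loss. As a rough illustration only, the sketch below assumes the patch-wise negative cosine-similarity form of the original REPA objective: intermediate diffusion-transformer tokens are projected and compared with patch features from a frozen pretrained encoder. The function name `repa_alignment_loss` and the linear `proj` matrix (standing in for REPA's small MLP projector) are hypothetical, and this is not the REPA-E training code.

```python
import numpy as np


def repa_alignment_loss(hidden: np.ndarray, target: np.ndarray, proj: np.ndarray, eps: float = 1e-8) -> float:
    """Negative mean patch-wise cosine similarity (illustrative sketch).

    hidden: (N, D_model) patch tokens from an intermediate diffusion-transformer block.
    target: (N, D_enc) patch features from a frozen pretrained encoder (e.g. DINOv2).
    proj:   (D_model, D_enc) hypothetical linear projector standing in for an MLP head.
    """
    projected = hidden @ proj
    # L2-normalise each patch vector so the dot product below is a cosine similarity.
    projected = projected / (np.linalg.norm(projected, axis=-1, keepdims=True) + eps)
    target = target / (np.linalg.norm(target, axis=-1, keepdims=True) + eps)
    cosine = np.sum(projected * target, axis=-1)  # one similarity per patch
    # Minimising this value pushes every patch towards perfect alignment (loss -> -1).
    return float(-np.mean(cosine))
```

Perfectly aligned features give a loss of -1, orthogonal ones give 0, and opposite ones give +1; a gradient-based framework would minimise this term alongside the diffusion loss.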
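The usage example ends once `reconstructed` is computed. A minimal sketch for inspecting the result, assuming `image` and `reconstructed` are first moved to the CPU as NumPy arrays in the same [-1, 1] range the example uses; `to_uint8_image` and `psnr` are hypothetical helpers, not part of `diffusers`.

```python
import numpy as np


def to_uint8_image(x: np.ndarray) -> np.ndarray:
    """Map a (C, H, W) array in [-1, 1] to an (H, W, C) uint8 array in [0, 255]."""
    x = np.clip(x, -1.0, 1.0)
    return np.round((x + 1.0) * 127.5).astype(np.uint8).transpose(1, 2, 0)


def psnr(reference: np.ndarray, reconstruction: np.ndarray, data_range: float = 2.0) -> float:
    """Peak signal-to-noise ratio in dB; data_range is 2.0 for inputs scaled to [-1, 1]."""
    mse = np.mean((reference.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))


# Hypothetical usage with the tensors from the example above:
# ref = image[0].cpu().numpy()
# rec = np.clip(reconstructed[0].float().cpu().numpy(), -1.0, 1.0)
# print(f"PSNR: {psnr(ref, rec):.2f} dB")
# Image.fromarray(to_uint8_image(rec)).save("reconstructed.png")
```

PSNR serves here only as a quick sanity check; the `data_range` of 2.0 follows from the [-1, 1] scaling applied before encoding.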