---
license: mit
library_name: diffusers
pipeline_tag: image-to-image
---

## EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
arXiv: [https://arxiv.org/abs/2502.09509](https://arxiv.org/abs/2502.09509)
Project Page: [https://eq-vae.github.io/](https://eq-vae.github.io/)
Code: [https://github.com/zelaki/eqvae](https://github.com/zelaki/eqvae)

**EQ-VAE** regularizes the latent space of pretrained autoencoders by enforcing equivariance under scaling and rotation transformations.
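To make the equivariance constraint concrete: the idea is that decoding a transformed latent should reconstruct the correspondingly transformed image, i.e. `decode(τ(encode(x))) ≈ τ(x)` for a spatial transformation `τ`. Below is a minimal, hypothetical sketch of this term for a 90° rotation; the actual training objective in the paper also uses scaling transformations and is combined with the usual reconstruction and KL terms.

```python
import torch
import torch.nn.functional as F

def equivariance_loss(encode, decode, x, k=1):
    """Sketch of the EQ-VAE idea: decoding a rotated latent should
    reconstruct the rotated image, decode(rot(encode(x))) ~ rot(x).
    (Hypothetical helper; see the paper for the full objective.)"""
    z = encode(x)
    z_t = torch.rot90(z, k=k, dims=(-2, -1))  # rotate the latent by k * 90 degrees
    x_t = torch.rot90(x, k=k, dims=(-2, -1))  # same rotation in pixel space
    return F.mse_loss(decode(z_t), x_t)

# With an identity autoencoder the loss is exactly zero,
# since rotation commutes with the identity map.
x = torch.randn(2, 3, 8, 8)
loss = equivariance_loss(lambda t: t, lambda t: t, x)
print(float(loss))  # 0.0
```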

---
#### Model Description
This model is a regularized version of [SD-VAE](https://github.com/CompVis/latent-diffusion), fine-tuned with EQ-VAE regularization for 5 epochs on OpenImages.

## Model Usage
These weights are intended to be used with the [EQ-VAE codebase](https://github.com/zelaki/eqvae) or the [CompVis Stable Diffusion codebase](https://github.com/CompVis/stable-diffusion).
If you are looking for the model to use with the 🧨 diffusers library, you can find it [here](https://huggingface.co/zelaki/eq-vae).

### Quick Start with 🧨 Diffusers
If you just want to use EQ-VAE to speed up 🚀 training of your diffusion model, you can use our HuggingFace checkpoints 🤗. We provide two models: [eq-vae](https://huggingface.co/zelaki/eq-vae) and [eq-vae-ema](https://huggingface.co/zelaki/eq-vae-ema).

```python
from diffusers import AutoencoderKL
eqvae = AutoencoderKL.from_pretrained("zelaki/eq-vae")
```
If you are looking for the weights in the original LDM format, you can find them here: [eq-vae-ldm](https://huggingface.co/zelaki/eq-vae-ldm), [eq-vae-ema-ldm](https://huggingface.co/zelaki/eq-vae-ema-ldm).

#### Metrics
Reconstruction performance of eq-vae-ema on the ImageNet validation set.

| **Metric** | **Score** |
|------------|-----------|
| **FID**    | 0.82      |
| **PSNR**   | 25.95     |
| **LPIPS**  | 0.141     |
| **SSIM**   | 0.72      |

---

## Acknowledgement
This code is mainly built upon [LDM](https://github.com/CompVis/latent-diffusion) and [fastDiT](https://github.com/chuanyangjin/fast-DiT).

## Citation
```bibtex
@inproceedings{
  kouzelis2025eqvae,
  title={{EQ}-{VAE}: Equivariance Regularized Latent Space for Improved Generative Image Modeling},
  author={Theodoros Kouzelis and Ioannis Kakogeorgiou and Spyros Gidaris and Nikos Komodakis},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  url={https://openreview.net/forum?id=UWhW5YYLo6}
}
```