---
license: mit
library_name: diffusers
pipeline_tag: image-to-image
---

## EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

arXiv: [https://arxiv.org/abs/2502.09509](https://arxiv.org/abs/2502.09509)

Project Page: [https://eq-vae.github.io/](https://eq-vae.github.io/)

Code: [https://github.com/zelaki/eqvae](https://github.com/zelaki/eqvae)

**EQ-VAE** regularizes the latent space of pretrained autoencoders by enforcing equivariance under scaling and rotation transformations.
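
Equivariance means that transforming an image and then encoding it gives (approximately) the same latent as encoding it first and then applying the same transformation to the latent. The toy sketch below illustrates only that property; it is not the EQ-VAE training objective. The function names, the 90-degree rotation, and the two stand-in "encoders" are all hypothetical and chosen for illustration.

```python
def rot90(grid):
    """Rotate a 2D grid (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def avg_pool2(grid):
    """Toy encoder: 2x2 average pooling (commutes with 90-degree rotations)."""
    h, w = len(grid), len(grid[0])
    return [
        [
            (grid[i][j] + grid[i][j + 1] + grid[i + 1][j] + grid[i + 1][j + 1]) / 4
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def subsample2(grid):
    """Toy encoder: keep the top-left pixel of each 2x2 block (not equivariant)."""
    return [row[::2] for row in grid[::2]]


def equivariance_error(encode, transform, image):
    """Largest absolute gap between encode(transform(x)) and transform(encode(x))."""
    a = encode(transform(image))
    b = transform(encode(image))
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
```

An error of zero means the encoder commutes with the transformation. On a generic image, average pooling does so for 90-degree rotations, while strided subsampling does not.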

---

#### Model Description

This model is [SD-VAE](https://github.com/CompVis/latent-diffusion) fine-tuned with EQ-VAE regularization for 5 epochs on OpenImages.

## Model Usage

These weights are intended to be used with the [EQ-VAE codebase](https://github.com/zelaki/eqvae) or the [CompVis Stable Diffusion codebase](https://github.com/CompVis/stable-diffusion).

For a version of the model to use with the 🧨 diffusers library, see [zelaki/eq-vae](https://huggingface.co/zelaki/eq-vae).

### Quick Start with 🧨 Diffusers

If you just want to use EQ-VAE to speed up 🚀 the training of your diffusion model, you can use our Hugging Face checkpoints 🤗. We provide two models: [eq-vae](https://huggingface.co/zelaki/eq-vae) and [eq-vae-ema](https://huggingface.co/zelaki/eq-vae-ema).

```python
from diffusers import AutoencoderKL

eqvae = AutoencoderKL.from_pretrained("zelaki/eq-vae")
```
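
Once loaded, the model can be used like any other `AutoencoderKL`. The following is a minimal sketch of an encode/decode round trip, not an official recipe. The helper names `reconstruct` and `demo`, the random placeholder batch, the 256x256 resolution, and the `[-1, 1]` input scaling (the usual SD-VAE convention) are assumptions.

```python
def reconstruct(vae, images):
    """Encode images to latents and decode them back.

    `vae` is expected to behave like diffusers' AutoencoderKL and
    `images` to be a float tensor of shape (B, 3, H, W), assumed in [-1, 1].
    """
    posterior = vae.encode(images).latent_dist  # diagonal Gaussian over latents
    latents = posterior.mode()                  # deterministic choice: the mean
    recon = vae.decode(latents).sample          # decoded images
    return latents, recon


def demo():
    # Heavy imports are kept local so `reconstruct` has no import-time dependencies.
    import torch
    from diffusers import AutoencoderKL

    vae = AutoencoderKL.from_pretrained("zelaki/eq-vae").eval()
    images = torch.rand(1, 3, 256, 256) * 2 - 1  # placeholder batch in [-1, 1]
    with torch.no_grad():
        latents, recon = reconstruct(vae, images)
    print(latents.shape, recon.shape)
```

Calling `demo()` downloads the checkpoint and prints the latent and reconstruction shapes. When training a diffusion model on these latents, `posterior.sample()` can be used in place of `posterior.mode()` to draw stochastic latents.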

The weights in the original LDM format are available here: [eq-vae-ldm](https://huggingface.co/zelaki/eq-vae-ldm), [eq-vae-ema-ldm](https://huggingface.co/zelaki/eq-vae-ema-ldm).

#### Metrics

Reconstruction performance of eq-vae-ema on the ImageNet validation set.

| **Metric** | **Score** |
|------------|-----------|
| **FID**    | 0.82      |
| **PSNR**   | 25.95     |
| **LPIPS**  | 0.141     |
| **SSIM**   | 0.72      |
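
For reference, PSNR compares a reconstruction to its original through the mean squared error, `PSNR = 10 * log10(MAX^2 / MSE)`, and is reported in dB. The dependency-free sketch below shows that formula only. The evaluation protocol behind the table (image resolution, value range, averaging over the dataset) is not stated here, so the function name `psnr`, the flat pixel sequences, and the default `max_value=1.0` are assumptions.

```python
import math


def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a reconstruction closer to the original.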

---

## Acknowledgement

This code is mainly built upon [LDM](https://github.com/CompVis/latent-diffusion) and [fastDiT](https://github.com/chuanyangjin/fast-DiT).

## Citation

```bibtex
@inproceedings{
  kouzelis2025eqvae,
  title={{EQ}-{VAE}: Equivariance Regularized Latent Space for Improved Generative Image Modeling},
  author={Theodoros Kouzelis and Ioannis Kakogeorgiou and Spyros Gidaris and Nikos Komodakis},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  url={https://openreview.net/forum?id=UWhW5YYLo6}
}
```