EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Arxiv: https://arxiv.org/abs/2502.09509
EQ-VAE regularizes the latent space of pretrained autoencoders by enforcing equivariance under scaling and rotation transformations.
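To make the equivariance property concrete, here is a minimal NumPy sketch that measures how far a toy "encoder" is from commuting with a 90-degree rotation. It is not the EQ-VAE training loss and does not use the released model: `avg_pool2`, `biased_encoder`, and `equivariance_error` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def avg_pool2(x):
    """Toy 'encoder' (assumption, not the real VAE): 2x2 average pooling on an (H, W) array."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def biased_encoder(x):
    """Toy non-equivariant 'encoder': pooling plus a column-dependent bias."""
    z = avg_pool2(x)
    return z + np.arange(z.shape[1])[None, :]

def equivariance_error(encode, x, k=1):
    """Mean squared gap between encode(rotate(x)) and rotate(encode(x))."""
    def rotate(a):
        return np.rot90(a, k)  # rotation by k * 90 degrees
    return float(np.mean((encode(rotate(x)) - rotate(encode(x))) ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))

err_pool = equivariance_error(avg_pool2, x)       # pooling commutes with the rotation
err_bias = equivariance_error(biased_encoder, x)  # the bias breaks that symmetry
print(err_pool, err_bias)
```

An equivariant encoder gives an error near zero, while the biased one does not. A regularizer in this spirit would push a learned autoencoder toward the first behavior for the chosen transformations.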
This model is a regularized version of SD-VAE, fine-tuned with EQ-VAE regularization for 44 epochs on ImageNet. This checkpoint uses the EMA weights.
# Load the EQ-VAE autoencoder (AutoencoderKL lives in diffusers)
from diffusers import AutoencoderKL
model = AutoencoderKL.from_pretrained("zelaki/eq-vae-ema")
Reconstruction performance of eq-vae-ema on the ImageNet validation set:
| Metric | Score |
|---|---|
| FID | 0.552 |
| PSNR | 26.158 |
| LPIPS | 0.133 |
| SSIM | 0.725 |
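For readers unfamiliar with the PSNR row, the sketch below shows the standard definition, 10 * log10(MAX^2 / MSE), in NumPy. The `psnr` helper and the assumption that pixel values lie in [0, 1] are illustrative; this is not the evaluation script behind the scores above.

```python
import numpy as np

def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB; assumes both inputs share the range [0, max_value]."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

# Toy example: a uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, about 20 dB
reference = np.zeros((4, 4))
reconstruction = reference + 0.1
example_psnr = psnr(reference, reconstruction)
print(example_psnr)
```

Higher PSNR means the reconstruction is closer to the input pixel by pixel, whereas LPIPS and FID are lower-is-better.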