---
license: apache-2.0
pipeline_tag: image-to-image
---
Geometric Autoencoder for Diffusion Models (GAE)
Geometric Autoencoder (GAE) is a principled framework that replaces heuristic latent space design in Latent Diffusion Models (LDMs) with a systematic one. GAE improves semantic discriminability and latent compactness without compromising reconstruction fidelity.
Overview
GAE introduces three core innovations:
- Latent Normalization: Replaces the restrictive KL-divergence of standard VAEs with RMSNorm regularization. By projecting features onto a unit hypersphere, GAE provides a stable, scalable latent manifold optimized for diffusion learning.
- Latent Alignment: Leverages Vision Foundation Models (VFMs, e.g., DINOv2) as semantic teachers. Through a carefully designed semantic downsampler, the low-dimensional latent vectors directly inherit strong discriminative semantic priors.
- Dynamic Noise Sampling: Specifically addresses the high-intensity noise typical in diffusion processes, ensuring robust reconstruction performance even under extreme noise levels.
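To make the Latent Normalization idea concrete, here is a minimal sketch of RMS normalization applied to a single latent vector. It is not the repository's implementation: the function names `rms_normalize` and `l2_norm`, the `eps` value, and the toy 32-dimensional input are all assumptions chosen for illustration, and no learned gain is applied. The sketch shows that after normalization every vector has the same L2 norm, so all latents lie on a common hypersphere.

```python
import math


def rms_normalize(latent, eps=1e-6):
    """Rescale a latent vector so its root-mean-square value is about 1.

    Hypothetical helper for illustration; no learned gain is applied.
    """
    mean_square = sum(x * x for x in latent) / len(latent)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [x * scale for x in latent]


def l2_norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


# Toy latent with 32 dimensions (matching the "Latent Dim" column; values are arbitrary).
latent = [math.sin(i + 1) for i in range(32)]
normalized = rms_normalize(latent)

# With RMS = 1, the L2 norm equals sqrt(d), so every latent sits on the same sphere.
radius = l2_norm(normalized)

# Dividing by sqrt(d) rescales that sphere to unit radius.
unit = [x / math.sqrt(len(normalized)) for x in normalized]

print(f"norm after RMS normalization: {radius:.4f}")
print(f"norm after rescaling by sqrt(d): {l2_norm(unit):.4f}")
```

Because the norm is fixed regardless of the input scale, the constraint acts on the geometry of the latent space rather than on its distribution, which is how this form of regularization differs from a KL penalty.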
Model Zoo
| Model | Epochs | Latent Dim | gFID (w/o CFG) | Weights |
|---|---|---|---|---|
| GAE-LightningDiT-XL | 80 | 32 | 1.82 | Checkpoints |
| GAE-LightningDiT-XL | 800 | 32 | 1.31 | Checkpoints |
| GAE | 200 | 32 | - | Checkpoints |
Usage
1. Installation
git clone https://github.com/freezing-index/Geometric-Autoencoder-for-Diffusion-Models.git
cd Geometric-Autoencoder-for-Diffusion-Models
conda create -n gae python=3.10.12
conda activate gae
pip install -r requirements.txt
2. Inference (Sampling)
Download the pre-trained weights from Hugging Face and place them in the checkpoints/ folder, then update the paths in the configs/ folder to match your local setup.
For class-uniform sampling, set $DIT_CONFIG and $VAE_CONFIG to the DiT and VAE config files you want to use and run:
bash inference_gae.sh $DIT_CONFIG $VAE_CONFIG
Citation
@article{liu2026geometric,
title={Geometric Autoencoder for Diffusion Models},
author={Hangyu Liu and Jianyong Wang and Yutao Sun},
journal={arXiv preprint arXiv:2603.10365},
year={2026}
}