---
license: apache-2.0
pipeline_tag: image-to-image
---

# Geometric Autoencoder for Diffusion Models (GAE)

Geometric Autoencoder (GAE) is a principled framework that replaces the heuristic latent-space design of Latent Diffusion Models (LDMs) with a systematic approach. GAE significantly improves semantic discriminability and latent compactness without compromising reconstruction fidelity.

## Overview

GAE introduces three core innovations:

1. **Latent Normalization**: Replaces the restrictive KL-divergence regularizer of standard VAEs with RMSNorm regularization. By projecting features onto a unit hypersphere, GAE provides a stable, scalable latent manifold optimized for diffusion learning.
2. **Latent Alignment**: Leverages Vision Foundation Models (VFMs, e.g., DINOv2) as semantic teachers. Through a carefully designed semantic downsampler, the low-dimensional latent vectors directly inherit strong discriminative semantic priors.
3. **Dynamic Noise Sampling**: Specifically addresses the high-intensity noise typical in diffusion processes, ensuring robust reconstruction performance even under extreme noise levels.
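To make the first innovation concrete, the RMSNorm-style projection can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation: the channels-first latent layout and the `rmsnorm_latent` helper are hypothetical, with only the 32-channel latent dimension taken from the model zoo below.

```python
import torch

def rmsnorm_latent(z: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Sketch of RMSNorm-style latent regularization (not the official code).

    Each spatial latent vector (taken along the channel dimension) is divided
    by its root-mean-square, placing it on a hypersphere of radius
    sqrt(num_channels). Unlike a KL term, this imposes no Gaussian prior.
    """
    rms = z.pow(2).mean(dim=1, keepdim=True).sqrt()
    return z / (rms + eps)

# Hypothetical latent of shape (batch, 32 channels, 16x16 spatial grid),
# matching the 32-dim latents listed in the model zoo.
z = torch.randn(4, 32, 16, 16)
z_unit = rmsnorm_latent(z)
```

Because every latent vector then has the same norm, the diffusion model sees inputs at a fixed, known scale, which is the stability property the latent normalization targets.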

## Model Zoo

| Model | Epochs | Latent Dim | gFID (w/o CFG) | Weights |
|---|---|---|---|---|
| GAE-LightningDiT-XL | 80 | 32 | 1.82 | 🔗 Checkpoints |
| GAE-LightningDiT-XL | 800 | 32 | 1.31 | 🔗 Checkpoints |
| GAE | 200 | 32 | - | 🔗 Checkpoints |

## Usage

### 1. Installation

```shell
git clone https://github.com/freezing-index/Geometric-Autoencoder-for-Diffusion-Models.git GAE
cd GAE
conda create -n gae python=3.10.12
conda activate gae
pip install -r requirements.txt
```

### 2. Inference (Sampling)

Download the pre-trained weights from Hugging Face and place them in the `checkpoints/` folder. Ensure you update the paths in the `configs/` folder to match your local setup.

For class-uniform sampling:

```shell
bash inference_gae.sh $DIT_CONFIG $VAE_CONFIG
```

## Citation

```bibtex
@article{liu2026geometric,
  title={Geometric Autoencoder for Diffusion Models},
  author={Hangyu Liu and Jianyong Wang and Yutao Sun},
  journal={arXiv preprint arXiv:2603.10365},
  year={2026}
}
```