
license: apache-2.0
pipeline_tag: image-to-image

Geometric Autoencoder for Diffusion Models (GAE)

Geometric Autoencoder (GAE) is a principled framework that replaces the largely heuristic design of latent spaces in Latent Diffusion Models (LDMs) with a systematic one. GAE improves both the semantic discriminability and the compactness of the latent space without compromising reconstruction fidelity.

Paper: Geometric Autoencoder for Diffusion Models (arXiv:2603.10365)
Code: GitHub repository (https://github.com/freezing-index/Geometric-Autoencoder-for-Diffusion-Models)

Overview

GAE introduces three core innovations:

Latent Normalization: Replaces the restrictive KL-divergence term of standard VAEs with RMSNorm regularization. By projecting features onto a unit hypersphere (up to a constant scale), GAE provides a stable, scalable latent manifold that is well suited to diffusion training.
Latent Alignment: Uses Vision Foundation Models (VFMs, e.g., DINOv2) as semantic teachers. Through a dedicated semantic downsampler, the low-dimensional latent vectors directly inherit the teachers' strong, discriminative semantic priors.
Dynamic Noise Sampling: Targets the high noise intensities typical of diffusion processes, keeping reconstruction robust even at extreme noise levels.
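The first two components can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the repository's implementation. `rms_normalize` is RMSNorm without a learned gain. It rescales each latent to unit root-mean-square, which places every latent on a hypersphere of radius √d: the unit hypersphere up to a constant scale. `linear_downsampler` and `cosine_alignment_loss` are hypothetical stand-ins for the semantic downsampler and the alignment objective, whose exact forms are defined in the paper. The latent dimension of 32 comes from the Model Zoo. The teacher width of 768 is an assumption (a ViT-B-sized feature).

```python
import math
import random


def rms_normalize(latent, eps=1e-6):
    """RMSNorm without a learned gain: rescale a latent vector to unit RMS.

    For a non-zero input the output has L2 norm ~sqrt(d), so every latent lies
    on the same hypersphere; dividing by sqrt(d) would give the unit sphere.
    """
    rms = math.sqrt(sum(x * x for x in latent) / len(latent) + eps)
    return [x / rms for x in latent]


def linear_downsampler(teacher_feature, weight):
    """Hypothetical semantic downsampler: one linear projection from the
    teacher's feature dimension to the latent dimension (one row per output)."""
    return [sum(w * f for w, f in zip(row, teacher_feature)) for row in weight]


def cosine_alignment_loss(latent, target):
    """Hypothetical alignment loss: 1 - cosine similarity (0 when aligned)."""
    dot = sum(a * b for a, b in zip(latent, target))
    norm_a = math.sqrt(sum(a * a for a in latent))
    norm_b = math.sqrt(sum(b * b for b in target))
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    random.seed(0)
    latent_dim, teacher_dim = 32, 768  # 768 is an assumed teacher width
    raw_latent = [random.gauss(0.0, 3.0) for _ in range(latent_dim)]
    z = rms_normalize(raw_latent)
    print("latent L2 norm:", math.sqrt(sum(x * x for x in z)))  # close to sqrt(32)

    # Random stand-ins for a teacher feature and downsampler weights.
    teacher = [random.gauss(0.0, 1.0) for _ in range(teacher_dim)]
    weight = [[random.gauss(0.0, teacher_dim ** -0.5) for _ in range(teacher_dim)]
              for _ in range(latent_dim)]
    target = linear_downsampler(teacher, weight)
    print("alignment loss:", cosine_alignment_loss(z, target))
```

Fixing the norm by construction means the diffusion model always sees latents at the same scale, with no KL weight to tune. The alignment term only constrains direction, which is why a cosine-style loss pairs naturally with it.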

Model Zoo

Model	Epochs	Latent Dim	gFID (w/o CFG)	Weights
GAE-LightningDiT-XL	80	32	1.82	🔗 Checkpoints
GAE-LightningDiT-XL	800	32	1.31	🔗 Checkpoints
GAE	200	32	-	🔗 Checkpoints
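For reference, gFID is the Fréchet Inception Distance computed on generated samples (lower is better). Fit Gaussians with means and covariances (μ_r, Σ_r) and (μ_g, Σ_g) to Inception features of real and generated images. The FID is then:

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)$$

The exact evaluation protocol behind the numbers above (reference statistics, number of samples) is not stated here and should be taken from the paper.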

Usage

1. Installation

git clone https://github.com/freezing-index/Geometric-Autoencoder-for-Diffusion-Models.git GAE
cd GAE
conda create -n gae python=3.10.12
conda activate gae
pip install -r requirements.txt

2. Inference (Sampling)

Download the pre-trained weights from Hugging Face and place them in the checkpoints/ folder. Then update the paths in the configuration files under configs/ to match your local setup.

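For class-uniform sampling, run the inference script with a diffusion-model config and an autoencoder config:

bash inference_gae.sh $DIT_CONFIG $VAE_CONFIG

Here $DIT_CONFIG and $VAE_CONFIG are placeholders for the paths to the DiT and autoencoder configuration files in configs/.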

Citation

@article{liu2026geometric,
  title={Geometric Autoencoder for Diffusion Models},
  author={Hangyu Liu and Jianyong Wang and Yutao Sun},
  journal={arXiv preprint arXiv:2603.10365},
  year={2026}
}