---
license: apache-2.0
pipeline_tag: image-to-image
---

# Geometric Autoencoder for Diffusion Models (GAE)

Geometric Autoencoder (GAE) is a principled framework that replaces the largely heuristic design of latent spaces in Latent Diffusion Models (LDMs) with a systematic one. GAE significantly improves semantic discriminability and latent compactness without compromising reconstruction fidelity.

- **Paper:** [Geometric Autoencoder for Diffusion Models](https://huggingface.co/papers/2603.10365)
- **Code:** [GitHub Repository](https://github.com/freezing-index/Geometric-Autoencoder-for-Diffusion-Models)

## Overview

GAE introduces three core innovations:

1. **Latent Normalization**: Replaces the restrictive KL-divergence term of standard VAEs with **RMSNorm** regularization. By projecting features onto a unit hypersphere, GAE provides a stable, scalable latent manifold optimized for diffusion learning.
2. **Latent Alignment**: Uses Vision Foundation Models (VFMs, e.g., DINOv2) as semantic teachers. Through a carefully designed semantic downsampler, the low-dimensional latent vectors directly inherit strong discriminative semantic priors.
3. **Dynamic Noise Sampling**: Targets the high-intensity noise typical of diffusion processes, so that reconstruction remains robust even at extreme noise levels.

## Model Zoo

| Model | Epochs | Latent Dim | gFID (w/o CFG) | Weights |
| :--- | :---: | :---: | :---: | :---: |
| **GAE-LightningDiT-XL** | 80 | 32 | 1.82 | [🔗 Checkpoints](https://huggingface.co/GK50/GAE-Checkpoints/tree/main/checkpoints/d32) |
| **GAE-LightningDiT-XL** | 800 | 32 | 1.31 | [🔗 Checkpoints](https://huggingface.co/GK50/GAE-Checkpoints/tree/main/checkpoints/d32) |
| **GAE** | 200 | 32 | - | [🔗 Checkpoints](https://huggingface.co/GK50/GAE-Checkpoints/tree/main/checkpoints/d32) |

## Usage

### 1. Installation

```bash
git clone https://github.com/freezing-index/Geometric-Autoencoder-for-Diffusion-Models.git
# git clone creates a directory named after the repository
cd Geometric-Autoencoder-for-Diffusion-Models
conda create -n gae python=3.10.12
conda activate gae
pip install -r requirements.txt
```

### 2. Inference (Sampling)

Download the pre-trained weights from Hugging Face and place them in the `checkpoints/` folder. Update the paths in the `configs/` folder to match your local setup.

For class-uniform sampling, pass the DiT and VAE config paths as `$DIT_CONFIG` and `$VAE_CONFIG`:

```bash
bash inference_gae.sh $DIT_CONFIG $VAE_CONFIG
```

## Citation

```bibtex
@article{liu2026geometric,
  title={Geometric Autoencoder for Diffusion Models},
  author={Hangyu Liu and Jianyong Wang and Yutao Sun},
  journal={arXiv preprint arXiv:2603.10365},
  year={2026}
}
```
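
## Appendix: Latent Normalization Sketch

The Latent Normalization idea from the Overview can be illustrated with a minimal, dependency-free sketch. It is not taken from the GAE code base: the function names `rms_normalize` and `l2_norm`, the `eps` value, and the toy latent are all assumptions made for illustration. Plain RMS normalization without a learnable gain places a `d`-dimensional vector on a sphere of radius `sqrt(d)`; dividing by `sqrt(d)` afterwards yields a unit-norm vector. Whether GAE applies such a rescaling or a learnable gain is not stated in this README.

```python
import math


def rms_normalize(latent, eps=1e-6):
    """Scale a vector so that its root-mean-square is approximately 1.

    Hypothetical helper for illustration; not the GAE implementation.
    """
    mean_square = sum(v * v for v in latent) / len(latent)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [v * scale for v in latent]


def l2_norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(v * v for v in vec))


# Toy 32-dimensional latent (32 matches the "Latent Dim" column in the Model Zoo).
latent = [0.5 * (i % 7) - 1.0 for i in range(32)]

# After RMS normalization the vector lies on a sphere of radius ~sqrt(32).
normalized = rms_normalize(latent)

# Assumed extra step: rescale by 1/sqrt(d) to land on the unit hypersphere.
unit = [v / math.sqrt(len(normalized)) for v in normalized]

print("norm after RMS normalization:", l2_norm(normalized))
print("norm after rescaling by 1/sqrt(d):", l2_norm(unit))
```

Unlike a KL-divergence penalty, which pulls the latent distribution toward a standard Gaussian, this operation only fixes the scale of each latent vector and leaves its direction unchanged.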