---
license: apache-2.0
pipeline_tag: image-to-image
---

# Geometric Autoencoder for Diffusion Models (GAE)

Geometric Autoencoder (GAE) is a principled framework that addresses the heuristic nature of latent space design in Latent Diffusion Models (LDMs). It improves semantic discriminability and latent compactness without compromising reconstruction fidelity.

- **Paper:** [Geometric Autoencoder for Diffusion Models](https://huggingface.co/papers/2603.10365)
- **Code:** [GitHub Repository](https://github.com/freezing-index/Geometric-Autoencoder-for-Diffusion-Models)

## Overview

GAE introduces three core innovations:
1. **Latent Normalization**: Replaces the restrictive KL-divergence term of standard VAEs with **RMSNorm** regularization. By projecting features onto a unit hypersphere, GAE provides a stable, scalable latent manifold optimized for diffusion learning.
2. **Latent Alignment**: Uses Vision Foundation Models (VFMs, e.g., DINOv2) as semantic teachers. Through a carefully designed semantic downsampler, the low-dimensional latent vectors directly inherit strong discriminative semantic priors.
3. **Dynamic Noise Sampling**: Targets the high-intensity noise typical of diffusion processes, keeping reconstruction robust even under extreme noise levels.
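
As a rough illustration of the latent normalization idea in item 1, the sketch below applies a plain RMS normalization to a single latent vector. It is not taken from the GAE code: the helper names `rms_normalize` and `l2_norm`, the `eps` value, and the absence of a learnable gain are all assumptions. It shows that after RMS normalization every vector of dimension d has L2 norm close to √d, so the latents lie on a hypersphere, and dividing by √d rescales that sphere to unit radius.

```python
import math
import random


def rms_normalize(z, eps=1e-6):
    """Rescale one latent vector so its root-mean-square is approximately 1.

    Assumption: no learnable gain is applied; the real GAE layer may differ.
    """
    rms = math.sqrt(sum(v * v for v in z) / len(z) + eps)
    return [v / rms for v in z]


def l2_norm(z):
    """Euclidean length of a vector."""
    return math.sqrt(sum(v * v for v in z))


random.seed(0)
latent_dim = 32  # matches the "Latent Dim" column in the Model Zoo
z = [random.gauss(0.0, 3.0) for _ in range(latent_dim)]  # arbitrary input scale
z_hat = rms_normalize(z)

# The norm no longer depends on the input scale: it is close to sqrt(latent_dim).
print(l2_norm(z_hat), math.sqrt(latent_dim))
# Dividing by sqrt(latent_dim) gives a vector of (approximately) unit length.
print(l2_norm(z_hat) / math.sqrt(latent_dim))
```

Because every normalized latent has the same norm whatever the scale of the encoder output, a downstream diffusion model would see inputs of fixed magnitude, which is one plausible reading of the "stable, scalable" claim above.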

## Model Zoo

| Model | Epochs | Latent Dim | gFID (w/o CFG) | Weights |
| :--- | :---: | :---: | :---: | :---: |
| **GAE-LightningDiT-XL** | 80 | 32 | 1.82 | [Checkpoints](https://huggingface.co/GK50/GAE-Checkpoints/tree/main/checkpoints/d32) |
| **GAE-LightningDiT-XL** | 800 | 32 | 1.31 | [Checkpoints](https://huggingface.co/GK50/GAE-Checkpoints/tree/main/checkpoints/d32) |
| **GAE** | 200 | 32 | - | [Checkpoints](https://huggingface.co/GK50/GAE-Checkpoints/tree/main/checkpoints/d32) |

## Usage

### 1. Installation
```bash
git clone https://github.com/freezing-index/Geometric-Autoencoder-for-Diffusion-Models.git
cd Geometric-Autoencoder-for-Diffusion-Models
conda create -n gae python=3.10.12
conda activate gae
pip install -r requirements.txt
```

### 2. Inference (Sampling)
Download the pre-trained weights from Hugging Face and place them in the `checkpoints/` folder. Ensure you update the paths in the `configs/` folder to match your local setup.
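
One way to fetch the weights is with the `huggingface_hub` Python package. The sketch below is an assumption about a convenient workflow, not a script shipped with the repository: the repository id and subfolder are read off the Model Zoo links, and `download_checkpoints` is a hypothetical helper name. It downloads only the files under `checkpoints/d32` and keeps the repository's folder layout.

```python
REPO_ID = "GK50/GAE-Checkpoints"  # taken from the Model Zoo links
SUBFOLDER = "checkpoints/d32"     # the folder those links point to


def download_checkpoints(target_dir="."):
    """Download only the files under SUBFOLDER into target_dir.

    Hypothetical helper: requires `pip install huggingface_hub` and
    network access. Returns the local folder the snapshot was written to.
    """
    # Imported lazily so the module loads even without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[f"{SUBFOLDER}/*"],  # skip everything outside SUBFOLDER
        local_dir=target_dir,               # keeps the checkpoints/d32/ layout
    )
```

Calling `download_checkpoints()` from the cloned repository root should leave the files in `checkpoints/d32/`. The file names the configs expect are not specified here, so check the paths in `configs/` afterwards.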

For class-uniform sampling:
```bash
bash inference_gae.sh $DIT_CONFIG $VAE_CONFIG
```

## Citation

```bibtex
@article{liu2026geometric,
  title={Geometric Autoencoder for Diffusion Models},
  author={Hangyu Liu and Jianyong Wang and Yutao Sun},
  journal={arXiv preprint arXiv:2603.10365},
  year={2026}
}
```