Improve model card: add paper link, code link, and metadata
#1, opened by nielsr (HF Staff)
README.md (changed)
---
license: apache-2.0
pipeline_tag: image-to-image
---

# Geometric Autoencoder for Diffusion Models (GAE)

Geometric Autoencoder (GAE) is a principled framework that replaces the heuristic latent-space design of Latent Diffusion Models (LDMs) with a systematic approach. It significantly enhances semantic discriminability and latent compactness without compromising reconstruction fidelity.

- **Paper:** [Geometric Autoencoder for Diffusion Models](https://huggingface.co/papers/2603.10365)
- **Code:** [GitHub Repository](https://github.com/freezing-index/Geometric-Autoencoder-for-Diffusion-Models)

## Overview

GAE introduces three core innovations:

1. **Latent Normalization**: Replaces the restrictive KL-divergence regularization of standard VAEs with **RMSNorm**. By projecting features onto a unit hypersphere, GAE provides a stable, scalable latent manifold optimized for diffusion learning.
2. **Latent Alignment**: Leverages Vision Foundation Models (VFMs, e.g., DINOv2) as semantic teachers. Through a carefully designed semantic downsampler, the low-dimensional latent vectors directly inherit strong discriminative semantic priors.
3. **Dynamic Noise Sampling**: Addresses the high-intensity noise typical of diffusion processes, ensuring robust reconstruction performance even under extreme noise levels.

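To make the first two ideas concrete, here is a minimal NumPy sketch, not the repository's actual implementation: the function names, the negative-cosine-similarity form of the alignment objective, and the assumption that teacher features are already downsampled to the latent shape are all illustrative choices.

```python
import numpy as np

def rmsnorm(z, eps=1e-6):
    # RMS-normalize each latent vector: z / sqrt(mean(z^2) + eps).
    # This places every vector on a hypersphere of radius ~sqrt(dim).
    rms = np.sqrt(np.mean(z ** 2, axis=-1, keepdims=True) + eps)
    return z / rms

def alignment_loss(z, teacher_feats):
    # Negative mean cosine similarity between latents and teacher
    # features (e.g. DINOv2 outputs after a semantic downsampler).
    z_dir = z / np.linalg.norm(z, axis=-1, keepdims=True)
    t_dir = teacher_feats / np.linalg.norm(teacher_feats, axis=-1, keepdims=True)
    return -np.mean(np.sum(z_dir * t_dir, axis=-1))

rng = np.random.default_rng(0)
z = rmsnorm(rng.normal(size=(4, 32)))  # 32-dim latents, as in the model zoo
print(np.linalg.norm(z, axis=-1))      # each row norm ≈ sqrt(32) ≈ 5.657
print(alignment_loss(z, z))            # perfectly aligned case ≈ -1.0
```

Unlike a KL term, which pulls latents toward a fixed Gaussian, the RMSNorm constraint only fixes their scale, leaving directions free to carry the semantics that the alignment loss distills from the teacher.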
## Model Zoo

| Model | Epochs | Latent Dim | gFID (w/o CFG) | Weights |
| :--- | :---: | :---: | :---: | :---: |
| **GAE-LightningDiT-XL** | 80 | 32 | 1.82 | [Checkpoints](https://huggingface.co/GK50/GAE-Checkpoints/tree/main/checkpoints/d32) |
| **GAE-LightningDiT-XL** | 800 | 32 | 1.31 | [Checkpoints](https://huggingface.co/GK50/GAE-Checkpoints/tree/main/checkpoints/d32) |
| **GAE** | 200 | 32 | - | [Checkpoints](https://huggingface.co/GK50/GAE-Checkpoints/tree/main/checkpoints/d32) |

## Usage

### 1. Installation

```bash
git clone https://github.com/freezing-index/Geometric-Autoencoder-for-Diffusion-Models.git
cd Geometric-Autoencoder-for-Diffusion-Models
conda create -n gae python=3.10.12
conda activate gae
pip install -r requirements.txt
```

### 2. Inference (Sampling)

Download the pre-trained weights from Hugging Face and place them in the `checkpoints/` folder. Ensure you update the paths in the `configs/` folder to match your local setup.

For class-uniform sampling:

```bash
bash inference_gae.sh $DIT_CONFIG $VAE_CONFIG
```

## Citation

```bibtex
@article{liu2026geometric,
  title   = {Geometric Autoencoder for Diffusion Models},
  author  = {Hangyu Liu and Jianyong Wang and Yutao Sun},
  journal = {arXiv preprint arXiv:2603.10365},
  year    = {2026}
}
```