nielsr (HF Staff) committed · verified
Commit bd13075 · 1 Parent(s): 18974ef

Improve model card: add paper link, code link, and metadata


Hi! I'm Niels from the Hugging Face community science team. This PR aims to improve the model card for GAE by adding:
- A link to the paper: [Geometric Autoencoder for Diffusion Models](https://huggingface.co/papers/2603.10365).
- A link to the official code repository.
- Relevant metadata including the `pipeline_tag`.
- A Model Zoo overview and usage instructions.

This will help researchers find, use, and cite your work more easily on the Hub.

Files changed (1): README.md (+56 −3)
README.md CHANGED
@@ -1,3 +1,56 @@

---
license: apache-2.0
pipeline_tag: image-to-image
---

# Geometric Autoencoder for Diffusion Models (GAE)

Geometric Autoencoder (GAE) is a principled framework designed to systematically address the heuristic nature of latent space design in Latent Diffusion Models (LDMs). GAE significantly enhances semantic discriminability and latent compactness without compromising reconstruction fidelity.

- **Paper:** [Geometric Autoencoder for Diffusion Models](https://huggingface.co/papers/2603.10365)
- **Code:** [GitHub Repository](https://github.com/freezing-index/Geometric-Autoencoder-for-Diffusion-Models)

## Overview

GAE introduces three core innovations:
1. **Latent Normalization**: Replaces the restrictive KL-divergence regularization of standard VAEs with **RMSNorm**. By projecting features onto a unit hypersphere, GAE provides a stable, scalable latent manifold optimized for diffusion learning.
2. **Latent Alignment**: Leverages Vision Foundation Models (VFMs, e.g., DINOv2) as semantic teachers. Through a carefully designed semantic downsampler, the low-dimensional latent vectors directly inherit strong discriminative semantic priors.
3. **Dynamic Noise Sampling**: Specifically addresses the high-intensity noise typical of diffusion processes, ensuring robust reconstruction performance even under extreme noise levels.
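The hypersphere projection in innovation (1) can be sketched in a few lines. This is an illustrative NumPy sketch, not the authors' implementation: the function name `rms_normalize` and the absence of a learned gain are assumptions; RMS normalization places each vector on a sphere of radius `sqrt(d)` (rescaling by `1/sqrt(d)` gives the unit sphere).

```python
import numpy as np

def rms_normalize(z, eps=1e-6):
    """Project latent vectors onto a fixed-radius hypersphere via RMS normalization.

    z: array of shape (..., d) -- per-position latent vectors.
    Each vector is divided by its root-mean-square, so the result has
    L2 norm sqrt(d), independent of the input scale.
    """
    rms = np.sqrt(np.mean(z * z, axis=-1, keepdims=True) + eps)
    return z / rms

# Example: a batch of 4 latent vectors of dimension 32.
z = np.random.randn(4, 32)
z_norm = rms_normalize(z)
# Every normalized vector now has (near-)identical L2 norm sqrt(32),
# so no KL term is needed to keep the latent distribution bounded.
```

The point of the sketch is the scale invariance: multiplying the input by any constant leaves the normalized latents (essentially) unchanged, which is what makes the manifold stable for diffusion training.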

## Model Zoo

| Model | Epochs | Latent Dim | gFID (w/o CFG) | Weights |
| :--- | :---: | :---: | :---: | :---: |
| **GAE-LightningDiT-XL** | 80 | 32 | 1.82 | [🔗 Checkpoints](https://huggingface.co/GK50/GAE-Checkpoints/tree/main/checkpoints/d32) |
| **GAE-LightningDiT-XL** | 800 | 32 | 1.31 | [🔗 Checkpoints](https://huggingface.co/GK50/GAE-Checkpoints/tree/main/checkpoints/d32) |
| **GAE** | 200 | 32 | - | [🔗 Checkpoints](https://huggingface.co/GK50/GAE-Checkpoints/tree/main/checkpoints/d32) |

## Usage

### 1. Installation
```bash
git clone https://github.com/freezing-index/Geometric-Autoencoder-for-Diffusion-Models.git GAE
cd GAE
conda create -n gae python=3.10.12
conda activate gae
pip install -r requirements.txt
```

### 2. Inference (Sampling)
Download the pre-trained weights from Hugging Face and place them in the `checkpoints/` folder. Make sure the paths in the `configs/` folder match your local setup.
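The weights can be pulled programmatically with the `huggingface_hub` client. This is a convenience sketch, not part of the official repo: the helper names `checkpoint_pattern` and `fetch_checkpoints` are hypothetical, and the `checkpoints/d<dim>/` layout is inferred from the Model Zoo links above.

```python
def checkpoint_pattern(latent_dim: int) -> str:
    # The weights repo groups files by latent dimension, e.g. checkpoints/d32/.
    return f"checkpoints/d{latent_dim}/*"

def fetch_checkpoints(latent_dim: int = 32, local_dir: str = ".") -> str:
    """Download the matching GAE checkpoints from the Hub.

    Requires `pip install huggingface_hub` and network access; files land
    under <local_dir>/checkpoints/d<latent_dim>/, which is where the
    inference configs expect them (an assumption -- adjust to your setup).
    """
    from huggingface_hub import snapshot_download
    return snapshot_download(
        repo_id="GK50/GAE-Checkpoints",
        allow_patterns=checkpoint_pattern(latent_dim),
        local_dir=local_dir,
    )
```

Calling `fetch_checkpoints(32)` from the repo root would place the d32 weights under `checkpoints/d32/`, so only the config paths need checking afterwards.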

For class-uniform sampling:
```bash
bash inference_gae.sh $DIT_CONFIG $VAE_CONFIG
```

## Citation

```bibtex
@article{liu2026geometric,
  title={Geometric Autoencoder for Diffusion Models},
  author={Hangyu Liu and Jianyong Wang and Yutao Sun},
  journal={arXiv preprint arXiv:2603.10365},
  year={2026}
}
```