---
license: apache-2.0
---
# Geometric Autoencoder for Diffusion Models (GAE)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-yellow)](https://huggingface.co/sii-research/gae-imagenet256-f16d32/tree/main)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
[![Paper](https://img.shields.io/badge/Paper-PDF-red)](http://arxiv.org/abs/2603.10365)

## 📄 Abstract

Latent diffusion models have established a new state of the art in high-resolution visual generation. Integrating Vision Foundation Model (VFM) priors improves generative efficiency, yet existing latent designs remain largely heuristic. These approaches often struggle to unify semantic discriminability, reconstruction fidelity, and latent compactness. In this paper, we propose the Geometric Autoencoder (GAE), a principled framework that systematically addresses these challenges. By analyzing various alignment paradigms, GAE constructs an optimized low-dimensional semantic supervision target from VFMs to guide the autoencoder. Furthermore, we leverage latent normalization in place of the restrictive KL divergence of standard VAEs, enabling a more stable latent manifold specifically optimized for diffusion learning. To ensure robust reconstruction under high-intensity noise, GAE incorporates a dynamic noise sampling mechanism. Empirically, GAE achieves compelling performance on the ImageNet-1K $256 \times 256$ benchmark, reaching a gFID of 1.82 at only 80 epochs and 1.31 at 800 epochs without Classifier-Free Guidance, significantly surpassing existing state-of-the-art methods. Beyond generative quality, GAE strikes a superior balance between compression, semantic depth, and reconstruction stability. These results validate our design choices, offering a promising paradigm for latent diffusion modeling.
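The latent normalization mentioned above replaces the KL penalty of standard VAEs with an explicit statistical constraint on the latents. As a rough, hypothetical sketch of the idea (the paper's exact formulation may differ, and `normalize_latents` is an illustrative name, not this repo's API):

```python
import numpy as np

def normalize_latents(z, eps=1e-6):
    """Hypothetical sketch: normalize each latent channel to zero mean and
    unit variance over the batch, in place of a KL penalty toward N(0, I)."""
    mean = z.mean(axis=(0, 2, 3), keepdims=True)  # per-channel statistics
    std = z.std(axis=(0, 2, 3), keepdims=True)
    return (z - mean) / (std + eps)

# Toy latents: batch of 8, 32 channels, 16x16 spatial (f16/d32, as in the model name)
z = np.random.default_rng(0).normal(3.0, 5.0, size=(8, 32, 16, 16))
z_norm = normalize_latents(z)
print(z_norm.mean(), z_norm.std())  # approximately 0 and 1
```

The point of such a constraint is that the diffusion model always sees latents at a fixed, well-conditioned scale, without the regularization pressure a KL term puts on the latent geometry.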
---

## 📢 News
* **[2026.03.10]**: Core code released, including DiT training and inference on the GAE latent space.
* **[2026.03.10]**: Pre-trained weights for the GAE autoencoder (GAE-AE) and DiT are available on [Hugging Face](https://huggingface.co/sii-research/gae-imagenet256-f16d32/tree/main).

---
## ✨ Highlights
* Reaches a gFID of 1.82 at only 80 epochs and 1.31 at 800 epochs without Classifier-Free Guidance (CFG).
* Reaches a gFID of 1.48 at only 80 epochs and 1.13 at 800 epochs with CFG.

---

## 📦 Model Zoo & Weights

Pre-trained weights are hosted on Hugging Face.

**Diffusion models (LightningDiT trained in the GAE latent space):**

| Model | Epochs | Latent Dim | gFID (w/o CFG) | Weights |
| :--- | :---: | :---: | :---: | :---: |
| **GAE-LightningDiT-XL** | 80 | 32 | 1.82 | [🔗 HF Link](https://huggingface.co/sii-research/gae-imagenet256-f16d32/tree/main/d32) |
| **GAE-LightningDiT-XL** | 800 | 32 | 1.31 | [🔗 HF Link](https://huggingface.co/sii-research/gae-imagenet256-f16d32/tree/main/d32) |

**Autoencoder:**

| Model | Epochs | Latent Dim | Weights |
| :--- | :---: | :---: | :---: |
| **GAE** | 200 | 32 | [🔗 HF Link](https://huggingface.co/sii-research/gae-imagenet256-f16d32/tree/main/d32) |

---

## 🛠️ Usage

We use [LightningDiT](https://github.com/hustvl/LightningDiT) for the DiT implementation.

### 1. Installation
```bash
git clone https://github.com/sii-research/GAE.git
cd GAE
conda create -n gae python=3.10.12
conda activate gae
pip install -r requirements.txt
```

### 2. Extract Latents
Download the pre-trained weights from Hugging Face and place them in the `checkpoints/` folder. Make sure to update the paths in the `configs/` folder to match your local setup.
```bash
bash extract_gae.sh $DIT_CONFIG $VAE_CONFIG
```
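Conceptually, this step encodes the whole dataset once with the frozen GAE encoder and caches the latents to disk, so DiT training never touches raw pixels again. A toy sketch of that pattern, assuming the f16/d32 latent shape implied by the model name (`toy_encoder` is a stand-in for the real GAE encoder, not this repo's API):

```python
import os
import tempfile

import numpy as np

def toy_encoder(images: np.ndarray) -> np.ndarray:
    """Stand-in for the GAE encoder: 3x256x256 images -> 32x16x16 latents
    (downsampling factor f=16, latent dimension d=32)."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((images.shape[0], 32, 16, 16)).astype(np.float32)

# A pretend batch of 4 RGB images at 256x256
images = np.zeros((4, 3, 256, 256), dtype=np.float32)
latents = toy_encoder(images)

# Cache the latents so the training loop can stream them directly from disk
out_path = os.path.join(tempfile.mkdtemp(), "latents.npz")
np.savez(out_path, latents=latents)
print(latents.shape)  # (4, 32, 16, 16)
```

Precomputing latents this way trades one-time encoder cost and disk space for much cheaper training iterations.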
### 3. Training
```bash
bash train_gae.sh $DIT_CONFIG $VAE_CONFIG
```

### 4. Inference (Sampling)
For class-uniform sampling:
```bash
bash inference_gae.sh $DIT_CONFIG $VAE_CONFIG
```
For class-random sampling, change `from inference_sample import` to `from inference import` in `inference_gae.py`, then run the same command.
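The difference between the two modes: class-uniform sampling (following RAE) draws exactly the same number of samples per ImageNet class, while class-random sampling draws class labels i.i.d. A small illustration, assuming the standard 50k-sample FID evaluation budget:

```python
import numpy as np

NUM_CLASSES = 1000   # ImageNet-1K
NUM_SAMPLES = 50000  # standard FID evaluation budget (assumed here)

# Class-uniform: exactly 50 samples per class
uniform_labels = np.repeat(np.arange(NUM_CLASSES), NUM_SAMPLES // NUM_CLASSES)

# Class-random: labels drawn i.i.d., so per-class counts fluctuate
rng = np.random.default_rng(0)
random_labels = rng.integers(0, NUM_CLASSES, size=NUM_SAMPLES)

counts_uniform = np.bincount(uniform_labels, minlength=NUM_CLASSES)
counts_random = np.bincount(random_labels, minlength=NUM_CLASSES)
print(counts_uniform.min(), counts_uniform.max())  # 50 50
print(counts_random.min(), counts_random.max())    # fluctuates around 50
```

Class-uniform label sets remove the sampling noise in the class distribution, which matters when comparing FID across runs.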

## 🤝 Acknowledgements
Our project is built upon the excellent foundations of the following open-source projects:

* [LightningDiT](https://github.com/hustvl/LightningDiT): for the PyTorch Lightning-based DiT implementation.
* [RAE](https://github.com/bytetriper/RAE): for the timeshift and class-uniform sampling implementation.
* [ADM](https://github.com/openai/guided-diffusion): for the evaluation suite used to score generated samples.

We express our sincere gratitude to the authors for their valuable contributions to the community.

## 📝 Citation
If you find this work useful, please consider citing:
```bibtex
@misc{liu2026geometricautoencoderdiffusionmodels,
  title={Geometric Autoencoder for Diffusion Models},
  author={Hangyu Liu and Jianyong Wang and Yutao Sun},
  year={2026},
  eprint={2603.10365},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.10365},
}
```