Add pipeline tag and paper link
#1
by nielsr HF Staff - opened
README.md CHANGED
@@ -1,5 +1,6 @@
 ---
 license: apache-2.0
+pipeline_tag: image-to-3d
 tags:
 - novel-view-synthesis
 - multi-view-diffusion
@@ -11,10 +12,14 @@ tags:
 
 **Repurposing Geometric Foundation Models for Multi-view Diffusion**
 
-[[Project Page]](https://cvlab-kaist.github.io/GLD/) | [[Code]](https://github.com/cvlab-kaist/GLD)
+[[Paper]](https://huggingface.co/papers/2603.22275) | [[Project Page]](https://cvlab-kaist.github.io/GLD/) | [[Code]](https://github.com/cvlab-kaist/GLD)
+
+Geometric Latent Diffusion (GLD) is a framework that repurposes the geometrically consistent feature space of geometric foundation models (such as Depth Anything 3 and VGGT) as the latent space for multi-view diffusion. By operating in this space rather than a view-independent VAE latent space, GLD achieves consistent novel view synthesis (NVS) and 3D reconstruction with significantly faster training convergence.
 
 ## Quick Start
 
+To use these models, follow the setup instructions in the [official GitHub repository](https://github.com/cvlab-kaist/GLD).
+
 ```bash
 git clone https://github.com/cvlab-kaist/GLD.git
 cd GLD
@@ -43,3 +48,18 @@ python -c "from huggingface_hub import snapshot_download; snapshot_download('Seo
 
 Stage-2 and MAE decoder checkpoints contain **EMA weights only**.
 MAE decoder checkpoints contain **decoder weights only** (encoder removed).
+
+## Citation
+
+```bibtex
+@article{jang2026gld,
+title={Repurposing Geometric Foundation Models for Multi-view Diffusion},
+author={Jang, Wooseok and Jeon, Seonghu and Han, Jisang and Choi, Jinhyeok and Kwon, Minkyung and Kim, Seungryong and Xie, Saining and Liu, Sainan},
+journal={arXiv preprint arXiv:2603.22275},
+year={2026}
+}
+```
+
+## Acknowledgements
+
+Built upon [RAE](https://github.com/nicknign/RAE_release), [Depth Anything 3](https://github.com/DepthAnything/Depth-Anything-3), [VGGT](https://github.com/facebookresearch/vggt), [CUT3R](https://github.com/naver/CUT3R), and [SiT](https://github.com/willisma/SiT).