Add pipeline tag and paper link
#1
by nielsr HF Staff - opened
README.md CHANGED
@@ -1,5 +1,6 @@
 ---
 license: apache-2.0
+pipeline_tag: image-to-3d
 tags:
 - novel-view-synthesis
 - multi-view-diffusion
@@ -11,10 +12,14 @@ tags:
 
 **Repurposing Geometric Foundation Models for Multi-view Diffusion**
 
-[[Project Page]](https://cvlab-kaist.github.io/GLD/) | [[Code]](https://github.com/cvlab-kaist/GLD)
+[[Paper]](https://huggingface.co/papers/2603.22275) | [[Project Page]](https://cvlab-kaist.github.io/GLD/) | [[Code]](https://github.com/cvlab-kaist/GLD)
+
+Geometric Latent Diffusion (GLD) is a framework that repurposes the geometrically consistent feature space of geometric foundation models (such as Depth Anything 3 and VGGT) as the latent space for multi-view diffusion. By operating in this space rather than a view-independent VAE latent space, GLD achieves consistent novel view synthesis (NVS) and 3D reconstruction with significantly faster training convergence.
 
 ## Quick Start
 
+To use these models, follow the setup instructions in the [official GitHub repository](https://github.com/cvlab-kaist/GLD).
+
 ```bash
 git clone https://github.com/cvlab-kaist/GLD.git
 cd GLD
@@ -43,3 +48,18 @@ python -c "from huggingface_hub import snapshot_download; snapshot_download('Seo
 
 Stage-2 and MAE decoder checkpoints contain **EMA weights only**.
 MAE decoder checkpoints contain **decoder weights only** (encoder removed).
+
+## Citation
+
+```bibtex
+@article{jang2026gld,
+title={Repurposing Geometric Foundation Models for Multi-view Diffusion},
+author={Jang, Wooseok and Jeon, Seonghu and Han, Jisang and Choi, Jinhyeok and Kwon, Minkyung and Kim, Seungryong and Xie, Saining and Liu, Sainan},
+journal={arXiv preprint arXiv:2603.22275},
+year={2026}
+}
+```
+
+## Acknowledgements
+
+Built upon [RAE](https://github.com/nicknign/RAE_release), [Depth Anything 3](https://github.com/DepthAnything/Depth-Anything-3), [VGGT](https://github.com/facebookresearch/vggt), [CUT3R](https://github.com/naver/CUT3R), and [SiT](https://github.com/willisma/SiT).