Commit 22ec96d (verified) · committed by SeonghuJeon · 1 Parent(s): d71cc25

Upload README.md with huggingface_hub

  1. README.md +66 -0
README.md ADDED
@@ -0,0 +1,66 @@
+ ---
+ license: apache-2.0
+ tags:
+ - novel-view-synthesis
+ - multi-view-diffusion
+ - depth-estimation
+ - 3d-reconstruction
+ ---
+
+ # GLD: Geometric Latent Diffusion
+
+ **Repurposing Geometric Foundation Models for Multi-view Diffusion**
+
+ Wooseok Jang, Seonghu Jeon, Jisang Han, Jinhyeok Choi, Minkyung Kwon, Seungryong Kim, Saining Xie, Sainan Liu
+
+ KAIST, New York University, Intel Labs
+
+ [[Project Page]](https://cvlab-kaist.github.io/GLD/) | [[Code]](https://github.com/cvlab-kaist/GLD)
+
+ ## Model Overview
+
+ GLD performs multi-view diffusion in the feature space of geometric foundation models (Depth Anything 3 / VGGT), enabling novel view synthesis with zero-shot geometry.
+
+ ## Checkpoints
+
+ | File | Description | Params |
+ |------|-------------|--------|
+ | `checkpoints/da3_level1.pt` | DA3 Level-1 diffusion (EMA) | 783M |
+ | `checkpoints/da3_cascade.pt` | DA3 Cascade: L1→L0 (EMA) | 473M |
+ | `checkpoints/vggt_level1.pt` | VGGT Level-1 diffusion (EMA) | 806M |
+ | `checkpoints/vggt_cascade.pt` | VGGT Cascade: L1→L0 (EMA) | 806M |
+ | `pretrained_models/mae_decoder.pt` | DA3 MAE decoder (EMA, decoder-only) | 423M |
+ | `pretrained_models/vggt/mae_decoder.pt` | VGGT MAE decoder (EMA, decoder-only) | 425M |
+ | `pretrained_models/da3/model.safetensors` | DA3-Base encoder weights | 135M |
+
+ All checkpoints contain **EMA weights only** (optimizer/scheduler/discriminator removed).
+ MAE decoder checkpoints contain **decoder weights only** (encoder weights removed).
+
+ ## Usage
+
+ ```bash
+ git clone https://github.com/cvlab-kaist/GLD.git
+ cd GLD
+
+ # Download checkpoints
+ # Option 1: huggingface-cli
+ huggingface-cli download SeonghuJeon/GLD --local-dir .
+
+ # Option 2: Python (invoked from the shell so the block stays valid bash)
+ python -c 'from huggingface_hub import snapshot_download
+ snapshot_download("SeonghuJeon/GLD", local_dir=".")'
+
+ # Run demo
+ ./run_demo.sh da3
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @article{jang2026gld,
+ title={Repurposing Geometric Foundation Models for Multi-view Diffusion},
+ author={Jang, Wooseok and Jeon, Seonghu and Han, Jisang and Choi, Jinhyeok and Kwon, Minkyung and Kim, Seungryong and Xie, Saining and Liu, Sainan},
+ journal={arXiv preprint},
+ year={2026}
+ }
+ ```
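
Note on the Python download option in the Usage section: both download options fetch the whole repository. If only one backbone is needed, the download can probably be narrowed with the `allow_patterns` argument of `snapshot_download`. The following is a minimal sketch, not part of the committed README. The file paths are copied from the Checkpoints table, but grouping them per backbone is an assumption, and `BACKBONE_FILES`, `files_for`, and `download_backbone` are hypothetical names introduced here for illustration. Whether `run_demo.sh` needs additional files is not stated in the README.

```python
# Hypothetical grouping of the files listed in the Checkpoints table.
# Which files each demo actually requires is an assumption.
BACKBONE_FILES = {
    "da3": [
        "checkpoints/da3_level1.pt",
        "checkpoints/da3_cascade.pt",
        "pretrained_models/mae_decoder.pt",
        "pretrained_models/da3/model.safetensors",
    ],
    "vggt": [
        "checkpoints/vggt_level1.pt",
        "checkpoints/vggt_cascade.pt",
        "pretrained_models/vggt/mae_decoder.pt",
    ],
}


def files_for(backbone: str) -> list[str]:
    """Return the checkpoint paths assumed to belong to one backbone."""
    if backbone not in BACKBONE_FILES:
        raise ValueError(
            f"unknown backbone {backbone!r}; expected one of {sorted(BACKBONE_FILES)}"
        )
    return list(BACKBONE_FILES[backbone])


def download_backbone(backbone: str, local_dir: str = ".") -> str:
    """Download only the files assumed to be needed for one backbone."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        "SeonghuJeon/GLD",
        local_dir=local_dir,
        allow_patterns=files_for(backbone),
    )
```

Calling `download_backbone("da3")` from the cloned `GLD` directory would then place the four DA3 files under the same relative paths the table lists, leaving the VGGT checkpoints undownloaded.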