SeonghuJeon committed
Commit 62b9ccb · verified · 1 Parent(s): 000daf9

Upload README.md with huggingface_hub

Files changed (1): README.md (+20 −30)
README.md CHANGED
@@ -11,49 +11,39 @@ tags:
 
 **Repurposing Geometric Foundation Models for Multi-view Diffusion**
 
-Wooseok Jang, Seonghu Jeon, Jisang Han, Jinhyeok Choi, Minkyung Kwon, Seungryong Kim, Saining Xie, Sainan Liu
-
-KAIST, New York University, Intel Labs
-
 [[Project Page]](https://cvlab-kaist.github.io/GLD/) | [[Code]](https://github.com/cvlab-kaist/GLD)
 
-## Model Overview
-
-GLD performs multi-view diffusion in the feature space of geometric foundation models (Depth Anything 3 / VGGT), enabling novel view synthesis with zero-shot geometry.
-
-## Checkpoints
-
-| File | Description | Params |
-|------|-------------|--------|
-| `checkpoints/da3_level1.pt` | DA3 Level-1 diffusion (EMA) | 783M |
-| `checkpoints/da3_cascade.pt` | DA3 Cascade: L1→L0 (EMA) | 473M |
-| `checkpoints/vggt_level1.pt` | VGGT Level-1 diffusion (EMA) | 806M |
-| `checkpoints/vggt_cascade.pt` | VGGT Cascade: L1→L0 (EMA) | 806M |
-| `pretrained_models/mae_decoder.pt` | DA3 MAE decoder (EMA, decoder-only) | 423M |
-| `pretrained_models/vggt/mae_decoder.pt` | VGGT MAE decoder (EMA, decoder-only) | 425M |
-| `pretrained_models/da3/model.safetensors` | DA3-Base encoder weights | 135M |
-
-All checkpoints contain **EMA weights only** (optimizer/scheduler/discriminator removed).
-MAE decoder checkpoints contain **decoder weights only** (encoder weights removed).
-
-## Usage
+## Quick Start
 
 ```bash
 git clone https://github.com/cvlab-kaist/GLD.git
 cd GLD
+conda env create -f environment.yml
+conda activate gld
 
-# Download checkpoints
-# Option 1: huggingface-cli
+# Download all checkpoints
 huggingface-cli download SeonghuJeon/GLD --local-dir .
 
-# Option 2: Python
-from huggingface_hub import snapshot_download
-snapshot_download("SeonghuJeon/GLD", local_dir=".")
-
 # Run demo
 ./run_demo.sh da3
 ```
 
+## Files
+
+| File | Description | Params | Size |
+|------|-------------|--------|------|
+| `checkpoints/da3_level1.pt` | DA3 Level-1 diffusion (EMA) | 783M | 2.9G |
+| `checkpoints/da3_cascade.pt` | DA3 Cascade: L1→L0 (EMA) | 473M | 1.8G |
+| `checkpoints/vggt_level1.pt` | VGGT Level-1 diffusion (EMA) | 806M | 3.0G |
+| `checkpoints/vggt_cascade.pt` | VGGT Cascade: L1→L0 (EMA) | 806M | 3.0G |
+| `pretrained_models/da3/model.safetensors` | DA3-Base encoder | 135M | 0.5G |
+| `pretrained_models/da3/dpt_decoder.pt` | DPT decoder (depth + geometry) | - | 1.1G |
+| `pretrained_models/mae_decoder.pt` | DA3 MAE decoder (EMA, decoder-only) | 423M | 1.6G |
+| `pretrained_models/vggt/mae_decoder.pt` | VGGT MAE decoder (EMA, decoder-only) | 425M | 1.6G |
+
+Stage-2 and MAE decoder checkpoints contain **EMA weights only**.
+MAE decoder checkpoints contain **decoder weights only** (encoder removed).
+
 ## Citation
 
 ```bibtex
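The new Files table lists an approximate size per checkpoint; for disk-space planning before running the full `huggingface-cli download`, the entries can be tallied with a small sketch. The sizes and file paths below are copied from the table; the helper itself is illustrative and not part of the repo:

```python
# Approximate per-file download sizes in GiB, as listed in the Files table.
SIZES_GIB = {
    "checkpoints/da3_level1.pt": 2.9,
    "checkpoints/da3_cascade.pt": 1.8,
    "checkpoints/vggt_level1.pt": 3.0,
    "checkpoints/vggt_cascade.pt": 3.0,
    "pretrained_models/da3/model.safetensors": 0.5,
    "pretrained_models/da3/dpt_decoder.pt": 1.1,
    "pretrained_models/mae_decoder.pt": 1.6,
    "pretrained_models/vggt/mae_decoder.pt": 1.6,
}

def total_size_gib(sizes=SIZES_GIB):
    """Sum the approximate sizes of the files a full snapshot pulls down."""
    return round(sum(sizes.values()), 1)

print(total_size_gib())  # → 15.5
```

So a full snapshot needs roughly 15.5 GiB of free disk; downloading only one backbone's checkpoints (DA3 or VGGT) cuts that roughly in half.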