Upload README.md with huggingface_hub
README.md CHANGED
@@ -11,49 +11,39 @@ tags:

**Repurposing Geometric Foundation Models for Multi-view Diffusion**

-Wooseok Jang, Seonghu Jeon, Jisang Han, Jinhyeok Choi, Minkyung Kwon, Seungryong Kim, Saining Xie, Sainan Liu
-
-KAIST, New York University, Intel Labs
-
[[Project Page]](https://cvlab-kaist.github.io/GLD/) | [[Code]](https://github.com/cvlab-kaist/GLD)

-##
-
-GLD performs multi-view diffusion in the feature space of geometric foundation models (Depth Anything 3 / VGGT), enabling novel view synthesis with zero-shot geometry.
-
-## Checkpoints
-
-| File | Description | Params |
-|------|-------------|--------|
-| `checkpoints/da3_level1.pt` | DA3 Level-1 diffusion (EMA) | 783M |
-| `checkpoints/da3_cascade.pt` | DA3 Cascade: L1→L0 (EMA) | 473M |
-| `checkpoints/vggt_level1.pt` | VGGT Level-1 diffusion (EMA) | 806M |
-| `checkpoints/vggt_cascade.pt` | VGGT Cascade: L1→L0 (EMA) | 806M |
-| `pretrained_models/mae_decoder.pt` | DA3 MAE decoder (EMA, decoder-only) | 423M |
-| `pretrained_models/vggt/mae_decoder.pt` | VGGT MAE decoder (EMA, decoder-only) | 425M |
-| `pretrained_models/da3/model.safetensors` | DA3-Base encoder weights | 135M |
-
-All checkpoints contain **EMA weights only** (optimizer/scheduler/discriminator removed).
-MAE decoder checkpoints contain **decoder weights only** (encoder weights removed).
-
-## Usage

```bash
git clone https://github.com/cvlab-kaist/GLD.git
cd GLD

-# Download checkpoints
-# Option 1: huggingface-cli
huggingface-cli download SeonghuJeon/GLD --local-dir .

-# Option 2: Python
-from huggingface_hub import snapshot_download
-snapshot_download("SeonghuJeon/GLD", local_dir=".")
-
# Run demo
./run_demo.sh da3
```

## Citation

```bibtex

**Repurposing Geometric Foundation Models for Multi-view Diffusion**

[[Project Page]](https://cvlab-kaist.github.io/GLD/) | [[Code]](https://github.com/cvlab-kaist/GLD)

+## Quick Start

```bash
git clone https://github.com/cvlab-kaist/GLD.git
cd GLD
+conda env create -f environment.yml
+conda activate gld

+# Download all checkpoints
huggingface-cli download SeonghuJeon/GLD --local-dir .

# Run demo
./run_demo.sh da3
```

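The download step in the Quick Start fetches the whole repository with `huggingface-cli`. As a minimal sketch, assuming the `huggingface_hub` Python package is installed, the same download can be driven from Python with `snapshot_download`, optionally restricted by glob patterns. The helper names `checkpoint_patterns` and `download` are hypothetical and not part of the GLD repository. Which files `run_demo.sh` actually needs is not stated here, so a restricted download may be incomplete.

```python
from fnmatch import fnmatch

REPO_ID = "SeonghuJeon/GLD"  # repo id taken from the CLI command in the Quick Start


def checkpoint_patterns(backbone):
    """Glob for the diffusion checkpoints of one backbone ("da3" or "vggt").

    Assumption: checkpoint files are named checkpoints/<backbone>_*.pt.
    """
    return [f"checkpoints/{backbone}_*.pt"]


def download(patterns=None, local_dir="."):
    """Mirror `huggingface-cli download`; patterns=None fetches the whole repo."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(REPO_ID, local_dir=local_dir, allow_patterns=patterns)


# Offline check: which of these checkpoint names would the "da3" pattern select?
names = ["checkpoints/da3_level1.pt", "checkpoints/vggt_level1.pt"]
selected = [n for n in names if fnmatch(n, checkpoint_patterns("da3")[0])]
print(selected)
```

Calling `download()` with no arguments corresponds to the CLI command, while `download(checkpoint_patterns("da3"))` would fetch only the DA3 diffusion checkpoints. Files under `pretrained_models/` would need their own patterns.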
+## Files
+
+| File | Description | Params | Size |
+|------|-------------|--------|------|
+| `checkpoints/da3_level1.pt` | DA3 Level-1 diffusion (EMA) | 783M | 2.9G |
+| `checkpoints/da3_cascade.pt` | DA3 Cascade: L1→L0 (EMA) | 473M | 1.8G |
+| `checkpoints/vggt_level1.pt` | VGGT Level-1 diffusion (EMA) | 806M | 3.0G |
+| `checkpoints/vggt_cascade.pt` | VGGT Cascade: L1→L0 (EMA) | 806M | 3.0G |
+| `pretrained_models/da3/model.safetensors` | DA3-Base encoder | 135M | 0.5G |
+| `pretrained_models/da3/dpt_decoder.pt` | DPT decoder (depth + geometry) | - | 1.1G |
+| `pretrained_models/mae_decoder.pt` | DA3 MAE decoder (EMA, decoder-only) | 423M | 1.6G |
+| `pretrained_models/vggt/mae_decoder.pt` | VGGT MAE decoder (EMA, decoder-only) | 425M | 1.6G |
+
+Stage-2 and MAE decoder checkpoints contain **EMA weights only**.
+MAE decoder checkpoints contain **decoder weights only** (encoder removed).
+
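The Params and Size columns in the Files table can be cross-checked with simple arithmetic. The sketch below rests on two assumptions that the README does not state: weights are stored as 4-byte float32 values, and "G" means GiB (2^30 bytes). The name `estimated_size` is illustrative only.

```python
GIB = 2**30          # assumption: the "G" in the Size column means GiB
BYTES_PER_PARAM = 4  # assumption: weights are stored as float32

# (parameters in millions, listed size in G) copied from the Files table.
TABLE = {
    "checkpoints/da3_level1.pt": (783, "2.9"),
    "checkpoints/da3_cascade.pt": (473, "1.8"),
    "checkpoints/vggt_level1.pt": (806, "3.0"),
    "checkpoints/vggt_cascade.pt": (806, "3.0"),
    "pretrained_models/da3/model.safetensors": (135, "0.5"),
    "pretrained_models/mae_decoder.pt": (423, "1.6"),
    "pretrained_models/vggt/mae_decoder.pt": (425, "1.6"),
}


def estimated_size(params_millions):
    """Estimated file size in GiB, formatted to one decimal like the table."""
    return f"{params_millions * 1e6 * BYTES_PER_PARAM / GIB:.1f}"


for path, (millions, listed) in TABLE.items():
    print(f"{path}: estimated {estimated_size(millions)}G, listed {listed}G")
```

Under these assumptions every estimate rounds to the listed size. The DPT decoder is left out because the table gives no parameter count for it.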
## Citation

```bibtex