SeonghuJeon/GLD

@@ -11,49 +11,39 @@ tags:
 **Repurposing Geometric Foundation Models for Multi-view Diffusion**
-Wooseok Jang, Seonghu Jeon, Jisang Han, Jinhyeok Choi, Minkyung Kwon, Seungryong Kim, Saining Xie, Sainan Liu
-KAIST, New York University, Intel Labs
 [[Project Page]](https://cvlab-kaist.github.io/GLD/) | [[Code]](https://github.com/cvlab-kaist/GLD)
-## Model Overview
-GLD performs multi-view diffusion in the feature space of geometric foundation models (Depth Anything 3 / VGGT), enabling novel view synthesis with zero-shot geometry.
-## Checkpoints
-| File | Description | Params |
-|------|-------------|--------|
-| `checkpoints/da3_level1.pt` | DA3 Level-1 diffusion (EMA) | 783M |
-| `checkpoints/da3_cascade.pt` | DA3 Cascade: L1→L0 (EMA) | 473M |
-| `checkpoints/vggt_level1.pt` | VGGT Level-1 diffusion (EMA) | 806M |
-| `checkpoints/vggt_cascade.pt` | VGGT Cascade: L1→L0 (EMA) | 806M |
-| `pretrained_models/mae_decoder.pt` | DA3 MAE decoder (EMA, decoder-only) | 423M |
-| `pretrained_models/vggt/mae_decoder.pt` | VGGT MAE decoder (EMA, decoder-only) | 425M |
-| `pretrained_models/da3/model.safetensors` | DA3-Base encoder weights | 135M |
-All checkpoints contain **EMA weights only** (optimizer/scheduler/discriminator removed).
-MAE decoder checkpoints contain **decoder weights only** (encoder weights removed).
-## Usage
 ```bash
 git clone https://github.com/cvlab-kaist/GLD.git
 cd GLD
-# Download checkpoints
-# Option 1: huggingface-cli
 huggingface-cli download SeonghuJeon/GLD --local-dir .
-# Option 2: Python
-from huggingface_hub import snapshot_download
-snapshot_download("SeonghuJeon/GLD", local_dir=".")
 # Run demo
 ./run_demo.sh da3
 ```
 ## Citation
 ```bibtex

 **Repurposing Geometric Foundation Models for Multi-view Diffusion**
 [[Project Page]](https://cvlab-kaist.github.io/GLD/) | [[Code]](https://github.com/cvlab-kaist/GLD)
+## Quick Start
 ```bash
 git clone https://github.com/cvlab-kaist/GLD.git
 cd GLD
+conda env create -f environment.yml
+conda activate gld
+# Download all checkpoints
 huggingface-cli download SeonghuJeon/GLD --local-dir .
 # Run demo
 ./run_demo.sh da3
 ```
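
The Quick Start command downloads every file in the repository. If only one backbone is needed, the download can be narrowed by path. The following is a minimal sketch, not part of the repository: the `FILES` grouping and the helper names `files_for` and `download` are hypothetical, and which files `run_demo.sh` actually requires for each backbone is not stated in this change, so the grouping is an assumption based on file names alone. `snapshot_download` and its `allow_patterns` argument come from `huggingface_hub`.

```python
# Assumption: files are grouped by backbone purely from their names in the
# Files table; the demo script may need more (or fewer) files than listed.
FILES = {
    "da3": [
        "checkpoints/da3_level1.pt",
        "checkpoints/da3_cascade.pt",
        "pretrained_models/da3/model.safetensors",
        "pretrained_models/da3/dpt_decoder.pt",
        "pretrained_models/mae_decoder.pt",
    ],
    "vggt": [
        "checkpoints/vggt_level1.pt",
        "checkpoints/vggt_cascade.pt",
        "pretrained_models/vggt/mae_decoder.pt",
    ],
}


def files_for(backbone: str) -> list[str]:
    """Return the (assumed) list of repository files for one backbone."""
    if backbone not in FILES:
        raise ValueError(f"unknown backbone {backbone!r}; expected one of {sorted(FILES)}")
    return list(FILES[backbone])


def download(backbone: str, local_dir: str = ".") -> str:
    """Download only the files for one backbone and return the local path."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        "SeonghuJeon/GLD",
        allow_patterns=files_for(backbone),
        local_dir=local_dir,
    )
```

Calling `download("da3")` from the cloned `GLD` directory would place the selected files at the same relative paths that the full download uses. If the demo then reports a missing file, fall back to the full `huggingface-cli download` command.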
+## Files
+| File | Description | Params | Size |
+|------|-------------|--------|------|
+| `checkpoints/da3_level1.pt` | DA3 Level-1 diffusion (EMA) | 783M | 2.9G |
+| `checkpoints/da3_cascade.pt` | DA3 Cascade: L1→L0 (EMA) | 473M | 1.8G |
+| `checkpoints/vggt_level1.pt` | VGGT Level-1 diffusion (EMA) | 806M | 3.0G |
+| `checkpoints/vggt_cascade.pt` | VGGT Cascade: L1→L0 (EMA) | 806M | 3.0G |
+| `pretrained_models/da3/model.safetensors` | DA3-Base encoder | 135M | 0.5G |
+| `pretrained_models/da3/dpt_decoder.pt` | DPT decoder (depth + geometry) | - | 1.1G |
+| `pretrained_models/mae_decoder.pt` | DA3 MAE decoder (EMA, decoder-only) | 423M | 1.6G |
+| `pretrained_models/vggt/mae_decoder.pt` | VGGT MAE decoder (EMA, decoder-only) | 425M | 1.6G |
+Stage-2 and MAE decoder checkpoints contain **EMA weights only**.
+MAE decoder checkpoints contain **decoder weights only** (encoder removed).
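
The Size column can be sanity-checked against the Params column. The sketch below assumes weights are stored as float32 (4 bytes per parameter) and that "G" means GiB (1024³ bytes); neither is stated in the table, so treat both as assumptions. The `dpt_decoder.pt` row is left out because it lists no parameter count.

```python
BYTES_PER_PARAM = 4   # assumption: float32 storage
GIB = 1024 ** 3       # assumption: "G" in the table means GiB

# (parameters in millions, listed size in G), copied from the Files table.
TABLE = {
    "checkpoints/da3_level1.pt": (783, 2.9),
    "checkpoints/da3_cascade.pt": (473, 1.8),
    "checkpoints/vggt_level1.pt": (806, 3.0),
    "checkpoints/vggt_cascade.pt": (806, 3.0),
    "pretrained_models/da3/model.safetensors": (135, 0.5),
    "pretrained_models/mae_decoder.pt": (423, 1.6),
    "pretrained_models/vggt/mae_decoder.pt": (425, 1.6),
}


def estimated_size_gib(params_millions: float) -> float:
    """Estimate on-disk size in GiB from a parameter count, to one decimal."""
    return round(params_millions * 1e6 * BYTES_PER_PARAM / GIB, 1)


def check_table(table: dict) -> dict:
    """Map each file to True if the estimate matches the listed size."""
    return {
        name: estimated_size_gib(params_m) == listed
        for name, (params_m, listed) in table.items()
    }


if __name__ == "__main__":
    for name, ok in check_table(TABLE).items():
        params_m, listed = TABLE[name]
        status = "matches" if ok else "differs"
        print(f"{name}: estimated {estimated_size_gib(params_m)}G, listed {listed}G ({status})")
```

Under these assumptions every listed size agrees with its parameter count to one decimal place, which fits the statement that the checkpoints hold a single set of weights rather than optimizer or scheduler state.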
 ## Citation
 ```bibtex