---
license: apache-2.0
tags:
- novel-view-synthesis
- multi-view-diffusion
- depth-estimation
- 3d-reconstruction
---

# GLD: Geometric Latent Diffusion

**Repurposing Geometric Foundation Models for Multi-view Diffusion**

[[Project Page]](https://cvlab-kaist.github.io/GLD/) | [[Code]](https://github.com/cvlab-kaist/GLD)

## Quick Start

```bash
git clone https://github.com/cvlab-kaist/GLD.git
cd GLD
conda env create -f environment.yml
conda activate gld

# Download all checkpoints
python -c "from huggingface_hub import snapshot_download; snapshot_download('SeonghuJeon/GLD', local_dir='.')"

# Run demo
./run_demo.sh da3
```

## Files

| File | Description | Params | Size |
|------|-------------|--------|------|
| `checkpoints/da3_level1.pt` | DA3 Level-1 diffusion (EMA) | 783M | 2.9G |
| `checkpoints/da3_cascade.pt` | DA3 cascade: Level-1 → Level-0 (EMA) | 473M | 1.8G |
| `checkpoints/vggt_level1.pt` | VGGT Level-1 diffusion (EMA) | 806M | 3.0G |
| `checkpoints/vggt_cascade.pt` | VGGT cascade: Level-1 → Level-0 (EMA) | 806M | 3.0G |
| `pretrained_models/da3/model.safetensors` | DA3-Base encoder | 135M | 0.5G |
| `pretrained_models/da3/dpt_decoder.pt` | DPT decoder (depth + geometry) | - | 1.1G |
| `pretrained_models/mae_decoder.pt` | DA3 MAE decoder (EMA, decoder-only) | 423M | 1.6G |
| `pretrained_models/vggt/mae_decoder.pt` | VGGT MAE decoder (EMA, decoder-only) | 425M | 1.6G |

Stage-2 and MAE decoder checkpoints contain **EMA weights only**. MAE decoder checkpoints contain **decoder weights only** (the encoder is removed).
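Because several checkpoints are partial (EMA-only, or decoder-only with the encoder stripped), loading one into a full model generally requires `strict=False` so the missing encoder keys do not raise an error. Below is a minimal sketch of such a loader; the wrapper-key names (`"ema"`, `"state_dict"`, `"model"`) and the key layout of the GLD checkpoints are assumptions, not documented facts about this repository.

```python
import torch

def load_partial_checkpoint(ckpt_path, model):
    """Load a partial (e.g. EMA-only or decoder-only) checkpoint into `model`.

    strict=False tolerates parameters the checkpoint does not provide (such
    as a removed encoder); the returned lists let you confirm that only the
    expected keys were missing. The wrapper keys below are assumptions.
    """
    state = torch.load(ckpt_path, map_location="cpu")
    # Some training frameworks nest the weights under a wrapper key.
    for key in ("ema", "state_dict", "model"):
        if isinstance(state, dict) and key in state:
            state = state[key]
            break
    missing, unexpected = model.load_state_dict(state, strict=False)
    return missing, unexpected
```

After loading, it is worth checking that `missing` contains only the keys you expect to be absent (e.g. `encoder.*` for the decoder-only MAE checkpoints) and that `unexpected` is empty.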