---
license: apache-2.0
tags:
- novel-view-synthesis
- multi-view-diffusion
- depth-estimation
- 3d-reconstruction
---
# GLD: Geometric Latent Diffusion

Repurposing Geometric Foundation Models for Multi-view Diffusion

## Quick Start
```bash
git clone https://github.com/cvlab-kaist/GLD.git
cd GLD
conda env create -f environment.yml
conda activate gld

# Download all checkpoints
python -c "from huggingface_hub import snapshot_download; snapshot_download('SeonghuJeon/GLD', local_dir='.')"

# Run demo
./run_demo.sh da3
```
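The command above fetches every file in the repository, roughly 15.5G in total according to the Files table. `snapshot_download` also accepts an `allow_patterns` list, so one backbone can be fetched on its own. The sketch below assumes a per-backbone split that is inferred only from the file names (the repository does not document which files each demo needs); `BACKBONE_PATTERNS`, `patterns_for`, `select`, and `download` are hypothetical helpers, not part of GLD.

```python
from fnmatch import fnmatch

# Hypothetical per-backbone split, inferred only from the file names in the
# Files table; the repository does not document which files each demo needs.
BACKBONE_PATTERNS = {
    "da3": [
        "checkpoints/da3_*.pt",
        "pretrained_models/da3/*",
        "pretrained_models/mae_decoder.pt",
    ],
    "vggt": [
        "checkpoints/vggt_*.pt",
        "pretrained_models/vggt/*",
    ],
}


def patterns_for(backbone: str) -> list:
    """Return the glob patterns for one backbone, e.g. "da3" or "vggt"."""
    try:
        return BACKBONE_PATTERNS[backbone]
    except KeyError:
        raise ValueError(
            f"unknown backbone {backbone!r}; expected one of {sorted(BACKBONE_PATTERNS)}"
        ) from None


def select(backbone: str, files: list) -> list:
    """Preview which repo paths the patterns would match (fnmatch-style globbing)."""
    pats = patterns_for(backbone)
    return [f for f in files if any(fnmatch(f, p) for p in pats)]


def download(backbone: str, local_dir: str = ".") -> str:
    """Fetch only the matching files from the SeonghuJeon/GLD model repo."""
    # Imported here so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        "SeonghuJeon/GLD",
        local_dir=local_dir,
        allow_patterns=patterns_for(backbone),
    )
```

For example, `download("da3")` followed by `./run_demo.sh da3`. If the demo reports a missing file, the split above guessed wrong; fall back to the full download.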
## Files

| File | Description | Params | Size |
|---|---|---|---|
| `checkpoints/da3_level1.pt` | DA3 Level-1 diffusion (EMA) | 783M | 2.9G |
| `checkpoints/da3_cascade.pt` | DA3 Cascade: L1→L0 (EMA) | 473M | 1.8G |
| `checkpoints/vggt_level1.pt` | VGGT Level-1 diffusion (EMA) | 806M | 3.0G |
| `checkpoints/vggt_cascade.pt` | VGGT Cascade: L1→L0 (EMA) | 806M | 3.0G |
| `pretrained_models/da3/model.safetensors` | DA3-Base encoder | 135M | 0.5G |
| `pretrained_models/da3/dpt_decoder.pt` | DPT decoder (depth + geometry) | - | 1.1G |
| `pretrained_models/mae_decoder.pt` | DA3 MAE decoder (EMA, decoder-only) | 423M | 1.6G |
| `pretrained_models/vggt/mae_decoder.pt` | VGGT MAE decoder (EMA, decoder-only) | 425M | 1.6G |
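The Params and Size columns can be cross-checked. Reading each size as GiB and assuming 4 bytes per parameter, every entry with a listed parameter count rounds to its stated size, which suggests (but the table does not state) float32 storage. The sketch below reproduces that arithmetic; `LISTED`, `expected_gib`, and `mismatches` are illustrative names, and the float32/GiB interpretation is an assumption.

```python
# Assumptions (not stated in the table): weights are stored as float32,
# i.e. 4 bytes per parameter, and the "G" in the Size column means GiB.
GIB = 2**30
BYTES_PER_PARAM = 4

# (params in millions, listed size in G), copied from the Files table.
# The DPT decoder is skipped because its parameter count is not listed.
LISTED = {
    "checkpoints/da3_level1.pt": (783, 2.9),
    "checkpoints/da3_cascade.pt": (473, 1.8),
    "checkpoints/vggt_level1.pt": (806, 3.0),
    "checkpoints/vggt_cascade.pt": (806, 3.0),
    "pretrained_models/da3/model.safetensors": (135, 0.5),
    "pretrained_models/mae_decoder.pt": (423, 1.6),
    "pretrained_models/vggt/mae_decoder.pt": (425, 1.6),
}


def expected_gib(params_millions: float) -> float:
    """Raw tensor payload in GiB; serialization overhead is ignored."""
    return params_millions * 1e6 * BYTES_PER_PARAM / GIB


def mismatches(listed: dict = LISTED) -> dict:
    """Entries whose expected size, rounded to 0.1, differs from the listed size."""
    out = {}
    for name, (params, size) in listed.items():
        est = round(expected_gib(params), 1)
        if est != size:
            out[name] = (est, size)
    return out


for name, (params, size) in LISTED.items():
    print(f"{name}: ~{expected_gib(params):.2f} GiB expected, {size}G listed")
```

The same estimate can flag an obviously truncated download: compare `os.path.getsize(path) / 2**30` for a local file against `expected_gib(...)`, allowing a small margin for serialization metadata.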
Stage-2 and MAE decoder checkpoints contain EMA weights only. MAE decoder checkpoints contain decoder weights only (encoder removed).
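Because the MAE decoder checkpoints omit the encoder, loading one into a module that also holds an encoder leaves the encoder parameters missing, and a strict load would raise. The encoder weights have to come from another file; for DA3, the separately listed `pretrained_models/da3/model.safetensors` is a plausible source, but that pairing is an assumption. The sketch below uses plain dicts standing in for state dicts, and the `encoder.` key prefix and the helper names `key_report` and `merge_encoder_decoder` are hypothetical; inspect the real checkpoint keys before relying on them.

```python
from typing import Iterable, List, Mapping, Tuple

# Hypothetical key layout: the full model is assumed to expose encoder weights
# under "encoder." and decoder weights under "decoder."; check the real keys.


def key_report(expected: Iterable[str], state_dict: Mapping) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) keys of state_dict relative to a model's keys."""
    expected_set = set(expected)
    present = set(state_dict)
    return sorted(expected_set - present), sorted(present - expected_set)


def merge_encoder_decoder(encoder_sd: Mapping, decoder_sd: Mapping,
                          encoder_prefix: str = "encoder.") -> dict:
    """Re-key encoder weights under encoder_prefix and combine them with a
    decoder-only state dict, refusing to silently overwrite any key."""
    merged = {encoder_prefix + k: v for k, v in encoder_sd.items()}
    overlap = merged.keys() & set(decoder_sd)
    if overlap:
        raise KeyError(f"keys present in both sources: {sorted(overlap)}")
    merged.update(decoder_sd)
    return merged


# Toy example: a decoder-only checkpoint is missing every encoder key.
model_keys = ["encoder.w", "encoder.b", "decoder.w", "decoder.b"]
decoder_only = {"decoder.w": 1.0, "decoder.b": 2.0}
print(key_report(model_keys, decoder_only))
```

With real PyTorch modules, `model.load_state_dict(sd, strict=False)` returns the same information through its `missing_keys` and `unexpected_keys` fields; an empty missing list after merging is a reasonable sign the pieces line up.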