---
license: apache-2.0
pipeline_tag: image-to-3d
tags:
- novel-view-synthesis
- multi-view-diffusion
- depth-estimation
- 3d-reconstruction
---
GLD: Geometric Latent Diffusion
Repurposing Geometric Foundation Models for Multi-view Diffusion
[Paper] | [Project Page] | [Code]
Geometric Latent Diffusion (GLD) is a framework that repurposes the geometrically consistent feature space of geometric foundation models (such as Depth Anything 3 and VGGT) as the latent space for multi-view diffusion. By operating in this space rather than a view-independent VAE latent space, GLD achieves consistent novel view synthesis (NVS) and 3D reconstruction with significantly faster training convergence.
Quick Start
To use these models, follow the setup instructions in the official GitHub repository.
```bash
git clone https://github.com/cvlab-kaist/GLD.git
cd GLD
conda env create -f environment.yml
conda activate gld

# Download all checkpoints
python -c "from huggingface_hub import snapshot_download; snapshot_download('SeonghuJeon/GLD', local_dir='.')"

# Run demo
./run_demo.sh da3
```
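If only one backbone is needed, the download can be restricted with the `allow_patterns` argument of `snapshot_download`. The sketch below is a minimal example: the helper names `patterns_for` and `download` are hypothetical, and the patterns are derived from the file paths listed in this card. Whether the demo script runs with only this subset of files has not been verified here.

```python
def patterns_for(backbone: str) -> list[str]:
    """Return glob patterns for the files of one backbone ('da3' or 'vggt').

    Hypothetical helper: the patterns mirror the paths in the Files table.
    """
    if backbone not in {"da3", "vggt"}:
        raise ValueError(f"unknown backbone: {backbone!r}")
    patterns = [
        f"checkpoints/{backbone}_*.pt",     # Level-1 and cascade checkpoints
        f"pretrained_models/{backbone}/*",  # backbone-specific pretrained files
    ]
    if backbone == "da3":
        # The DA3 MAE decoder is listed directly under pretrained_models/.
        patterns.append("pretrained_models/mae_decoder.pt")
    return patterns


def download(backbone: str, local_dir: str = ".") -> str:
    """Download only the files matching one backbone; returns the local path."""
    # Imported lazily so the pattern helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        "SeonghuJeon/GLD",
        local_dir=local_dir,
        allow_patterns=patterns_for(backbone),
    )
```

Calling `download("da3")` from the repository root would place the matching files in the same layout as the full download.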
Files
| File | Description | Params | Size |
|---|---|---|---|
| `checkpoints/da3_level1.pt` | DA3 Level-1 diffusion (EMA) | 783M | 2.9G |
| `checkpoints/da3_cascade.pt` | DA3 Cascade: L1→L0 (EMA) | 473M | 1.8G |
| `checkpoints/vggt_level1.pt` | VGGT Level-1 diffusion (EMA) | 806M | 3.0G |
| `checkpoints/vggt_cascade.pt` | VGGT Cascade: L1→L0 (EMA) | 806M | 3.0G |
| `pretrained_models/da3/model.safetensors` | DA3-Base encoder | 135M | 0.5G |
| `pretrained_models/da3/dpt_decoder.pt` | DPT decoder (depth + geometry) | - | 1.1G |
| `pretrained_models/mae_decoder.pt` | DA3 MAE decoder (EMA, decoder-only) | 423M | 1.6G |
| `pretrained_models/vggt/mae_decoder.pt` | VGGT MAE decoder (EMA, decoder-only) | 425M | 1.6G |
Stage-2 and MAE decoder checkpoints contain EMA weights only. MAE decoder checkpoints contain decoder weights only (encoder removed).
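The Params and Size columns can be cross-checked with simple arithmetic. The sketch below assumes that weights are stored as 32-bit floats (4 bytes per parameter) and that "G" means GiB; neither is stated in this card, so treat the result as a plausibility check, not a specification. The DPT decoder is omitted because its parameter count is not listed.

```python
GIB = 1024 ** 3        # bytes per GiB (assumed meaning of "G")
BYTES_PER_FP32 = 4     # assumed storage precision

# (file, parameters in millions, size listed in the table)
ROWS = [
    ("checkpoints/da3_level1.pt", 783, 2.9),
    ("checkpoints/da3_cascade.pt", 473, 1.8),
    ("checkpoints/vggt_level1.pt", 806, 3.0),
    ("checkpoints/vggt_cascade.pt", 806, 3.0),
    ("pretrained_models/da3/model.safetensors", 135, 0.5),
    ("pretrained_models/mae_decoder.pt", 423, 1.6),
    ("pretrained_models/vggt/mae_decoder.pt", 425, 1.6),
]


def fp32_size_gib(params_millions: float) -> float:
    """Estimated file size in GiB for fp32 weights, ignoring file overhead."""
    return params_millions * 1e6 * BYTES_PER_FP32 / GIB


for path, params_m, listed in ROWS:
    estimate = fp32_size_gib(params_m)
    print(f"{path}: estimated {estimate:.2f} GiB, listed {listed}G")
```

Under these assumptions, every estimate rounds to the listed size, which is consistent with each checkpoint holding a single fp32 copy of the weights.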
Citation
```bibtex
@article{jang2026gld,
  title={Repurposing Geometric Foundation Models for Multi-view Diffusion},
  author={Jang, Wooseok and Jeon, Seonghu and Han, Jisang and Choi, Jinhyeok and Kwon, Minkyung and Kim, Seungryong and Xie, Saining and Liu, Sainan},
  journal={arXiv preprint arXiv:2603.22275},
  year={2026}
}
```
Acknowledgements
Built upon RAE, Depth Anything 3, VGGT, CUT3R, and SiT.