---
license: apache-2.0
tags:
- novel-view-synthesis
- multi-view-diffusion
- depth-estimation
- 3d-reconstruction
---
# GLD: Geometric Latent Diffusion

**Repurposing Geometric Foundation Models for Multi-view Diffusion**

Wooseok Jang, Seonghu Jeon, Jisang Han, Jinhyeok Choi, Minkyung Kwon, Seungryong Kim, Saining Xie, Sainan Liu

KAIST, New York University, Intel Labs

## Model Overview

GLD performs multi-view diffusion in the feature space of geometric foundation models (Depth Anything 3 / VGGT), enabling novel view synthesis with zero-shot geometry.
## Checkpoints

| File | Description | Params |
|---|---|---|
| `checkpoints/da3_level1.pt` | DA3 Level-1 diffusion (EMA) | 783M |
| `checkpoints/da3_cascade.pt` | DA3 Cascade: L1→L0 (EMA) | 473M |
| `checkpoints/vggt_level1.pt` | VGGT Level-1 diffusion (EMA) | 806M |
| `checkpoints/vggt_cascade.pt` | VGGT Cascade: L1→L0 (EMA) | 806M |
| `pretrained_models/mae_decoder.pt` | DA3 MAE decoder (EMA, decoder-only) | 423M |
| `pretrained_models/vggt/mae_decoder.pt` | VGGT MAE decoder (EMA, decoder-only) | 425M |
| `pretrained_models/da3/model.safetensors` | DA3-Base encoder weights | 135M |

All checkpoints contain EMA weights only (optimizer, scheduler, and discriminator states removed). MAE decoder checkpoints contain decoder weights only (encoder weights removed).
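The slimming described above can be sketched as a simple key filter over a checkpoint dict. This is an illustrative assumption about the checkpoint layout, not the repo's actual export script; the top-level key names (`ema`, `optimizer`, `scheduler`, `discriminator`) are hypothetical.

```python
def strip_to_ema(checkpoint: dict) -> dict:
    """Keep only release-worthy weights, dropping training-only state.

    Illustrative sketch: assumes training state lives under top-level keys.
    """
    drop = {"optimizer", "scheduler", "discriminator"}
    return {k: v for k, v in checkpoint.items() if k not in drop}

# Dummy dict standing in for a torch.load(...) result.
ckpt = {"ema": {"weight": [0.1]}, "optimizer": {}, "scheduler": {}, "discriminator": {}}
print(sorted(strip_to_ema(ckpt)))  # prints ['ema']
```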
## Usage

```bash
git clone https://github.com/cvlab-kaist/GLD.git
cd GLD

# Download checkpoints (Option 1: huggingface-cli)
huggingface-cli download SeonghuJeon/GLD --local-dir .

# Run the demo
./run_demo.sh da3
```

Alternatively, download the checkpoints from Python (Option 2):

```python
from huggingface_hub import snapshot_download

snapshot_download("SeonghuJeon/GLD", local_dir=".")
```
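If you only need one variant rather than the full snapshot, a small lookup over the repo-relative paths from the checkpoints table can feed `huggingface_hub.hf_hub_download`. The `CHECKPOINTS` mapping and `checkpoint_path` helper below are illustrative, not part of the GLD codebase; the paths themselves come from the table above.

```python
# Repo-relative checkpoint paths, taken from the checkpoints table.
CHECKPOINTS = {
    "da3_level1": "checkpoints/da3_level1.pt",
    "da3_cascade": "checkpoints/da3_cascade.pt",
    "vggt_level1": "checkpoints/vggt_level1.pt",
    "vggt_cascade": "checkpoints/vggt_cascade.pt",
}

def checkpoint_path(variant: str) -> str:
    """Return the repo-relative path for a known model variant."""
    if variant not in CHECKPOINTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return CHECKPOINTS[variant]

# To fetch a single file instead of the whole repo snapshot:
#   from huggingface_hub import hf_hub_download
#   local = hf_hub_download("SeonghuJeon/GLD", checkpoint_path("da3_level1"))

print(checkpoint_path("da3_level1"))  # prints checkpoints/da3_level1.pt
```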
## Citation

```bibtex
@article{jang2026gld,
  title={Repurposing Geometric Foundation Models for Multi-view Diffusion},
  author={Jang, Wooseok and Jeon, Seonghu and Han, Jisang and Choi, Jinhyeok and Kwon, Minkyung and Kim, Seungryong and Xie, Saining and Liu, Sainan},
  journal={arXiv preprint},
  year={2026}
}
```