---
license: apache-2.0
tags:
  - novel-view-synthesis
  - multi-view-diffusion
  - depth-estimation
  - 3d-reconstruction
---

# GLD: Geometric Latent Diffusion

**Repurposing Geometric Foundation Models for Multi-view Diffusion**

Wooseok Jang, Seonghu Jeon, Jisang Han, Jinhyeok Choi, Minkyung Kwon, Seungryong Kim, Saining Xie, Sainan Liu

KAIST, New York University, Intel Labs

[Project Page] | [Code]

## Model Overview

GLD performs multi-view diffusion in the feature space of geometric foundation models (Depth Anything 3 / VGGT), enabling novel view synthesis with zero-shot geometry.

## Checkpoints

| File | Description | Params |
|------|-------------|--------|
| `checkpoints/da3_level1.pt` | DA3 Level-1 diffusion (EMA) | 783M |
| `checkpoints/da3_cascade.pt` | DA3 Cascade: L1→L0 (EMA) | 473M |
| `checkpoints/vggt_level1.pt` | VGGT Level-1 diffusion (EMA) | 806M |
| `checkpoints/vggt_cascade.pt` | VGGT Cascade: L1→L0 (EMA) | 806M |
| `pretrained_models/mae_decoder.pt` | DA3 MAE decoder (EMA, decoder-only) | 423M |
| `pretrained_models/vggt/mae_decoder.pt` | VGGT MAE decoder (EMA, decoder-only) | 425M |
| `pretrained_models/da3/model.safetensors` | DA3-Base encoder weights | 135M |

All checkpoints contain EMA weights only (optimizer/scheduler/discriminator removed).
MAE decoder checkpoints contain decoder weights only (encoder weights removed).
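Since the `.pt` files ship EMA weights only, a minimal loading sketch (assuming the checkpoints are plain PyTorch state dicts, which the repository code should confirm) might look like the following; a tiny dummy checkpoint stands in for the real `checkpoints/da3_level1.pt`:

```python
import torch

# Assumption: each checkpoint is a plain state dict of EMA weights
# (optimizer/scheduler/discriminator entries already stripped).
# A dummy checkpoint stands in for the real file here; the key name
# is hypothetical, not from the GLD repository.
dummy_ema = {"blocks.0.attn.qkv.weight": torch.zeros(12, 4)}
torch.save(dummy_ema, "dummy_ema.pt")

# Load on CPU and inspect parameter names/shapes before wiring
# the dict into a model with model.load_state_dict(...).
state_dict = torch.load("dummy_ema.pt", map_location="cpu")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```

In practice you would pass the downloaded checkpoint path (e.g. `checkpoints/da3_level1.pt`) instead of the dummy file.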

## Usage

```shell
git clone https://github.com/cvlab-kaist/GLD.git
cd GLD

# Download checkpoints (Option 1: huggingface-cli)
huggingface-cli download SeonghuJeon/GLD --local-dir .
```

Option 2: download from Python instead:

```python
from huggingface_hub import snapshot_download
snapshot_download("SeonghuJeon/GLD", local_dir=".")
```

Then run the demo:

```shell
./run_demo.sh da3
```

## Citation

```bibtex
@article{jang2026gld,
  title={Repurposing Geometric Foundation Models for Multi-view Diffusion},
  author={Jang, Wooseok and Jeon, Seonghu and Han, Jisang and Choi, Jinhyeok and Kwon, Minkyung and Kim, Seungryong and Xie, Saining and Liu, Sainan},
  journal={arXiv preprint},
  year={2026}
}
```