---
license: apache-2.0
tags:
  - novel-view-synthesis
  - multi-view-diffusion
  - depth-estimation
  - 3d-reconstruction
---

# GLD: Geometric Latent Diffusion

**Repurposing Geometric Foundation Models for Multi-view Diffusion**

Wooseok Jang, Seonghu Jeon, Jisang Han, Jinhyeok Choi, Minkyung Kwon, Seungryong Kim, Saining Xie, Sainan Liu

KAIST, New York University, Intel Labs

[Project Page] | [Code]

## Model Overview

GLD performs multi-view diffusion in the feature space of geometric foundation models (Depth Anything 3 / VGGT), enabling novel view synthesis with zero-shot geometry.

## Checkpoints

| File | Description | Params |
|------|-------------|--------|
| `checkpoints/da3_level1.pt` | DA3 Level-1 diffusion (EMA) | 783M |
| `checkpoints/da3_cascade.pt` | DA3 Cascade: L1→L0 (EMA) | 473M |
| `checkpoints/vggt_level1.pt` | VGGT Level-1 diffusion (EMA) | 806M |
| `checkpoints/vggt_cascade.pt` | VGGT Cascade: L1→L0 (EMA) | 806M |
| `pretrained_models/mae_decoder.pt` | DA3 MAE decoder (EMA, decoder-only) | 423M |
| `pretrained_models/vggt/mae_decoder.pt` | VGGT MAE decoder (EMA, decoder-only) | 425M |
| `pretrained_models/da3/model.safetensors` | DA3-Base encoder weights | 135M |

All checkpoints contain EMA weights only (optimizer/scheduler/discriminator removed).
MAE decoder checkpoints contain decoder weights only (encoder weights removed).
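Since the `.pt` files ship EMA weights only, a minimal loading sketch (assuming the checkpoints are plain PyTorch state dicts, which the repository code should confirm) might look like the following; a tiny dummy checkpoint stands in for the real `checkpoints/da3_level1.pt`:

```python
import torch

# Assumption: each checkpoint is a plain state dict of EMA weights
# (optimizer/scheduler/discriminator entries already stripped).
# A dummy checkpoint stands in for the real file here; the key name
# is hypothetical, not from the GLD repository.
dummy_ema = {"blocks.0.attn.qkv.weight": torch.zeros(12, 4)}
torch.save(dummy_ema, "dummy_ema.pt")

# Load on CPU and inspect parameter names/shapes before wiring
# the dict into a model with model.load_state_dict(...).
state_dict = torch.load("dummy_ema.pt", map_location="cpu")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```

In practice you would pass the downloaded checkpoint path (e.g. `checkpoints/da3_level1.pt`) instead of the dummy file.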

## Usage

```shell
git clone https://github.com/cvlab-kaist/GLD.git
cd GLD

# Download checkpoints (Option 1: huggingface-cli)
huggingface-cli download SeonghuJeon/GLD --local-dir .
```

Option 2: download from Python instead:

```python
from huggingface_hub import snapshot_download
snapshot_download("SeonghuJeon/GLD", local_dir=".")
```

Then run the demo:

```shell
./run_demo.sh da3
```

## Citation

```bibtex
@article{jang2026gld,
  title={Repurposing Geometric Foundation Models for Multi-view Diffusion},
  author={Jang, Wooseok and Jeon, Seonghu and Han, Jisang and Choi, Jinhyeok and Kwon, Minkyung and Kim, Seungryong and Xie, Saining and Liu, Sainan},
  journal={arXiv preprint},
  year={2026}
}
```