metadata
license: apache-2.0
pipeline_tag: image-to-3d
tags:
  - novel-view-synthesis
  - multi-view-diffusion
  - depth-estimation
  - 3d-reconstruction

GLD: Geometric Latent Diffusion

Repurposing Geometric Foundation Models for Multi-view Diffusion

[Paper] | [Project Page] | [Code]

Geometric Latent Diffusion (GLD) is a framework that repurposes the geometrically consistent feature space of geometric foundation models (such as Depth Anything 3 and VGGT) as the latent space for multi-view diffusion. By operating in this space rather than a view-independent VAE latent space, GLD achieves consistent novel view synthesis (NVS) and 3D reconstruction with significantly faster training convergence.

Quick Start

To use these models, follow the setup instructions in the official GitHub repository.

git clone https://github.com/cvlab-kaist/GLD.git
cd GLD
conda env create -f environment.yml
conda activate gld

# Download all checkpoints
python -c "from huggingface_hub import snapshot_download; snapshot_download('SeonghuJeon/GLD', local_dir='.')"

# Run demo
./run_demo.sh da3
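
The download command above fetches every file in the repository. If only one backbone is needed, `snapshot_download` accepts an `allow_patterns` argument that restricts the download to matching paths. The sketch below is one way to do that; the helper names (`patterns_for`, `matching`, `download`) are hypothetical, and which files `run_demo.sh` actually requires for each backbone is an assumption based on the file names in the Files table, not something the repository documents here.

```python
from fnmatch import fnmatch

# File paths copied from the Files table in this README.
REPO_FILES = [
    "checkpoints/da3_level1.pt",
    "checkpoints/da3_cascade.pt",
    "checkpoints/vggt_level1.pt",
    "checkpoints/vggt_cascade.pt",
    "pretrained_models/da3/model.safetensors",
    "pretrained_models/da3/dpt_decoder.pt",
    "pretrained_models/mae_decoder.pt",
    "pretrained_models/vggt/mae_decoder.pt",
]


def patterns_for(backbone: str) -> list[str]:
    """Return glob patterns for one backbone (hypothetical helper)."""
    if backbone not in ("da3", "vggt"):
        raise ValueError(f"unknown backbone: {backbone!r}")
    patterns = [
        f"checkpoints/{backbone}_*.pt",
        f"pretrained_models/{backbone}/*",
    ]
    if backbone == "da3":
        # The DA3 MAE decoder is listed directly under pretrained_models/.
        patterns.append("pretrained_models/mae_decoder.pt")
    return patterns


def matching(files: list[str], patterns: list[str]) -> list[str]:
    """Preview which of the listed files the patterns would select."""
    return [f for f in files if any(fnmatch(f, p) for p in patterns)]


def download(backbone: str, local_dir: str = ".") -> str:
    """Download only the files for one backbone (needs network access)."""
    # Imported here so the preview helpers work without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        "SeonghuJeon/GLD",
        allow_patterns=patterns_for(backbone),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    for path in matching(REPO_FILES, patterns_for("da3")):
        print(path)
```

Calling `download("da3")` from the repository root would then place the selected files in the same layout as the full download. Previewing with `matching` first is a cheap way to confirm the patterns before transferring several gigabytes.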

Files

| File | Description | Params | Size |
| --- | --- | --- | --- |
| checkpoints/da3_level1.pt | DA3 Level-1 diffusion (EMA) | 783M | 2.9G |
| checkpoints/da3_cascade.pt | DA3 Cascade: L1→L0 (EMA) | 473M | 1.8G |
| checkpoints/vggt_level1.pt | VGGT Level-1 diffusion (EMA) | 806M | 3.0G |
| checkpoints/vggt_cascade.pt | VGGT Cascade: L1→L0 (EMA) | 806M | 3.0G |
| pretrained_models/da3/model.safetensors | DA3-Base encoder | 135M | 0.5G |
| pretrained_models/da3/dpt_decoder.pt | DPT decoder (depth + geometry) | - | 1.1G |
| pretrained_models/mae_decoder.pt | DA3 MAE decoder (EMA, decoder-only) | 423M | 1.6G |
| pretrained_models/vggt/mae_decoder.pt | VGGT MAE decoder (EMA, decoder-only) | 425M | 1.6G |

Stage-2 and MAE decoder checkpoints contain EMA weights only. MAE decoder checkpoints contain decoder weights only (encoder removed).
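
As a quick sanity check on the Files table, the listed sizes line up with the parameter counts if each parameter is stored as a 32-bit float and the sizes are read as GiB. Both of those are assumptions inferred from the numbers, not something stated by the repository. The short script below works through the arithmetic for every row that lists a parameter count.

```python
# Parameter counts (millions) and listed sizes, copied from the Files table.
# The DPT decoder row is omitted because it lists no parameter count.
TABLE = {
    "checkpoints/da3_level1.pt": (783, 2.9),
    "checkpoints/da3_cascade.pt": (473, 1.8),
    "checkpoints/vggt_level1.pt": (806, 3.0),
    "checkpoints/vggt_cascade.pt": (806, 3.0),
    "pretrained_models/da3/model.safetensors": (135, 0.5),
    "pretrained_models/mae_decoder.pt": (423, 1.6),
    "pretrained_models/vggt/mae_decoder.pt": (425, 1.6),
}

BYTES_PER_PARAM = 4  # assumption: weights stored as float32
GIB = 1024 ** 3      # assumption: "G" in the table means GiB


def expected_size_gib(params_millions: float,
                      bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Estimate on-disk size from a parameter count, ignoring file overhead."""
    return params_millions * 1e6 * bytes_per_param / GIB


if __name__ == "__main__":
    for name, (params_m, listed) in TABLE.items():
        estimate = expected_size_gib(params_m)
        print(f"{name}: estimated {estimate:.1f} GiB, listed {listed}G")
```

Under these assumptions, each estimate rounds to the listed size. That fits the note that the checkpoints hold a single set of EMA weights, with no optimizer state or second copy of the model.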

Citation

@article{jang2026gld,
  title={Repurposing Geometric Foundation Models for Multi-view Diffusion},
  author={Jang, Wooseok and Jeon, Seonghu and Han, Jisang and Choi, Jinhyeok and Kwon, Minkyung and Kim, Seungryong and Xie, Saining and Liu, Sainan},
  journal={arXiv preprint arXiv:2603.22275},
  year={2026}
}

Acknowledgements

Built upon RAE, Depth Anything 3, VGGT, CUT3R, and SiT.