---
license: apache-2.0
pipeline_tag: image-to-3d
tags:
  - novel-view-synthesis
  - multi-view-diffusion
  - depth-estimation
  - 3d-reconstruction
---

GLD: Geometric Latent Diffusion

Repurposing Geometric Foundation Models for Multi-view Diffusion

[Paper] | [Project Page] | [Code]

Geometric Latent Diffusion (GLD) is a framework that repurposes the geometrically consistent feature space of geometric foundation models (such as Depth Anything 3 and VGGT) as the latent space for multi-view diffusion. By operating in this space rather than a view-independent VAE latent space, GLD achieves consistent novel view synthesis (NVS) and 3D reconstruction with significantly faster training convergence.
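The core idea is to run diffusion on the encoder's multi-view features rather than on view-independent VAE latents. The toy sketch below shows only that data flow. Everything in it is a hypothetical stand-in: `encode_views` replaces the frozen DA3/VGGT encoder with an identity map, and `noise_targets` uses a SiT-style linear interpolant, which we assume for illustration. GLD's actual objective, conditioning, and decoders are described in the paper and code.

```python
import random
from typing import List

View = List[float]  # toy stand-in for one view's flattened latent feature map


def encode_views(images: List[View]) -> List[View]:
    """Hypothetical stand-in for a frozen geometric encoder such as DA3 or VGGT.

    The real encoder processes all views jointly so that their features share one
    geometrically consistent space; this toy version just copies its input.
    """
    return [list(view) for view in images]


def noise_targets(latents: List[View], target_ids: List[int], t: float,
                  rng: random.Random) -> List[View]:
    """Corrupt only the target views: x_t = (1 - t) * x_0 + t * eps, eps ~ N(0, 1).

    Context views stay clean so a denoiser can condition on them. The linear
    interpolant is an assumption (SiT-style), not GLD's confirmed objective.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    noisy = []
    for i, view in enumerate(latents):
        if i in target_ids:
            noisy.append([(1.0 - t) * v + t * rng.gauss(0.0, 1.0) for v in view])
        else:
            noisy.append(list(view))  # context view: left untouched
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    images = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]  # three toy views
    latents = encode_views(images)
    x_t = noise_targets(latents, target_ids=[2], t=0.5, rng=rng)
    print(x_t[0] == latents[0])  # prints True: the context view is unchanged
```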

Quick Start

To use these models, follow the setup instructions in the official GitHub repository.

```bash
git clone https://github.com/cvlab-kaist/GLD.git
cd GLD
conda env create -f environment.yml
conda activate gld

# Download all checkpoints
python -c "from huggingface_hub import snapshot_download; snapshot_download('SeonghuJeon/GLD', local_dir='.')"

# Run demo
./run_demo.sh da3
```
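If you only need one backbone, `huggingface_hub.snapshot_download` accepts `allow_patterns` (glob patterns) to fetch a subset of the repository. The sketch below assumes a per-backbone split of the files listed in the Files table, inferred from their names only. `BACKBONE_PATTERNS`, `select_files`, and `download` are hypothetical helpers. `run_demo.sh` may expect files outside this split; for example, no VGGT encoder weights are listed in this repository. The full download above is therefore the safe default.

```python
from fnmatch import fnmatch
from typing import Dict, List

# Assumed split of the repository files by backbone, guessed from file names
# in the Files table. run_demo.sh may need more than this.
BACKBONE_PATTERNS: Dict[str, List[str]] = {
    "da3": [
        "checkpoints/da3_*.pt",
        "pretrained_models/da3/*",
        "pretrained_models/mae_decoder.pt",
    ],
    "vggt": [
        "checkpoints/vggt_*.pt",
        "pretrained_models/vggt/*",
    ],
}


def select_files(files: List[str], backbone: str) -> List[str]:
    """Preview which files the backbone's glob patterns would match."""
    patterns = BACKBONE_PATTERNS[backbone]
    return [f for f in files if any(fnmatch(f, p) for p in patterns)]


def download(backbone: str, local_dir: str = ".") -> str:
    """Download only the files for one backbone and return the local path."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        "SeonghuJeon/GLD",
        local_dir=local_dir,
        allow_patterns=BACKBONE_PATTERNS[backbone],
    )
```

Call `download("da3")` from inside the cloned `GLD` directory so the files land where the full download would put them.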

Files

| File | Description | Params | Size |
|---|---|---|---|
| `checkpoints/da3_level1.pt` | DA3 Level-1 diffusion (EMA) | 783M | 2.9G |
| `checkpoints/da3_cascade.pt` | DA3 Cascade: L1→L0 (EMA) | 473M | 1.8G |
| `checkpoints/vggt_level1.pt` | VGGT Level-1 diffusion (EMA) | 806M | 3.0G |
| `checkpoints/vggt_cascade.pt` | VGGT Cascade: L1→L0 (EMA) | 806M | 3.0G |
| `pretrained_models/da3/model.safetensors` | DA3-Base encoder | 135M | 0.5G |
| `pretrained_models/da3/dpt_decoder.pt` | DPT decoder (depth + geometry) | - | 1.1G |
| `pretrained_models/mae_decoder.pt` | DA3 MAE decoder (EMA, decoder-only) | 423M | 1.6G |
| `pretrained_models/vggt/mae_decoder.pt` | VGGT MAE decoder (EMA, decoder-only) | 425M | 1.6G |
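As a rough consistency check, the listed sizes agree with float32 storage (4 bytes per parameter) measured in GiB. The float32 assumption is our inference, not something the authors state. The DPT decoder row is skipped because its parameter count is not listed.

```python
# (file, parameter count, listed size in G) copied from the Files table above.
ROWS = [
    ("checkpoints/da3_level1.pt", 783e6, 2.9),
    ("checkpoints/da3_cascade.pt", 473e6, 1.8),
    ("checkpoints/vggt_level1.pt", 806e6, 3.0),
    ("checkpoints/vggt_cascade.pt", 806e6, 3.0),
    ("pretrained_models/da3/model.safetensors", 135e6, 0.5),
    ("pretrained_models/mae_decoder.pt", 423e6, 1.6),
    ("pretrained_models/vggt/mae_decoder.pt", 425e6, 1.6),
]

BYTES_PER_PARAM = 4  # assumption: weights stored as float32
GIB = 2 ** 30


def estimated_size_gib(num_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Approximate file size from the parameter count, ignoring metadata overhead."""
    return num_params * bytes_per_param / GIB


if __name__ == "__main__":
    for name, params, listed in ROWS:
        print(f"{name}: estimated {estimated_size_gib(params):.2f} GiB, listed {listed}G")
```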

The Stage-2 (diffusion) checkpoints in `checkpoints/` and the MAE decoder checkpoints contain EMA weights only. The MAE decoder checkpoints also contain decoder weights only; the encoder weights were removed.
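EMA weights are an exponential moving average of the weights seen during training, and such averages are commonly used for sampling. The sketch below shows the standard EMA update and the removal of encoder entries from a state dict, using toy dicts of floats. The `encoder.` key prefix and the decay value are hypothetical; they are not read from the GLD checkpoints.

```python
from typing import Dict

StateDict = Dict[str, float]  # toy: one scalar per parameter name


def ema_update(ema: StateDict, params: StateDict, decay: float = 0.9999) -> None:
    """In-place EMA step: ema <- decay * ema + (1 - decay) * params."""
    for name, value in params.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * value


def drop_encoder_weights(state: StateDict, prefix: str = "encoder.") -> StateDict:
    """Return a copy without entries whose key starts with `prefix` (hypothetical naming)."""
    return {k: v for k, v in state.items() if not k.startswith(prefix)}
```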

Citation

```bibtex
@article{jang2026gld,
  title={Repurposing Geometric Foundation Models for Multi-view Diffusion},
  author={Jang, Wooseok and Jeon, Seonghu and Han, Jisang and Choi, Jinhyeok and Kwon, Minkyung and Kim, Seungryong and Xie, Saining and Liu, Sainan},
  journal={arXiv preprint arXiv:2603.22275},
  year={2026}
}
```

Acknowledgements

Built upon RAE, Depth Anything 3, VGGT, CUT3R, and SiT.