	---
	license: apache-2.0
	tags:
	- novel-view-synthesis
	- multi-view-diffusion
	- depth-estimation
	- 3d-reconstruction
	---

	# GLD: Geometric Latent Diffusion

	Repurposing Geometric Foundation Models for Multi-view Diffusion

	[[Project Page]](https://cvlab-kaist.github.io/GLD/) | [[Code]](https://github.com/cvlab-kaist/GLD)

	## Quick Start

	```bash
	git clone https://github.com/cvlab-kaist/GLD.git
	cd GLD
	conda env create -f environment.yml
	conda activate gld

	# Download all checkpoints
	python -c "from huggingface_hub import snapshot_download; snapshot_download('SeonghuJeon/GLD', local_dir='.')"

	# Run demo
	./run_demo.sh da3
	```
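
The snippet above downloads every file in the repository. If only one backbone is needed, `snapshot_download` accepts an `allow_patterns` argument that restricts the download to matching paths. The following is a minimal sketch, not part of the GLD codebase: the helper names `patterns_for`, `select`, and `download_subset` are hypothetical, and the grouping of files by backbone is an assumption read off the file paths in the Files table. Whether `run_demo.sh` needs files outside a given subset is not stated in this card.

```python
from fnmatch import fnmatch

# File paths copied from the Files table of this model card.
REPO_FILES = [
    "checkpoints/da3_level1.pt",
    "checkpoints/da3_cascade.pt",
    "checkpoints/vggt_level1.pt",
    "checkpoints/vggt_cascade.pt",
    "pretrained_models/da3/model.safetensors",
    "pretrained_models/da3/dpt_decoder.pt",
    "pretrained_models/mae_decoder.pt",
    "pretrained_models/vggt/mae_decoder.pt",
]


def patterns_for(backbone: str) -> list[str]:
    """Return glob patterns for one backbone (hypothetical helper).

    Assumption: files belong to a backbone when its name appears in the path.
    The DA3 MAE decoder is listed directly under pretrained_models/, so it is
    added explicitly for "da3".
    """
    if backbone not in ("da3", "vggt"):
        raise ValueError(f"unknown backbone: {backbone!r}")
    patterns = [f"checkpoints/{backbone}_*.pt", f"pretrained_models/{backbone}/*"]
    if backbone == "da3":
        patterns.append("pretrained_models/mae_decoder.pt")
    return patterns


def select(backbone: str) -> list[str]:
    """Preview offline which listed files the patterns would match."""
    patterns = patterns_for(backbone)
    return [f for f in REPO_FILES if any(fnmatch(f, p) for p in patterns)]


def download_subset(backbone: str, local_dir: str = ".") -> str:
    """Download only the matching files; requires network access."""
    # Imported lazily so the preview helpers work without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        "SeonghuJeon/GLD",
        local_dir=local_dir,
        allow_patterns=patterns_for(backbone),
    )


if __name__ == "__main__":
    # Print the preview only; call download_subset("da3") to actually fetch.
    for path in select("da3"):
        print(path)
```

Calling `select("vggt")` before `download_subset("vggt")` is a cheap way to confirm that the patterns cover the intended files.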

	## Files

	| File | Description | Params | Size |
	|------|-------------|--------|------|
	| `checkpoints/da3_level1.pt` | DA3 Level-1 diffusion (EMA) | 783M | 2.9G |
	| `checkpoints/da3_cascade.pt` | DA3 Cascade: L1→L0 (EMA) | 473M | 1.8G |
	| `checkpoints/vggt_level1.pt` | VGGT Level-1 diffusion (EMA) | 806M | 3.0G |
	| `checkpoints/vggt_cascade.pt` | VGGT Cascade: L1→L0 (EMA) | 806M | 3.0G |
	| `pretrained_models/da3/model.safetensors` | DA3-Base encoder | 135M | 0.5G |
	| `pretrained_models/da3/dpt_decoder.pt` | DPT decoder (depth + geometry) | - | 1.1G |
	| `pretrained_models/mae_decoder.pt` | DA3 MAE decoder (EMA, decoder-only) | 423M | 1.6G |
	| `pretrained_models/vggt/mae_decoder.pt` | VGGT MAE decoder (EMA, decoder-only) | 425M | 1.6G |

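	The Stage-2 checkpoints (the Level-1 diffusion and cascade models marked EMA in the table) and the MAE decoder checkpoints contain EMA weights only.
	The MAE decoder checkpoints additionally have the encoder removed and contain decoder weights only.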
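
After downloading, it can be useful to confirm that every file from the table is present before running the demo. The sketch below is a hypothetical helper, not part of the GLD repository: `EXPECTED`, `missing_files`, and `total_size_gb` are illustrative names, and the sizes are the rounded values from the table, so they are only suitable for a rough disk-space estimate and not for integrity checks.

```python
from pathlib import Path

# Relative path -> approximate size in GB, copied from the Files table.
EXPECTED = {
    "checkpoints/da3_level1.pt": 2.9,
    "checkpoints/da3_cascade.pt": 1.8,
    "checkpoints/vggt_level1.pt": 3.0,
    "checkpoints/vggt_cascade.pt": 3.0,
    "pretrained_models/da3/model.safetensors": 0.5,
    "pretrained_models/da3/dpt_decoder.pt": 1.1,
    "pretrained_models/mae_decoder.pt": 1.6,
    "pretrained_models/vggt/mae_decoder.pt": 1.6,
}


def missing_files(root: str = ".") -> list[str]:
    """Return the listed files that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).is_file()]


def total_size_gb() -> float:
    """Sum the rounded sizes from the table (approximate)."""
    return round(sum(EXPECTED.values()), 1)


if __name__ == "__main__":
    print(f"Approximate total download size: {total_size_gb()} GB")
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All listed files are present.")
```

Run it from the root of the cloned `GLD` directory, since the Quick Start downloads into `local_dir='.'`.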