	---
	license: mit
	tags:
	- depth-estimation
	- diffusion
	- monocular-depth
	- lidar-prompted
	- pixel-perfect-depth
	- pytorch
	---

	# LiDAR-Perfect Depth (LPD)

	Implementation of *LiDAR-Perfect Depth: Score-Decomposed Diffusion with Kalman-in-the-Loop Denoising for Sparse-Prompted Depth*, built on top of [Pixel-Perfect Depth](https://github.com/gangweix/pixel-perfect-depth).

	## Repos

	| Repo | Contents | Size |
	|---|---|---|
	| `chenming-wu/LiDAR-Perfect-Depth` (this) | code + 6 LPD-DA2 checkpoints + inference vis + extraction helper | 12.5 GB |
	| [`chenming-wu/LiDAR-Perfect-Depth-Datasets`](https://huggingface.co/datasets/chenming-wu/LiDAR-Perfect-Depth-Datasets) | extracted eval sets + training-set archives + PPD/DA-V2/RAFT weights | ~991 GB |

	## What's in this repo

	- `code/`: full LPD codebase. New modules under `ppd/lpd/`; updated configs and adapter loaders.
	- `checkpoints/`
	  - `e000-s001000.ckpt` … `e004-s005000.ckpt`: per-epoch fine-tuned weights (DA2 backbone, 5K steps)
	  - `last.ckpt`: same as e004
	  - 2.0 GB each, weights-only
	- `inference_vis/`: 8-sample qualitative panels (RGB | GT | PPD | LPD | LPD-variance)
	- `extract_archives.sh`: extracts the dataset archives back into the layout the code expects
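
The checkpoint names listed above appear to encode the epoch and the global step. The sketch below parses them under that assumption; the pattern `e<3 digits>-s<6 digits>.ckpt` is inferred from the two names shown, and `parse_ckpt_name` and `latest_ckpt` are hypothetical helpers, not part of the LPD codebase.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed naming scheme, inferred from the names in this README:
# e<3-digit epoch>-s<6-digit global step>.ckpt
CKPT_PATTERN = re.compile(r"^e(\d{3})-s(\d{6})\.ckpt$")


class CkptInfo(NamedTuple):
    epoch: int
    step: int


def parse_ckpt_name(name: str) -> Optional[CkptInfo]:
    """Return (epoch, step) for a matching name, or None otherwise."""
    match = CKPT_PATTERN.match(name)
    if match is None:
        return None  # e.g. last.ckpt carries no epoch/step in its name
    return CkptInfo(epoch=int(match.group(1)), step=int(match.group(2)))


def latest_ckpt(names: List[str]) -> Optional[str]:
    """Pick the name with the highest step among the parseable ones."""
    best_name, best_step = None, -1
    for name in names:
        info = parse_ckpt_name(name)
        if info is not None and info.step > best_step:
            best_name, best_step = name, info.step
    return best_name


print(parse_ckpt_name("e004-s005000.ckpt"))  # CkptInfo(epoch=4, step=5000)
```

Names that do not match, such as `last.ckpt`, return `None` and are skipped instead of being guessed at.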

	## Backbone options

	PPD, and therefore LPD, supports two semantic-prompt backbones; switching between them is a single config change.

	| Backbone | Trainable params | PPD ckpt | Config |
	|---|---|---|---|
	| DA2 (Depth-Anything-V2 ViT-L) | 16.3 M | [`gangweix/Pixel-Perfect-Depth/ppd.pth`](https://huggingface.co/gangweix/Pixel-Perfect-Depth/resolve/main/ppd.pth) | `code/ppd/configs/lpd_run5d_10k.yaml` |
	| MoGe2 | 16.3 M | [`chenming-wu/LiDAR-Perfect-Depth-Datasets/pretrained/ppd_moge2/`](https://huggingface.co/datasets/chenming-wu/LiDAR-Perfect-Depth-Datasets/tree/main/pretrained/ppd_moge2) | `code/ppd/configs/lpd_run5d_moge2.yaml` |

	The official PPD MoGe2 release reports a 20-30% improvement over DA2 on zero-shot benchmarks; the LPD prompt branch is identical for either backbone.
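
Since the backbone is chosen by config file, a launcher script could map a backbone label to the matching config. This is a minimal sketch: the config paths and the `code/main.py --cfg_file` invocation come from this README, while the labels `"da2"` and `"moge2"` and the `train_command` helper are illustrative assumptions.

```python
from typing import List

# Config paths as listed in the backbone table; the dict keys are
# hypothetical labels chosen for this example.
BACKBONE_CONFIGS = {
    "da2": "code/ppd/configs/lpd_run5d_10k.yaml",
    "moge2": "code/ppd/configs/lpd_run5d_moge2.yaml",
}


def train_command(backbone: str) -> List[str]:
    """Build the argv list for a fine-tuning run with the given backbone."""
    key = backbone.lower()
    if key not in BACKBONE_CONFIGS:
        known = ", ".join(sorted(BACKBONE_CONFIGS))
        raise ValueError(f"unknown backbone {backbone!r}; expected one of: {known}")
    return ["python", "code/main.py", "--cfg_file", BACKBONE_CONFIGS[key]]


print(" ".join(train_command("moge2")))
```

The returned list can be passed to `subprocess.run` from the repository root; rejecting unknown labels up front avoids launching a run against a config path that does not exist.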

	## Training run summary (DA2 backbone, 5K steps)

	```
	Backbone: PPD-DA2 — 820 M, frozen
	Trainable: 16 M (sparse-prompt encoder + gate)
	Resolution: 1024 × 768
	Batch: 18 (~133 GB peak on a single H200)
	Steps: 5,000 (5 epochs × 1000 batches)
	Mix: Hypersim 0.5 / UrbanSyn 0.15 / UnrealStereo4K 0.15 / VKITTI2 0.1 / TartanAir 0.1
	Init: gangweix/Pixel-Perfect-Depth ppd.pth

	epoch 0 → 4 loss: 0.0186 → 0.0177 (-4.8%)
	```
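
The figures in the summary can be cross-checked with a few lines of arithmetic. The sketch below assumes that every step consumes one full batch of 18 samples (no gradient accumulation) and that the mix values are per-sample sampling probabilities; neither assumption is confirmed by this README.

```python
# Values copied from the training summary above.
BATCH_SIZE = 18
EPOCHS = 5
BATCHES_PER_EPOCH = 1000
LOSS_START, LOSS_END = 0.0186, 0.0177
MIX = {
    "Hypersim": 0.5,
    "UrbanSyn": 0.15,
    "UnrealStereo4K": 0.15,
    "VKITTI2": 0.1,
    "TartanAir": 0.1,
}

total_steps = EPOCHS * BATCHES_PER_EPOCH
samples_seen = total_steps * BATCH_SIZE  # assumes one full batch per step
relative_change = (LOSS_END - LOSS_START) / LOSS_START

# Expected number of samples drawn from each dataset, if the mix values
# are sampling probabilities (an assumption).
expected_per_dataset = {name: round(samples_seen * w) for name, w in MIX.items()}

print(total_steps)                      # 5000
print(samples_seen)                     # 90000
print(round(relative_change * 100, 1))  # -4.8
print(expected_per_dataset["Hypersim"])  # 45000
```

Under these assumptions the run sees 90,000 samples in total, half of them from Hypersim, and the reported -4.8% is the relative change between the epoch 0 and epoch 4 losses.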

	The 5K-step run shown here is a partial-training demo; paper-scale training would continue well beyond this point.

	For the MoGe2 backbone, we verified that the full forward + backward + checkpointing pipeline works
	(`lpd_run5d_moge2.yaml`); training to convergence is left for a multi-GPU run.

	## Verification

	```bash
	cd code/
	pip install -r requirements.txt
	python -m ppd.lpd.tests.verify_paper # 30 paper claims, all pass
	```

	`PAPER_CHECKLIST.md` maps each section of the paper to specific code files/lines.

	## Reproducing training

	```bash
	# Pretrained inputs the code expects (replace /path/to/... with your local copies)
	ln -sf /path/to/ppd.pth code/checkpoints/ppd.pth # DA2
	ln -sf /path/to/depth_anything_v2_vitl.pth code/checkpoints/depth_anything_v2_vitl.pth
	# OR for MoGe2:
	ln -sf /path/to/ppd_moge2.pth code/checkpoints/ppd_moge2.pth
	ln -sf /path/to/moge2.pt code/checkpoints/moge2.pt

	# Hypersim 512² pretrain (DA2)
	bash code/train_lpd.sh

	# 5-dataset 1024×768 fine-tune
	python code/main.py --cfg_file code/ppd/configs/lpd_run5d_10k.yaml # DA2
	# OR
	python code/main.py --cfg_file code/ppd/configs/lpd_run5d_moge2.yaml # MoGe2

	# Inference comparison (PPD vs LPD)
	python code/experiments/eval_lpd_vs_ppd.py
	```
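
Before launching a run, it can help to confirm that the pretrained files linked above are actually in place. The following is a small sketch: the file names are taken from the `ln -sf` commands above, while the `"da2"`/`"moge2"` labels and the `missing_pretrained` helper are hypothetical.

```python
from pathlib import Path
from typing import List

# File names taken from the symlink commands in this README; the dict
# keys are hypothetical labels for the two backbones.
REQUIRED_FILES = {
    "da2": ["ppd.pth", "depth_anything_v2_vitl.pth"],
    "moge2": ["ppd_moge2.pth", "moge2.pt"],
}


def missing_pretrained(ckpt_dir: str, backbone: str) -> List[str]:
    """Return the required file names that are absent from ckpt_dir."""
    root = Path(ckpt_dir)
    # Path.exists() follows symlinks, so a dangling link counts as missing.
    return [name for name in REQUIRED_FILES[backbone] if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_pretrained("code/checkpoints", "da2")
    print("missing:", missing if missing else "none")
```

Because `Path.exists()` resolves symlinks, a link whose target has moved is reported as missing, which is the common failure mode with `ln -sf`.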

	## Datasets — fetching + extracting

	Everything is hosted as archives in the dataset repo to keep file counts low and upload bandwidth high.

	```bash
	hf download chenming-wu/LiDAR-Perfect-Depth-Datasets --repo-type dataset \
	--local-dir /mnt/sig/datasets

	bash code/extract_archives.sh /mnt/sig/datasets /mnt/sig/datasets/archives
	```

	After extraction, the layout matches the paths the LPD configs reference (`/mnt/sig/datasets/train/<scene>/...`, `/mnt/sig/datasets/eval_image/...`).
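
A quick sanity check of the extracted layout might look like the sketch below. It only assumes the two top-level directories named above (`train/` with one subdirectory per scene, and `eval_image/`); `check_layout` and `list_scenes` are hypothetical helpers, and the deeper structure is not inspected.

```python
from pathlib import Path
from typing import Dict, List

# Top-level directories named in this README; deeper structure is not checked.
EXPECTED_SUBDIRS = ("train", "eval_image")


def check_layout(root: str) -> Dict[str, bool]:
    """Report which of the expected top-level directories exist under root."""
    base = Path(root)
    return {name: (base / name).is_dir() for name in EXPECTED_SUBDIRS}


def list_scenes(root: str) -> List[str]:
    """Return sorted scene directory names under <root>/train, if present."""
    train_dir = Path(root) / "train"
    if not train_dir.is_dir():
        return []
    return sorted(p.name for p in train_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    print(check_layout("/mnt/sig/datasets"))
    print(len(list_scenes("/mnt/sig/datasets")), "scene directories found")
```

If either directory is reported as missing, rerunning `extract_archives.sh` with the same arguments is the first thing to try.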

	## Citations

	- Pixel-Perfect Depth — Xu et al., NeurIPS 2025
	- LiDAR-Perfect Depth — `paper.tex` in this repo