---
license: mit
tags:
- depth-estimation
- diffusion
- monocular-depth
- lidar-prompted
- pixel-perfect-depth
- pytorch
---
# LiDAR-Perfect Depth (LPD)
Implementation of *LiDAR-Perfect Depth: Score-Decomposed Diffusion with Kalman-in-the-Loop Denoising for Sparse-Prompted Depth* on top of [Pixel-Perfect Depth](https://github.com/gangweix/pixel-perfect-depth).
## Repos
| Repo | Contents | Size |
|---|---|---|
| `chenming-wu/LiDAR-Perfect-Depth` (this) | code + 6 LPD-DA2 checkpoints + inference vis + extraction helper | 12.5 GB |
| [`chenming-wu/LiDAR-Perfect-Depth-Datasets`](https://huggingface.co/datasets/chenming-wu/LiDAR-Perfect-Depth-Datasets) | extracted eval sets + training-set archives + PPD/DA-V2/RAFT weights | ~991 GB |
## What's in this repo
- `code/`: full LPD codebase. New modules live under `ppd/lpd/`, alongside updated configs and adapter loaders.
- `checkpoints/`
  - `e000-s001000.ckpt` … `e004-s005000.ckpt`: per-epoch fine-tuned weights (DA2 backbone, 5K steps)
  - `last.ckpt`: same weights as `e004-s005000.ckpt`
  - 2.0 GB each, weights only
- `inference_vis/`: 8-sample qualitative panels (RGB | GT | PPD | LPD | LPD-variance)
- `extract_archives.sh`: extracts the dataset archives back into the layout the code expects
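The per-epoch file names appear to encode the epoch and the global step (`e<epoch>-s<step>.ckpt`). The helper below is a hypothetical sketch, not part of the repo. It assumes the epoch is zero-padded to 3 digits, the step is zero-padded to 6 digits, and epoch `e` ends at step `(e + 1) * 1000`, which matches the 5 × 1000-batch run described below. With those assumptions it parses the names and picks the most recent per-epoch file.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed naming scheme, inferred from "e000-s001000.ckpt" ... "e004-s005000.ckpt".
CKPT_RE = re.compile(r"^e(?P<epoch>\d{3})-s(?P<step>\d{6})\.ckpt$")


def parse_ckpt_name(name: str) -> tuple[int, int]:
    """Return (epoch, global_step) parsed from a per-epoch checkpoint filename."""
    m = CKPT_RE.match(name)
    if m is None:
        raise ValueError(f"not a per-epoch checkpoint name: {name!r}")
    return int(m["epoch"]), int(m["step"])


def expected_ckpt_names(num_epochs: int = 5, steps_per_epoch: int = 1000) -> list[str]:
    """Names produced if epoch e ends at global step (e + 1) * steps_per_epoch."""
    return [f"e{e:03d}-s{(e + 1) * steps_per_epoch:06d}.ckpt" for e in range(num_epochs)]


def latest_ckpt(ckpt_dir: str | Path) -> Path:
    """Pick the per-epoch checkpoint with the highest global step (ignores last.ckpt)."""
    candidates = [p for p in Path(ckpt_dir).glob("e*-s*.ckpt") if CKPT_RE.match(p.name)]
    if not candidates:
        raise FileNotFoundError(f"no per-epoch checkpoints in {ckpt_dir}")
    return max(candidates, key=lambda p: parse_ckpt_name(p.name)[1])
```

For example, `latest_ckpt("checkpoints")` would resolve to `checkpoints/e004-s005000.ckpt` in this repo.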
## Backbone options
PPD (and therefore LPD) supports two semantic-prompt backbones. Switching between them is a single config change.
| Backbone | Trainable params | PPD ckpt | Config |
|---|---|---|---|
| **DA2** (Depth-Anything-V2 ViT-L) | 16.3 M | [`gangweix/Pixel-Perfect-Depth/ppd.pth`](https://huggingface.co/gangweix/Pixel-Perfect-Depth/resolve/main/ppd.pth) | `code/ppd/configs/lpd_run5d_10k.yaml` |
| **MoGe2** | 16.3 M | [`chenming-wu/LiDAR-Perfect-Depth-Datasets/pretrained/ppd_moge2/`](https://huggingface.co/datasets/chenming-wu/LiDAR-Perfect-Depth-Datasets/tree/main/pretrained/ppd_moge2) | `code/ppd/configs/lpd_run5d_moge2.yaml` |
The official PPD MoGe2 release reports a 20-30% improvement over DA2 on zero-shot benchmarks. The LPD prompt branch is identical for either backbone.
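The backbone switch amounts to choosing the config file passed to `code/main.py`. The sketch below builds that command from the two configs in the table. `CONFIGS` and `train_command` are hypothetical names used only for illustration and are not part of the LPD codebase.

```python
from __future__ import annotations

import shlex
import sys

# Backbone -> fine-tune config, copied from the table above (paths relative to the repo root).
CONFIGS = {
    "da2": "code/ppd/configs/lpd_run5d_10k.yaml",
    "moge2": "code/ppd/configs/lpd_run5d_moge2.yaml",
}


def train_command(backbone: str, python: str = sys.executable) -> list[str]:
    """Build the `main.py --cfg_file ...` invocation for the chosen backbone."""
    key = backbone.lower()
    if key not in CONFIGS:
        raise ValueError(f"unknown backbone {backbone!r}; expected one of {sorted(CONFIGS)}")
    return [python, "code/main.py", "--cfg_file", CONFIGS[key]]


if __name__ == "__main__":
    # Print rather than launch; pass the list to subprocess.run(..., check=True) to actually train.
    print(shlex.join(train_command("moge2")))
```

Returning an argument list instead of a shell string keeps the call safe to pass straight to `subprocess.run` without `shell=True`.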
## Training run summary (DA2 backbone, 5K steps)
```
Backbone: PPD-DA2 (820 M params, frozen)
Trainable: 16 M (sparse-prompt encoder + gate)
Resolution: 1024 × 768
Batch: 18 (~133 GB peak on a single H200)
Steps: 5,000 (5 epochs × 1000 batches)
Mix: Hypersim 0.5 / UrbanSyn 0.15 / UnrealStereo4K 0.15 / VKITTI2 0.1 / TartanAir 0.1
Init: gangweix/Pixel-Perfect-Depth ppd.pth
epoch 0 → 4 loss: 0.0186 → 0.0177 (-4.8%)
```
The 5K-step run shown here is a partial-training demo. Paper-scale training would run for many more steps.
For the MoGe2 backbone, we verified that the full forward, backward, and checkpointing pipeline works
(`lpd_run5d_moge2.yaml`). Training it to convergence is left for a multi-GPU run.
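The numbers in the run summary can be checked with simple arithmetic. Batch 18 × 5,000 steps is 90,000 samples, so a 0.5 Hypersim weight means about 45,000 Hypersim samples. The loss change (0.0177 − 0.0186) / 0.0186 works out to -4.8%. The sketch below reproduces these figures. It also shows one way a weighted mix could be sampled, drawing each sample's source dataset i.i.d. in proportion to its weight. That sampler is an assumption for illustration only. The actual LPD dataloader may mix datasets differently, for example per batch or by concatenating with repeat factors.

```python
from __future__ import annotations

import random
from collections import Counter

# Values copied from the run summary above.
MIX = {
    "Hypersim": 0.5,
    "UrbanSyn": 0.15,
    "UnrealStereo4K": 0.15,
    "VKITTI2": 0.1,
    "TartanAir": 0.1,
}
BATCH_SIZE = 18
STEPS = 5_000


def expected_samples(mix: dict[str, float], batch_size: int, steps: int) -> dict[str, int]:
    """Expected number of training samples drawn from each dataset over the whole run."""
    total = batch_size * steps
    return {name: round(weight * total) for name, weight in mix.items()}


def sample_sources(mix: dict[str, float], n: int, seed: int = 0) -> list[str]:
    """Hypothetical sampler: draw n dataset names i.i.d., proportional to the mix weights."""
    rng = random.Random(seed)
    names = list(mix)
    return rng.choices(names, weights=[mix[name] for name in names], k=n)


def relative_change(before: float, after: float) -> float:
    """Fractional change, e.g. epoch-0 loss vs epoch-4 loss."""
    return (after - before) / before


if __name__ == "__main__":
    assert abs(sum(MIX.values()) - 1.0) < 1e-9, "mix weights should sum to 1"
    print(expected_samples(MIX, BATCH_SIZE, STEPS))  # 90,000 samples in total
    print(f"{relative_change(0.0186, 0.0177):.1%}")  # loss change from epoch 0 to epoch 4
    print(Counter(sample_sources(MIX, BATCH_SIZE)))  # source datasets for one batch of 18
```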
## Verification
```bash
cd code/
pip install -r requirements.txt
python -m ppd.lpd.tests.verify_paper # 30 paper claims, all pass
```
`PAPER_CHECKLIST.md` maps each section of the paper to specific code files/lines.
## Reproducing training
```bash
# Pretrained inputs the code expects (replace /path/to/... with your downloaded files)
mkdir -p code/checkpoints
# DA2:
ln -sf /path/to/ppd.pth code/checkpoints/ppd.pth
ln -sf /path/to/depth_anything_v2_vitl.pth code/checkpoints/depth_anything_v2_vitl.pth
# OR for MoGe2:
ln -sf /path/to/ppd_moge2.pth code/checkpoints/ppd_moge2.pth
ln -sf /path/to/moge2.pt code/checkpoints/moge2.pt
# Hypersim 512² pretrain (DA2)
bash code/train_lpd.sh
# 5-dataset 1024×768 fine-tune
python code/main.py --cfg_file code/ppd/configs/lpd_run5d_10k.yaml  # DA2
# OR
python code/main.py --cfg_file code/ppd/configs/lpd_run5d_moge2.yaml  # MoGe2
# Inference comparison (PPD vs LPD)
python code/experiments/eval_lpd_vs_ppd.py
```
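The `ln -sf` steps can also be scripted. The sketch below uses `pathlib` to symlink the weights for one backbone into `code/checkpoints/`. It replaces an existing file or link the way `ln -sf` does and fails early if a source file is missing. `REQUIRED`, `missing_sources` and `stage_checkpoints` are hypothetical helpers, and the per-backbone file lists are read off the commands above.

```python
from __future__ import annotations

from pathlib import Path

# File names the code expects under code/checkpoints/, per backbone (from the `ln -sf` lines above).
REQUIRED = {
    "da2": ("ppd.pth", "depth_anything_v2_vitl.pth"),
    "moge2": ("ppd_moge2.pth", "moge2.pt"),
}


def missing_sources(backbone: str, sources: dict[str, str | Path]) -> list[str]:
    """Required file names for this backbone that have no source path in `sources`."""
    return [name for name in REQUIRED[backbone] if name not in sources]


def stage_checkpoints(
    backbone: str,
    sources: dict[str, str | Path],
    ckpt_dir: str | Path = "code/checkpoints",
) -> list[Path]:
    """Symlink downloaded weights into ckpt_dir, overwriting stale entries like `ln -sf`."""
    missing = missing_sources(backbone, sources)
    if missing:
        raise KeyError(f"no source path given for {missing}")
    out_dir = Path(ckpt_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for name in REQUIRED[backbone]:
        target = Path(sources[name]).expanduser().resolve()
        if not target.is_file():
            raise FileNotFoundError(target)
        link = out_dir / name
        if link.is_symlink() or link.is_file():
            link.unlink()  # remove an existing link or file first, as `ln -f` would
        link.symlink_to(target)
        links.append(link)
    return links
```

For example, `stage_checkpoints("moge2", {"ppd_moge2.pth": "~/weights/ppd_moge2.pth", "moge2.pt": "~/weights/moge2.pt"})` would create both MoGe2 links. The `~/weights/` paths are placeholders.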
## Datasets: fetching and extracting
The training sets are hosted as **archives** in the dataset repo to keep file counts low and upload bandwidth high. The eval sets are stored already extracted.
```bash
hf download chenming-wu/LiDAR-Perfect-Depth-Datasets --repo-type dataset \
--local-dir /mnt/sig/datasets
bash code/extract_archives.sh /mnt/sig/datasets /mnt/sig/datasets/archives
```
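The full dataset repo is about 991 GB. If you only need part of it, such as the MoGe2 PPD weights under `pretrained/ppd_moge2/`, the `huggingface_hub` Python API can filter files with `allow_patterns`. This is a minimal sketch assuming `huggingface_hub` is installed. `download_kwargs` and `fetch` are illustrative names, and the default `local_dir` mirrors the command above.

```python
from __future__ import annotations

REPO_ID = "chenming-wu/LiDAR-Perfect-Depth-Datasets"


def download_kwargs(local_dir: str, patterns: list[str] | None = None) -> dict:
    """Arguments for huggingface_hub.snapshot_download; `patterns` restricts which files are fetched."""
    kwargs = {"repo_id": REPO_ID, "repo_type": "dataset", "local_dir": local_dir}
    if patterns:
        kwargs["allow_patterns"] = patterns
    return kwargs


def fetch(local_dir: str = "/mnt/sig/datasets", patterns: list[str] | None = None) -> str:
    """Download (part of) the dataset repo and return the local folder path."""
    from huggingface_hub import snapshot_download  # imported lazily: pip install huggingface_hub

    return snapshot_download(**download_kwargs(local_dir, patterns))
```

For example, `fetch(patterns=["pretrained/ppd_moge2/*"])` downloads only the MoGe2 checkpoint folder. Calling `fetch()` with no patterns is equivalent to the `hf download` command above.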
Layout after extraction matches what the LPD configs reference (`/mnt/sig/datasets/train/<scene>/...`, `/mnt/sig/datasets/eval_image/...`).
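Before launching training, you can check that extraction produced the expected top-level directories. The sketch below looks only for the `train/` and `eval_image/` directories named above and lists the scene folders under `train/`. It does not validate the per-scene contents, and the helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Top-level directories referenced by the LPD configs (from the layout note above).
EXPECTED_DIRS = ("train", "eval_image")


def missing_dirs(root: str | Path) -> list[str]:
    """Expected top-level directories that are absent under root."""
    return [d for d in EXPECTED_DIRS if not (Path(root) / d).is_dir()]


def list_train_scenes(root: str | Path = "/mnt/sig/datasets") -> list[str]:
    """Scene folder names under <root>/train, after checking the expected layout exists."""
    missing = missing_dirs(root)
    if missing:
        raise FileNotFoundError(f"{missing} not found under {root}; did extract_archives.sh finish?")
    return sorted(p.name for p in (Path(root) / "train").iterdir() if p.is_dir())
```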
## Citations
- **Pixel-Perfect Depth**: Xu et al., NeurIPS 2025
- **LiDAR-Perfect Depth**: `paper.tex` in this repo