---
license: mit
tags:
- depth-estimation
- diffusion
- monocular-depth
- lidar-prompted
- pixel-perfect-depth
- pytorch
---
# LiDAR-Perfect Depth (LPD)
Implementation of *LiDAR-Perfect Depth: Score-Decomposed Diffusion with Kalman-in-the-Loop Denoising for Sparse-Prompted Depth* on top of [Pixel-Perfect Depth](https://github.com/gangweix/pixel-perfect-depth).
## Repos
| Repo | Contents | Size |
|---|---|---|
| `chenming-wu/LiDAR-Perfect-Depth` (this) | code + 6 LPD-DA2 checkpoints + inference vis + extraction helper | 12.5 GB |
| [`chenming-wu/LiDAR-Perfect-Depth-Datasets`](https://huggingface.co/datasets/chenming-wu/LiDAR-Perfect-Depth-Datasets) | extracted eval sets + training-set archives + PPD/DA-V2/RAFT weights | ~991 GB |
## What's in this repo
- `code/`: full LPD codebase. New modules under `ppd/lpd/`; updated configs and adapter loaders.
- `checkpoints/`
  - `e000-s001000.ckpt` … `e004-s005000.ckpt`: per-epoch fine-tuned weights (DA2 backbone, 5K steps)
  - `last.ckpt`: same as `e004-s005000.ckpt`
  - 2.0 GB each, weights-only
- `inference_vis/`: 8-sample qualitative panels (RGB | GT | PPD | LPD | LPD-variance)
- `extract_archives.sh`: extracts the dataset archives back into the layout the code expects
## Backbone options
PPD (and therefore LPD) supports two semantic-prompt backbones; switching between them is a single config change.
| Backbone | Trainable params | PPD ckpt | Config |
|---|---|---|---|
| **DA2** (Depth-Anything-V2 ViT-L) | 16.3 M | [`gangweix/Pixel-Perfect-Depth/ppd.pth`](https://huggingface.co/gangweix/Pixel-Perfect-Depth/resolve/main/ppd.pth) | `code/ppd/configs/lpd_run5d_10k.yaml` |
| **MoGe2** | 16.3 M | [`chenming-wu/LiDAR-Perfect-Depth-Datasets/pretrained/ppd_moge2/`](https://huggingface.co/datasets/chenming-wu/LiDAR-Perfect-Depth-Datasets/tree/main/pretrained/ppd_moge2) | `code/ppd/configs/lpd_run5d_moge2.yaml` |
The official PPD MoGe2 release reports a 20-30% improvement over DA2 on zero-shot benchmarks; the LPD prompt branch is identical for either backbone.
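Because the backbone is selected purely through the config file, a small launcher can map a backbone name to its config. The following is a minimal sketch, not part of the repo: `BACKBONE_CONFIGS` and `train_command` are hypothetical names, and only the config paths and the `code/main.py --cfg_file` invocation are taken from this README.

```python
import shlex

# Config paths copied from the table above; the dict name is illustrative.
BACKBONE_CONFIGS = {
    "da2": "code/ppd/configs/lpd_run5d_10k.yaml",
    "moge2": "code/ppd/configs/lpd_run5d_moge2.yaml",
}


def train_command(backbone: str) -> list[str]:
    """Return the argv list that launches fine-tuning for the given backbone."""
    key = backbone.lower()
    if key not in BACKBONE_CONFIGS:
        raise ValueError(
            f"unknown backbone {backbone!r}; expected one of {sorted(BACKBONE_CONFIGS)}"
        )
    return ["python", "code/main.py", "--cfg_file", BACKBONE_CONFIGS[key]]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(train_command("moge2")))
```

Returning an argv list rather than a shell string keeps the command safe to pass to `subprocess.run` without quoting concerns.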
## Training run summary (DA2 backbone, 5K steps)
```
Backbone:   PPD-DA2, 820 M, frozen
Trainable:  16 M (sparse-prompt encoder + gate)
Resolution: 1024 × 768
Batch:      18 (~133 GB peak on a single H200)
Steps:      5,000 (5 epochs × 1000 batches)
Mix:        Hypersim 0.5 / UrbanSyn 0.15 / UnrealStereo4K 0.15 / VKITTI2 0.1 / TartanAir 0.1
Init:       gangweix/Pixel-Perfect-Depth ppd.pth
epoch 0 → 4 loss: 0.0186 → 0.0177 (-4.8%)
```
The 5K-step run shown here is a partial-training demo; paper-scale training would be expected to reduce the loss further.
For the MoGe2 backbone, we verified that the full forward + backward + checkpointing pipeline works
(`lpd_run5d_moge2.yaml`); training to convergence is left for a multi-GPU run.
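The figures in the run summary can be cross-checked with a few lines of arithmetic. The sketch below copies the values from the summary and assumes that the `Mix` entries are per-sample sampling probabilities; the README does not state how the mix is applied, so the per-dataset counts are only expectations under that assumption.

```python
# Values copied from the training run summary above.
EPOCHS = 5
BATCHES_PER_EPOCH = 1000
BATCH_SIZE = 18
MIX = {
    "Hypersim": 0.5,
    "UrbanSyn": 0.15,
    "UnrealStereo4K": 0.15,
    "VKITTI2": 0.1,
    "TartanAir": 0.1,
}
LOSS_START, LOSS_END = 0.0186, 0.0177

# Total optimizer steps and total samples drawn over the run.
steps = EPOCHS * BATCHES_PER_EPOCH
samples_seen = steps * BATCH_SIZE

# ASSUMPTION: mix weights act as sampling probabilities per training sample.
expected_per_dataset = {name: round(w * samples_seen) for name, w in MIX.items()}

# Relative loss change between the first and last epoch, in percent.
relative_change_pct = (LOSS_END - LOSS_START) / LOSS_START * 100

if __name__ == "__main__":
    print(f"steps={steps}, samples_seen={samples_seen}")
    for name, count in expected_per_dataset.items():
        print(f"{name}: {count}")
    print(f"relative loss change: {relative_change_pct:.1f}%")
```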
## Verification
```bash
cd code/
pip install -r requirements.txt
python -m ppd.lpd.tests.verify_paper # 30 paper claims, all pass
```
`PAPER_CHECKLIST.md` maps each section of the paper to specific code files/lines.
## Reproducing training
```bash
# Pretrained inputs the code expects.
# Replace each /path/to/... with the absolute path of the downloaded file.
ln -sf /path/to/ppd.pth code/checkpoints/ppd.pth # DA2
ln -sf /path/to/depth_anything_v2_vitl.pth code/checkpoints/depth_anything_v2_vitl.pth
# OR for MoGe2:
ln -sf /path/to/ppd_moge2.pth code/checkpoints/ppd_moge2.pth
ln -sf /path/to/moge2.pt code/checkpoints/moge2.pt
# Hypersim 512² pretrain (DA2)
bash code/train_lpd.sh
# 5-dataset 1024×768 fine-tune
python code/main.py --cfg_file code/ppd/configs/lpd_run5d_10k.yaml # DA2
# OR
python code/main.py --cfg_file code/ppd/configs/lpd_run5d_moge2.yaml # MoGe2
# Inference comparison (PPD vs LPD)
python code/experiments/eval_lpd_vs_ppd.py
```
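Before launching a run, it can help to confirm that the pretrained files linked in the commands above actually resolve. The helper below is a hypothetical sketch (neither `REQUIRED_PRETRAINED` nor `missing_pretrained` exists in the repo); the file names and the `code/checkpoints` directory are the ones used in the symlink commands.

```python
import os

# File names taken from the symlink commands above; the dict name is illustrative.
REQUIRED_PRETRAINED = {
    "da2": ("ppd.pth", "depth_anything_v2_vitl.pth"),
    "moge2": ("ppd_moge2.pth", "moge2.pt"),
}


def missing_pretrained(backbone: str, ckpt_dir: str = "code/checkpoints") -> list[str]:
    """List required files that do not resolve to an existing regular file.

    os.path.isfile follows symlinks, so a dangling link is reported as missing.
    """
    names = REQUIRED_PRETRAINED[backbone.lower()]
    return [n for n in names if not os.path.isfile(os.path.join(ckpt_dir, n))]


if __name__ == "__main__":
    for backbone in REQUIRED_PRETRAINED:
        missing = missing_pretrained(backbone)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{backbone}: {status}")
```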
## Datasets: fetching + extracting
Everything is hosted as **archives** in the dataset repo to keep file counts low and upload bandwidth high.
```bash
hf download chenming-wu/LiDAR-Perfect-Depth-Datasets --repo-type dataset \
--local-dir /mnt/sig/datasets
bash code/extract_archives.sh /mnt/sig/datasets /mnt/sig/datasets/archives
```
Layout after extraction matches what the LPD configs reference (`/mnt/sig/datasets/train/<scene>/...`, `/mnt/sig/datasets/eval_image/...`).
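A quick sanity check after extraction is to confirm that the two top-level directories named above exist. This is a minimal sketch under the assumption that `train/` and `eval_image/` sit directly under the dataset root, as the paths above suggest; `missing_subdirs` is a hypothetical helper, and nothing deeper than the top level is checked because the README does not specify it.

```python
import sys
from pathlib import Path

# Top-level directories named in this README; deeper structure is not specified here.
EXPECTED_SUBDIRS = ("train", "eval_image")


def missing_subdirs(root: str) -> list[str]:
    """Return the expected top-level directories that are absent under root."""
    base = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (base / name).is_dir()]


if __name__ == "__main__":
    # Default root matches the path used in the download command above.
    root = sys.argv[1] if len(sys.argv) > 1 else "/mnt/sig/datasets"
    missing = missing_subdirs(root)
    if missing:
        print(f"missing under {root}: {', '.join(missing)}")
    else:
        print(f"{root} has the expected top-level layout")
```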
## Citations
- **Pixel-Perfect Depth**: Xu et al., NeurIPS 2025
- **LiDAR-Perfect Depth**: `paper.tex` in this repo