---
license: mit
tags:
- depth-estimation
- diffusion
- monocular-depth
- lidar-prompted
- pixel-perfect-depth
- pytorch
---

# LiDAR-Perfect Depth (LPD)

Implementation of *LiDAR-Perfect Depth: Score-Decomposed Diffusion with Kalman-in-the-Loop Denoising for Sparse-Prompted Depth*, built on top of [Pixel-Perfect Depth](https://github.com/gangweix/pixel-perfect-depth).

## Repos

| Repo | Contents | Size |
|---|---|---|
| `chenming-wu/LiDAR-Perfect-Depth` (this repo) | code + 6 LPD-DA2 checkpoints + inference visualizations + extraction helper | 12.5 GB |
| [`chenming-wu/LiDAR-Perfect-Depth-Datasets`](https://huggingface.co/datasets/chenming-wu/LiDAR-Perfect-Depth-Datasets) | extracted eval sets + training-set archives + PPD/DA-V2/RAFT weights | ~991 GB |

## What's in this repo

- `code/`: the full LPD codebase. New modules live under `ppd/lpd/`, alongside updated configs and adapter loaders.
- `checkpoints/`
  - `e000-s001000.ckpt` … `e004-s005000.ckpt`: per-epoch fine-tuned weights (DA2 backbone, 5K steps total)
  - `last.ckpt`: identical to `e004-s005000.ckpt`
  - 2.0 GB each, weights only
- `inference_vis/`: 8-sample qualitative panels (RGB | GT | PPD | LPD | LPD-variance)
- `extract_archives.sh`: extracts the dataset archives back into the layout the code expects

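The per-epoch checkpoint names appear to encode the epoch index and the global step (`e<epoch>-s<step>.ckpt`). The sketch below picks the most recent checkpoint from a script. It assumes that naming pattern holds. `parse_ckpt_name`, `latest_checkpoint` and `latest_in_dir` are hypothetical helpers and are not part of `code/`:

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed naming pattern: e<epoch>-s<global step>.ckpt, e.g. e004-s005000.ckpt
CKPT_RE = re.compile(r"^e(\d+)-s(\d+)\.ckpt$")


def parse_ckpt_name(name: str) -> tuple[int, int] | None:
    """Return (epoch, step) for a per-epoch checkpoint name, else None."""
    m = CKPT_RE.match(name)
    return (int(m.group(1)), int(m.group(2))) if m else None


def latest_checkpoint(names: list[str]) -> str | None:
    """Return the per-epoch checkpoint with the highest global step.

    Names that do not match the pattern (such as last.ckpt) are skipped.
    """
    candidates = [(parse_ckpt_name(n), n) for n in names]
    candidates = [(key, n) for key, n in candidates if key is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0][1])[1]


def latest_in_dir(ckpt_dir: str | Path = "checkpoints") -> str | None:
    """Scan a directory such as checkpoints/; None if it is absent or empty."""
    path = Path(ckpt_dir)
    if not path.is_dir():
        return None
    return latest_checkpoint([p.name for p in path.glob("*.ckpt")])
```

For example, `latest_checkpoint(["e000-s001000.ckpt", "last.ckpt", "e004-s005000.ckpt"])` returns `"e004-s005000.ckpt"`.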
## Backbone options

PPD, and therefore LPD, supports two semantic-prompt backbones. Switching between them is a single config change.

| Backbone | Trainable params | PPD checkpoint | Config |
|---|---|---|---|
| **DA2** (Depth-Anything-V2 ViT-L) | 16.3 M | [`gangweix/Pixel-Perfect-Depth/ppd.pth`](https://huggingface.co/gangweix/Pixel-Perfect-Depth/resolve/main/ppd.pth) | `code/ppd/configs/lpd_run5d_10k.yaml` |
| **MoGe2** | 16.3 M | [`chenming-wu/LiDAR-Perfect-Depth-Datasets/pretrained/ppd_moge2/`](https://huggingface.co/datasets/chenming-wu/LiDAR-Perfect-Depth-Datasets/tree/main/pretrained/ppd_moge2) | `code/ppd/configs/lpd_run5d_moge2.yaml` |

The official PPD MoGe2 release reports a 20-30% improvement over DA2 on zero-shot benchmarks. The LPD prompt branch is identical for either backbone.

## Training run summary (DA2 backbone, 5K steps)

```
Backbone: PPD-DA2 (820 M params, frozen)
Trainable: 16 M (sparse-prompt encoder + gate)
Resolution: 1024 × 768
Batch: 18 (~133 GB peak on a single H200)
Steps: 5,000 (5 epochs × 1000 batches)
Mix: Hypersim 0.5 / UrbanSyn 0.15 / UnrealStereo4K 0.15 / VKITTI2 0.1 / TartanAir 0.1
Init: gangweix/Pixel-Perfect-Depth ppd.pth

epoch 0 → 4 loss: 0.0186 → 0.0177 (-4.8%)
```

The 5K-step run shown here is a partial training demo. Paper-scale training would run considerably longer.

For the MoGe2 backbone, we verified that the full forward, backward and checkpointing pipeline runs (`lpd_run5d_moge2.yaml`). Training to convergence is left for a multi-GPU run.

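The training-summary numbers can be cross-checked with a little arithmetic. The sketch below computes three things:

- the total samples seen (steps × batch size),
- the expected per-dataset share under the mixture weights,
- the relative loss change.

It assumes the weights act as per-sample sampling probabilities. The sampler in `code/` may apply them differently, for example per batch:

```python
from __future__ import annotations

# Values copied from the training run summary.
STEPS = 5 * 1000  # 5 epochs x 1000 batches
BATCH = 18
MIX = {
    "Hypersim": 0.5,
    "UrbanSyn": 0.15,
    "UnrealStereo4K": 0.15,
    "VKITTI2": 0.1,
    "TartanAir": 0.1,
}


def expected_samples(steps: int, batch: int, mix: dict[str, float]) -> dict[str, int]:
    """Expected samples per dataset, assuming per-sample mixture sampling."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mixture weights should sum to 1")
    total = steps * batch
    return {name: round(total * weight) for name, weight in mix.items()}


def relative_change_pct(start: float, end: float) -> float:
    """Relative change from start to end, in percent."""
    return 100.0 * (end - start) / start


if __name__ == "__main__":
    print(STEPS * BATCH)  # 90000 samples in total
    print(expected_samples(STEPS, BATCH, MIX))
    # {'Hypersim': 45000, 'UrbanSyn': 13500, 'UnrealStereo4K': 13500, 'VKITTI2': 9000, 'TartanAir': 9000}
    print(f"{relative_change_pct(0.0186, 0.0177):.1f}%")  # -4.8%
```

So the 5K-step demo covers 90,000 samples, about half of them from Hypersim. The computed -4.8% matches the logged loss change.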
## Verification

```bash
cd code/
pip install -r requirements.txt
python -m ppd.lpd.tests.verify_paper   # checks 30 paper claims; all pass
```

`PAPER_CHECKLIST.md` maps each section of the paper to specific code files and line numbers.

## Reproducing training

```bash
# Pretrained inputs the code expects. Use absolute paths: a relative
# symlink target is resolved relative to code/checkpoints/, not the cwd.
mkdir -p code/checkpoints
# DA2 backbone:
ln -sf /path/to/ppd.pth code/checkpoints/ppd.pth
ln -sf /path/to/depth_anything_v2_vitl.pth code/checkpoints/depth_anything_v2_vitl.pth
# OR, for the MoGe2 backbone:
ln -sf /path/to/ppd_moge2.pth code/checkpoints/ppd_moge2.pth
ln -sf /path/to/moge2.pt code/checkpoints/moge2.pt

# Hypersim 512² pretraining (DA2)
bash code/train_lpd.sh

# 5-dataset 1024×768 fine-tune
python code/main.py --cfg_file code/ppd/configs/lpd_run5d_10k.yaml     # DA2
# OR
python code/main.py --cfg_file code/ppd/configs/lpd_run5d_moge2.yaml   # MoGe2

# Inference comparison (PPD vs. LPD)
python code/experiments/eval_lpd_vs_ppd.py
```

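The training entry points expect the pretrained weights at fixed paths under `code/checkpoints/`. A preflight check can therefore catch a missing or dangling symlink before a long launch. The sketch below is meant to run from the repo root. `preflight`, `train_command` and the `da2`/`moge2` keys are hypothetical. The filenames come from the `ln -sf` commands and the config paths from the backbone table:

```python
from __future__ import annotations

from pathlib import Path

# Weight filenames from the `ln -sf` commands; configs from the backbone table.
BACKBONES = {
    "da2": {
        "config": "code/ppd/configs/lpd_run5d_10k.yaml",
        "weights": ["ppd.pth", "depth_anything_v2_vitl.pth"],
    },
    "moge2": {
        "config": "code/ppd/configs/lpd_run5d_moge2.yaml",
        "weights": ["ppd_moge2.pth", "moge2.pt"],
    },
}


def preflight(repo_root: str | Path, backbone: str) -> list[str]:
    """Return the expected weight files missing from code/checkpoints/.

    Path.exists() follows symlinks, so a dangling symlink counts as missing.
    """
    ckpt_dir = Path(repo_root) / "code" / "checkpoints"
    return [n for n in BACKBONES[backbone]["weights"] if not (ckpt_dir / n).exists()]


def train_command(backbone: str) -> list[str]:
    """argv for the 5-dataset fine-tune of the chosen backbone."""
    return ["python", "code/main.py", "--cfg_file", BACKBONES[backbone]["config"]]
```

For example, abort if `preflight(".", "moge2")` returns a non-empty list. Otherwise, hand `train_command("moge2")` to `subprocess.run(..., check=True)`.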
## Datasets: fetching and extracting

Everything is hosted as **archives** in the dataset repo to keep file counts low and upload bandwidth high.

```bash
# Needs the `hf` CLI from the huggingface_hub package
hf download chenming-wu/LiDAR-Perfect-Depth-Datasets --repo-type dataset \
  --local-dir /mnt/sig/datasets

bash code/extract_archives.sh /mnt/sig/datasets /mnt/sig/datasets/archives
```

After extraction, the layout matches the paths the LPD configs reference (`/mnt/sig/datasets/train/<scene>/...`, `/mnt/sig/datasets/eval_image/...`).

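The same steps can be scripted from Python. The sketch below assumes `huggingface_hub` is installed and covers two tasks:

- **Partial download.** `snapshot_download` with `allow_patterns` fetches only part of the ~991 GB repo. The `pretrained/ppd_moge2/` folder comes from the MoGe2 link in the backbone table.
- **Layout check.** After extraction, a check confirms that the `train/` and `eval_image/` folders the configs reference exist. It looks only at those two folders, not their contents.

The helper names are hypothetical:

```python
from __future__ import annotations

from pathlib import Path

REPO_ID = "chenming-wu/LiDAR-Perfect-Depth-Datasets"
EXPECTED_DIRS = ("train", "eval_image")  # top-level folders the configs point at


def download_kwargs(local_dir: str, patterns: list[str] | None = None) -> dict:
    """Arguments for huggingface_hub.snapshot_download (None = whole repo)."""
    kwargs = {"repo_id": REPO_ID, "repo_type": "dataset", "local_dir": local_dir}
    if patterns is not None:
        kwargs["allow_patterns"] = patterns
    return kwargs


def fetch(local_dir: str, patterns: list[str] | None = None) -> str:
    """Download (part of) the dataset repo; returns the local folder path."""
    from huggingface_hub import snapshot_download  # imported lazily

    return snapshot_download(**download_kwargs(local_dir, patterns))


def missing_dirs(root: str | Path = "/mnt/sig/datasets") -> list[str]:
    """Expected top-level folders that are absent after extraction."""
    return [d for d in EXPECTED_DIRS if not (Path(root) / d).is_dir()]


def train_scenes(root: str | Path = "/mnt/sig/datasets") -> list[str]:
    """Sorted scene folder names under <root>/train (empty if missing)."""
    train = Path(root) / "train"
    if not train.is_dir():
        return []
    return sorted(p.name for p in train.iterdir() if p.is_dir())
```

For example, `fetch("/mnt/sig/datasets", ["pretrained/ppd_moge2/*"])` pulls only the MoGe2 PPD weights. After running `extract_archives.sh`, an empty `missing_dirs()` indicates the expected top-level layout is in place.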
## Citations

- **Pixel-Perfect Depth**: Xu et al., NeurIPS 2025
- **LiDAR-Perfect Depth**: see `paper.tex` in this repo