---
license: mit
tags:
- depth-estimation
- diffusion
- monocular-depth
- lidar-prompted
- pixel-perfect-depth
- pytorch
---

# LiDAR-Perfect Depth (LPD)

Implementation of *LiDAR-Perfect Depth: Score-Decomposed Diffusion with Kalman-in-the-Loop Denoising for Sparse-Prompted Depth* on top of [Pixel-Perfect Depth](https://github.com/gangweix/pixel-perfect-depth).

## Repos

| Repo | Contents | Size |
|---|---|---|
| `chenming-wu/LiDAR-Perfect-Depth` (this) | code + 6 LPD-DA2 checkpoints + inference vis + extraction helper | 12.5 GB |
| [`chenming-wu/LiDAR-Perfect-Depth-Datasets`](https://huggingface.co/datasets/chenming-wu/LiDAR-Perfect-Depth-Datasets) | extracted eval sets + training-set archives + PPD/DA-V2/RAFT weights | ~991 GB |

## What's in this repo

- `code/`: full LPD codebase. New modules live under `ppd/lpd/`; configs and adapter loaders are updated.
- `checkpoints/`
  - `e000-s001000.ckpt` … `e004-s005000.ckpt`: per-epoch fine-tuned weights (DA2 backbone, 5K steps)
  - `last.ckpt`: same as e004
  - 2.0 GB each, weights-only
- `inference_vis/`: 8-sample qualitative panels (RGB | GT | PPD | LPD | LPD-variance)
- `extract_archives.sh`: extracts the dataset archives back into the layout the code expects

## Backbone options

PPD (and therefore LPD) supports two semantic-prompt backbones; switching between them is a single config change.

| Backbone | Trainable params | PPD ckpt | Config |
|---|---|---|---|
| **DA2** (Depth-Anything-V2 ViT-L) | 16.3 M | [`gangweix/Pixel-Perfect-Depth/ppd.pth`](https://huggingface.co/gangweix/Pixel-Perfect-Depth/resolve/main/ppd.pth) | `code/ppd/configs/lpd_run5d_10k.yaml` |
| **MoGe2** | 16.3 M | [`chenming-wu/LiDAR-Perfect-Depth-Datasets/pretrained/ppd_moge2/`](https://huggingface.co/datasets/chenming-wu/LiDAR-Perfect-Depth-Datasets/tree/main/pretrained/ppd_moge2) | `code/ppd/configs/lpd_run5d_moge2.yaml` |

The official PPD MoGe2 release reports a 20-30% improvement over DA2 on zero-shot benchmarks; the LPD prompt branch is identical for either backbone.
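
Since the backbone choice comes down to which config file is passed to the trainer, a launcher script could resolve the config from a backbone name. The following is a minimal sketch under that assumption: `BACKBONE_CONFIGS` and `training_command` are hypothetical names that are not part of the LPD codebase. The config paths are copied from the table above, and the `python code/main.py --cfg_file` form is the one used in this README's training commands.

```python
# Hypothetical helper, not part of the LPD codebase: maps a backbone name to
# the config path listed in the backbone table of this README.
BACKBONE_CONFIGS = {
    "da2": "code/ppd/configs/lpd_run5d_10k.yaml",
    "moge2": "code/ppd/configs/lpd_run5d_moge2.yaml",
}


def training_command(backbone: str) -> list[str]:
    """Return the argv list for a fine-tuning run with the given backbone."""
    key = backbone.lower()
    if key not in BACKBONE_CONFIGS:
        raise ValueError(
            f"unknown backbone {backbone!r}; expected one of {sorted(BACKBONE_CONFIGS)}"
        )
    return ["python", "code/main.py", "--cfg_file", BACKBONE_CONFIGS[key]]


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(" ".join(training_command("moge2")))
```

Returning an argv list rather than a shell string keeps the result safe to pass to `subprocess.run` without extra quoting.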

## Training run summary (DA2 backbone, 5K steps)

```
Backbone:    PPD-DA2 (820 M, frozen)
Trainable:   16 M (sparse-prompt encoder + gate)
Resolution:  1024 × 768
Batch:       18 (~133 GB peak on a single H200)
Steps:       5,000 (5 epochs × 1000 batches)
Mix:         Hypersim 0.5 / UrbanSyn 0.15 / UnrealStereo4K 0.15 / VKITTI2 0.1 / TartanAir 0.1
Init:        gangweix/Pixel-Perfect-Depth ppd.pth

epoch 0 → 4 loss: 0.0186 → 0.0177 (-4.8%)
```

The 5K-step run shown here is a partial-training demo; paper-scale training would continue well beyond this point. For the MoGe2 backbone, we verified that the full forward, backward, and checkpointing pipeline works (`lpd_run5d_moge2.yaml`); training to convergence is left for a multi-GPU run.

## Verification

```bash
cd code/
pip install -r requirements.txt
python -m ppd.lpd.tests.verify_paper    # 30 paper claims, all pass
```

`PAPER_CHECKLIST.md` maps each section of the paper to specific code files and lines.

## Reproducing training

```bash
# Pretrained inputs the code expects
ln -sf code/checkpoints/ppd.pth    # DA2
ln -sf code/checkpoints/depth_anything_v2_vitl.pth
# OR for MoGe2:
ln -sf code/checkpoints/ppd_moge2.pth
ln -sf code/checkpoints/moge2.pt

# Hypersim 512² pretrain (DA2)
bash code/train_lpd.sh

# 5-dataset 1024×768 fine-tune
python code/main.py --cfg_file code/ppd/configs/lpd_run5d_10k.yaml      # DA2
# OR
python code/main.py --cfg_file code/ppd/configs/lpd_run5d_moge2.yaml    # MoGe2

# Inference comparison (PPD vs LPD)
python code/experiments/eval_lpd_vs_ppd.py
```

## Datasets: fetching + extracting

Everything is hosted as **archives** in the dataset repo to keep file counts low and upload bandwidth high.

```bash
hf download chenming-wu/LiDAR-Perfect-Depth-Datasets --repo-type dataset \
    --local-dir /mnt/sig/datasets
bash code/extract_archives.sh /mnt/sig/datasets /mnt/sig/datasets/archives
```

The layout after extraction matches the paths the LPD configs reference (`/mnt/sig/datasets/train//...`, `/mnt/sig/datasets/eval_image/...`).

## Citations

- **Pixel-Perfect Depth**: Xu et al., NeurIPS 2025
- **LiDAR-Perfect Depth**: `paper.tex` in this repo