---
license: mit
tags:
- depth-estimation
- diffusion
- monocular-depth
- lidar-prompted
- pixel-perfect-depth
- pytorch
---

# LiDAR-Perfect Depth (LPD)

Implementation of *LiDAR-Perfect Depth: Score-Decomposed Diffusion with Kalman-in-the-Loop Denoising for Sparse-Prompted Depth* on top of [Pixel-Perfect Depth](https://github.com/gangweix/pixel-perfect-depth).

## Repos

| Repo | Contents | Size |
|---|---|---|
| `chenming-wu/LiDAR-Perfect-Depth` (this) | code + 6 LPD-DA2 checkpoints + inference vis + extraction helper | 12.5 GB |
| [`chenming-wu/LiDAR-Perfect-Depth-Datasets`](https://huggingface.co/datasets/chenming-wu/LiDAR-Perfect-Depth-Datasets) | extracted eval sets + training-set archives + PPD/DA-V2/RAFT weights | ~991 GB |

## What's in this repo

- `code/` — full LPD codebase. New modules under `ppd/lpd/`; updated configs and adapter loaders.
- `checkpoints/`
  - `e000-s001000.ckpt` … `e004-s005000.ckpt` — per-epoch fine-tuned weights (DA2 backbone, 5K steps)
  - `last.ckpt` — same as e004
  - 2.0 GB each, weights-only
- `inference_vis/` — 8-sample qualitative panels (RGB | GT | PPD | LPD | LPD-variance)
- `extract_archives.sh` — extracts the dataset archives back into the layout the code expects

## Backbone options

PPD (and therefore LPD) supports two semantic-prompt backbones; switching between them is a single config change.

| Backbone | Trainable params | PPD ckpt | Config |
|---|---|---|---|
| **DA2** (Depth-Anything-V2 ViT-L) | 16.3 M | [`gangweix/Pixel-Perfect-Depth/ppd.pth`](https://huggingface.co/gangweix/Pixel-Perfect-Depth/resolve/main/ppd.pth) | `code/ppd/configs/lpd_run5d_10k.yaml` |
| **MoGe2** | 16.3 M | [`chenming-wu/LiDAR-Perfect-Depth-Datasets/pretrained/ppd_moge2/`](https://huggingface.co/datasets/chenming-wu/LiDAR-Perfect-Depth-Datasets/tree/main/pretrained/ppd_moge2) | `code/ppd/configs/lpd_run5d_moge2.yaml` |

The official PPD MoGe2 release reports a 20-30% improvement over DA2 on zero-shot benchmarks; the LPD prompt branch is identical for either backbone.
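
As a rough illustration of the single-config switch, the sketch below maps each backbone to the config path listed in the table and builds the matching `main.py` invocation. `CONFIGS` and `train_command` are hypothetical names introduced here, not part of the LPD codebase; only the config paths and the `--cfg_file` flag come from this README.

```python
# Hypothetical helper: pick the fine-tune config for a backbone.
# The config paths are the ones listed in the backbone table.
CONFIGS = {
    "da2": "code/ppd/configs/lpd_run5d_10k.yaml",
    "moge2": "code/ppd/configs/lpd_run5d_moge2.yaml",
}


def train_command(backbone: str) -> list[str]:
    """Return the argv list for a fine-tune run with the given backbone."""
    key = backbone.lower()
    if key not in CONFIGS:
        raise ValueError(
            f"unknown backbone {backbone!r}; expected one of {sorted(CONFIGS)}"
        )
    return ["python", "code/main.py", "--cfg_file", CONFIGS[key]]


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe to try anywhere.
    print(" ".join(train_command("moge2")))
```

Returning an argv list rather than a shell string keeps the helper usable with `subprocess.run` without quoting concerns.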

## Training run summary (DA2 backbone, 5K steps)

```
Backbone:   PPD-DA2 — 820 M, frozen
Trainable:  16 M (sparse-prompt encoder + gate)
Resolution: 1024 × 768
Batch:      18  (~133 GB peak on a single H200)
Steps:      5,000 (5 epochs × 1000 batches)
Mix:        Hypersim 0.5 / UrbanSyn 0.15 / UnrealStereo4K 0.15 / VKITTI2 0.1 / TartanAir 0.1
Init:       gangweix/Pixel-Perfect-Depth ppd.pth

epoch 0 → 4   loss: 0.0186 → 0.0177  (-4.8%)
```

The 5K-step run shown here is a partial-training demo; paper-scale training would continue well beyond it.

For the MoGe2 backbone, we verified that the full forward + backward + checkpointing pipeline works
(`lpd_run5d_moge2.yaml`); training to convergence is left for a multi-GPU run.
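
The figures in the run summary can be cross-checked with a few lines of arithmetic. The sketch below only re-derives numbers from the values printed there. It assumes that `Batch: 18` is the per-step batch size and that the `Mix` values are per-sample sampling probabilities; neither assumption is confirmed by this README.

```python
# Values copied from the run summary; nothing here is measured anew.
EPOCHS = 5
BATCHES_PER_EPOCH = 1000
BATCH_SIZE = 18  # assumption: per-step batch size
LOSS_START, LOSS_END = 0.0186, 0.0177
MIX = {  # assumption: per-sample sampling probabilities
    "Hypersim": 0.5,
    "UrbanSyn": 0.15,
    "UnrealStereo4K": 0.15,
    "VKITTI2": 0.1,
    "TartanAir": 0.1,
}

total_steps = EPOCHS * BATCHES_PER_EPOCH
total_samples = total_steps * BATCH_SIZE
loss_change_pct = 100 * (LOSS_END - LOSS_START) / LOSS_START

# Expected number of samples drawn from each dataset over the whole run.
expected_samples = {name: round(total_samples * w) for name, w in MIX.items()}

if __name__ == "__main__":
    print(f"steps: {total_steps}, samples seen: {total_samples}")
    print(f"relative loss change: {loss_change_pct:.1f}%")
    for name, count in expected_samples.items():
        print(f"{name}: ~{count} samples")
```

Under those assumptions the run covers 5,000 steps and 90,000 samples, about half of them from Hypersim, and the relative loss change matches the -4.8% reported above.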

## Verification

```bash
cd code/
pip install -r requirements.txt
python -m ppd.lpd.tests.verify_paper      # 30 paper claims, all pass
```

`PAPER_CHECKLIST.md` maps each section of the paper to specific code files/lines.

## Reproducing training

```bash
# Pretrained inputs the code expects
ln -sf <ppd.pth>                          code/checkpoints/ppd.pth                 # DA2
ln -sf <depth_anything_v2_vitl.pth>       code/checkpoints/depth_anything_v2_vitl.pth
# OR for MoGe2:
ln -sf <ppd_moge2.pth>                    code/checkpoints/ppd_moge2.pth
ln -sf <moge2.pt>                         code/checkpoints/moge2.pt

# Hypersim 512² pretrain (DA2)
bash code/train_lpd.sh

# 5-dataset 1024×768 fine-tune
python code/main.py --cfg_file code/ppd/configs/lpd_run5d_10k.yaml      # DA2
# OR
python code/main.py --cfg_file code/ppd/configs/lpd_run5d_moge2.yaml    # MoGe2

# Inference comparison (PPD vs LPD)
python code/experiments/eval_lpd_vs_ppd.py
```
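
Before launching a run, it can help to confirm that the symlinked pretrained inputs actually resolve. The following is a minimal sketch, not part of the LPD codebase: `REQUIRED` and `missing_inputs` are hypothetical names, and the grouping of files per backbone is inferred from the `ln -sf` comments above.

```python
from pathlib import Path

# File names as used by the `ln -sf` commands; grouping per backbone is
# inferred from the "# DA2" / "# OR for MoGe2" comments (an assumption).
REQUIRED = {
    "da2": ["ppd.pth", "depth_anything_v2_vitl.pth"],
    "moge2": ["ppd_moge2.pth", "moge2.pt"],
}


def missing_inputs(backbone: str, ckpt_dir: str = "code/checkpoints") -> list[str]:
    """List required files that are absent or are dangling symlinks."""
    root = Path(ckpt_dir)
    # Path.exists() follows symlinks, so a link to a missing target
    # is reported as missing too.
    return [name for name in REQUIRED[backbone] if not (root / name).exists()]


if __name__ == "__main__":
    for backbone in REQUIRED:
        gaps = missing_inputs(backbone)
        print(backbone, "ok" if not gaps else f"missing: {gaps}")
```

Run from the repository root, this prints one line per backbone, so a dangling link shows up before `main.py` tries to load it.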

## Datasets — fetching + extracting

Everything is hosted as **archives** in the dataset repo to keep file counts low and upload bandwidth high.

```bash
hf download chenming-wu/LiDAR-Perfect-Depth-Datasets --repo-type dataset \
    --local-dir /mnt/sig/datasets

bash code/extract_archives.sh /mnt/sig/datasets /mnt/sig/datasets/archives
```

Layout after extraction matches what the LPD configs reference (`/mnt/sig/datasets/train/<scene>/...`, `/mnt/sig/datasets/eval_image/...`).
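
A quick way to sanity-check the extraction is to confirm that the two top-level directories mentioned above exist and are non-empty. This sketch checks only those two names (`train` and `eval_image`); the deeper per-scene structure is not described in this README and is not verified. `check_layout` is a hypothetical helper, not part of the repo.

```python
from pathlib import Path

# Sub-directories named in the README; deeper layout is not checked.
EXPECTED_SUBDIRS = ("train", "eval_image")


def check_layout(root: str = "/mnt/sig/datasets") -> dict[str, bool]:
    """Map each expected sub-directory to whether it exists and is non-empty."""
    base = Path(root)
    status = {}
    for name in EXPECTED_SUBDIRS:
        path = base / name
        status[name] = path.is_dir() and any(path.iterdir())
    return status


if __name__ == "__main__":
    for name, ok in check_layout().items():
        print(f"{name}: {'ok' if ok else 'missing or empty'}")
```

If the archives were extracted to a different root, pass that path to `check_layout` and adjust the paths in the LPD configs accordingly.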

## Citations

- **Pixel-Perfect Depth** — Xu et al., NeurIPS 2025
- **LiDAR-Perfect Depth** — `paper.tex` in this repo