chenming-wu/LiDAR-Perfect-Depth

@@ -17,23 +17,34 @@ Implementation of *LiDAR-Perfect Depth: Score-Decomposed Diffusion with Kalman-i
 | Repo | Contents | Size |
 |---|---|---|
-| `chenming-wu/LiDAR-Perfect-Depth` (this) | code + 6 LPD checkpoints + inference vis + extraction helper | 12.5 GB |
-| [`chenming-wu/LiDAR-Perfect-Depth-Datasets`](https://huggingface.co/datasets/chenming-wu/LiDAR-Perfect-Depth-Datasets) | extracted eval sets + training-set archives + PPD/DA-V2/RAFT weights | ~993 GB |
 ## What's in this repo
 - `code/` — full LPD codebase. New modules under `ppd/lpd/`; updated configs and adapter loaders.
 - `checkpoints/`
-  - `e000-s001000.ckpt` … `e004-s005000.ckpt` — per-epoch fine-tuned weights
   - `last.ckpt` — same as e004
-  - 2.0 GB each, weights-only; load with `pipeline.dit.load_state_dict(strip_prefix("pipeline.", ckpt["state_dict"]), strict=False)`
 - `inference_vis/` — 8-sample qualitative panels (RGB | GT | PPD | LPD | LPD-variance)
 - `extract_archives.sh` — extracts the dataset archives back into the layout the code expects
-## Training run summary
 ```
-Backbone:   PPD (DA-V2 ViT-L semantics) — 820 M, frozen
 Trainable:  16 M (sparse-prompt encoder + gate)
 Resolution: 1024 × 768
 Batch:      18  (~133 GB peak on a single H200)
@@ -46,6 +57,9 @@ epoch 0 → 4   loss: 0.0186 → 0.0177  (-4.8%)
 The 5 K steps shown here are a partial-training demo; paper-scale training would reduce the loss much further.
 ## Verification
 ```bash
@@ -60,14 +74,19 @@ python -m ppd.lpd.tests.verify_paper      # 30 paper claims, all pass
 ```bash
 # Pretrained inputs the code expects
-ln -sf <ppd.pth>                          code/checkpoints/ppd.pth
 ln -sf <depth_anything_v2_vitl.pth>       code/checkpoints/depth_anything_v2_vitl.pth
-# Hypersim 512² pretrain
 bash code/train_lpd.sh
-# 5-dataset 1024×768 fine-tune (this run's recipe)
-python code/main.py --cfg_file code/ppd/configs/lpd_run5d_10k.yaml
 # Inference comparison (PPD vs LPD)
 python code/experiments/eval_lpd_vs_ppd.py
@@ -78,11 +97,9 @@ python code/experiments/eval_lpd_vs_ppd.py
 Everything is hosted as **archives** in the dataset repo to keep file counts low and upload bandwidth high.
 ```bash
-# Fetch the data repo (eval sets stay un-archived; only train/* is archive form)
 hf download chenming-wu/LiDAR-Perfect-Depth-Datasets --repo-type dataset \
     --local-dir /mnt/sig/datasets
-# Run the extractor (un-tars/un-zips everything under datasets/train/)
 bash code/extract_archives.sh /mnt/sig/datasets /mnt/sig/datasets/archives
 ```
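The removed checkpoint bullet loads the weights through a `strip_prefix` helper that this diff does not show. Below is a minimal sketch of what such a helper might look like, assuming it keeps only the keys that start with the prefix and removes the prefix from them. The real helper in `code/` may behave differently (for example, it may pass non-matching keys through), and the key names in the demo dictionary are made up for illustration.

```python
from collections import OrderedDict
from typing import Any, Mapping


def strip_prefix(prefix: str, state_dict: Mapping[str, Any]) -> "OrderedDict[str, Any]":
    """Hypothetical helper: keep keys starting with `prefix`, with the prefix removed."""
    stripped = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            stripped[key[len(prefix):]] = value
    return stripped


# Stand-in for ckpt["state_dict"]; real values would be tensors, and these
# key names are invented for the example.
demo_state = {
    "pipeline.dit.blocks.0.weight": 1,
    "pipeline.dit.blocks.0.bias": 2,
    "optimizer.step": 3,
}
demo_stripped = strip_prefix("pipeline.", demo_state)
print(list(demo_stripped))
```

With PyTorch, the result would be passed to `load_state_dict(..., strict=False)` as in the original one-liner; `strict=False` tolerates keys that are missing from or unexpected by the target module.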

 | Repo | Contents | Size |
 |---|---|---|
+| `chenming-wu/LiDAR-Perfect-Depth` (this) | code + 6 LPD-DA2 checkpoints + inference vis + extraction helper | 12.5 GB |
+| [`chenming-wu/LiDAR-Perfect-Depth-Datasets`](https://huggingface.co/datasets/chenming-wu/LiDAR-Perfect-Depth-Datasets) | extracted eval sets + training-set archives + PPD/DA-V2/RAFT weights | ~991 GB |
 ## What's in this repo
 - `code/` — full LPD codebase. New modules under `ppd/lpd/`; updated configs and adapter loaders.
 - `checkpoints/`
+  - `e000-s001000.ckpt` … `e004-s005000.ckpt` — per-epoch fine-tuned weights (DA2 backbone, 5K steps)
   - `last.ckpt` — same as e004
+  - 2.0 GB each, weights-only
 - `inference_vis/` — 8-sample qualitative panels (RGB | GT | PPD | LPD | LPD-variance)
 - `extract_archives.sh` — extracts the dataset archives back into the layout the code expects
+## Backbone options
+PPD (and therefore LPD) supports two semantic-prompt backbones; switching between them is a single config change.
+| Backbone | Trainable params | PPD ckpt | Config |
+|---|---|---|---|
+| **DA2** (Depth-Anything-V2 ViT-L) | 16.3 M | [`gangweix/Pixel-Perfect-Depth/ppd.pth`](https://huggingface.co/gangweix/Pixel-Perfect-Depth/resolve/main/ppd.pth) | `code/ppd/configs/lpd_run5d_10k.yaml` |
+| **MoGe2** | 16.3 M | [Google Drive](https://drive.google.com/file/d/1tabmcsbRVDKDfmO4KU1vOjurzN-wp0HV/view?usp=sharing) | `code/ppd/configs/lpd_run5d_moge2.yaml` |
+The official PPD MoGe2 release reports a 20-30% improvement over DA2 on zero-shot benchmarks; the LPD prompt branch is identical for either backbone.
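Because the two backbones differ only in the config file passed to `code/main.py`, the choice can be wrapped in a small launcher. This is a sketch under that assumption: the config paths are copied from the backbone table, while the `BACKBONE_CONFIGS` mapping and the `build_train_command` function are illustrative names that are not part of the repo.

```python
import sys
from typing import Dict, List

# Config paths copied from the backbone table; the mapping itself is illustrative.
BACKBONE_CONFIGS: Dict[str, str] = {
    "da2": "code/ppd/configs/lpd_run5d_10k.yaml",
    "moge2": "code/ppd/configs/lpd_run5d_moge2.yaml",
}


def build_train_command(backbone: str) -> List[str]:
    """Return the argv for the 5-dataset fine-tune with the chosen backbone."""
    key = backbone.lower()
    if key not in BACKBONE_CONFIGS:
        raise ValueError(
            f"unknown backbone {backbone!r}; expected one of {sorted(BACKBONE_CONFIGS)}"
        )
    return [sys.executable, "code/main.py", "--cfg_file", BACKBONE_CONFIGS[key]]


# Show the command without the interpreter path.
print(" ".join(build_train_command("moge2")[1:]))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from the repository root.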
+## Training run summary (DA2 backbone, 5K steps)
 ```
+Backbone:   PPD-DA2 — 820 M, frozen
 Trainable:  16 M (sparse-prompt encoder + gate)
 Resolution: 1024 × 768
 Batch:      18  (~133 GB peak on a single H200)
 The 5 K steps shown here are a partial-training demo; paper-scale training would reduce the loss much further.
+For the MoGe2 backbone, we verified that the full forward, backward, and checkpointing pipeline runs
+(`lpd_run5d_moge2.yaml`); training to convergence is left for a multi-GPU run.
 ## Verification
 ```bash
 ```bash
 # Pretrained inputs the code expects
+ln -sf <ppd.pth>                          code/checkpoints/ppd.pth                 # DA2
 ln -sf <depth_anything_v2_vitl.pth>       code/checkpoints/depth_anything_v2_vitl.pth
+# OR for MoGe2:
+ln -sf <ppd_moge2.pth>                    code/checkpoints/ppd_moge2.pth
+ln -sf <moge2.pt>                         code/checkpoints/moge2.pt
+# Hypersim 512² pretrain (DA2)
 bash code/train_lpd.sh
+# 5-dataset 1024×768 fine-tune
+python code/main.py --cfg_file code/ppd/configs/lpd_run5d_10k.yaml      # DA2
+# OR
+python code/main.py --cfg_file code/ppd/configs/lpd_run5d_moge2.yaml    # MoGe2
 # Inference comparison (PPD vs LPD)
 python code/experiments/eval_lpd_vs_ppd.py
 Everything is hosted as **archives** in the dataset repo to keep file counts low and upload bandwidth high.
 ```bash
 hf download chenming-wu/LiDAR-Perfect-Depth-Datasets --repo-type dataset \
     --local-dir /mnt/sig/datasets
 bash code/extract_archives.sh /mnt/sig/datasets /mnt/sig/datasets/archives
 ```
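The symlink step expects different pretrained files depending on the backbone, so a quick pre-flight check can catch a missing or dangling link before training starts. The sketch below takes the file names from the `ln -sf` lines. Grouping them per backbone is an assumption (the MoGe2 path may need additional files), and `REQUIRED_FILES` and `missing_checkpoints` are hypothetical names.

```python
from pathlib import Path
from typing import Dict, List

# File names taken from the `ln -sf` lines; the per-backbone grouping is an assumption.
REQUIRED_FILES: Dict[str, List[str]] = {
    "da2": ["ppd.pth", "depth_anything_v2_vitl.pth"],
    "moge2": ["ppd_moge2.pth", "moge2.pt"],
}


def missing_checkpoints(backbone: str, ckpt_dir: str = "code/checkpoints") -> List[str]:
    """Return the expected files for `backbone` that are absent from `ckpt_dir`.

    Path.exists() follows symlinks, so a dangling link counts as missing.
    """
    root = Path(ckpt_dir)
    return [name for name in REQUIRED_FILES[backbone] if not (root / name).exists()]
```

Calling `missing_checkpoints("da2")` from the repository root returns an empty list once both DA2 links resolve.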