README: document MoGe2 backbone option
README.md
CHANGED
|
@@ -17,23 +17,34 @@ Implementation of *LiDAR-Perfect Depth: Score-Decomposed Diffusion with Kalman-i
|
|
| 17 |
|
| 18 |
| Repo | Contents | Size |
|
| 19 |
|---|---|---|
|
| 20 |
-
| `chenming-wu/LiDAR-Perfect-Depth` (this) | code + 6 LPD checkpoints + inference vis + extraction helper | 12.5 GB |
|
| 21 |
-
| [`chenming-wu/LiDAR-Perfect-Depth-Datasets`](https://huggingface.co/datasets/chenming-wu/LiDAR-Perfect-Depth-Datasets) | extracted eval sets + training-set archives + PPD/DA-V2/RAFT weights | ~
|
| 22 |
|
| 23 |
## What's in this repo
|
| 24 |
|
| 25 |
- `code/`: full LPD codebase. New modules under `ppd/lpd/`; updated configs and adapter loaders.
|
| 26 |
- `checkpoints/`
|
| 27 |
-
- `e000-s001000.ckpt` … `e004-s005000.ckpt`: per-epoch fine-tuned weights
|
| 28 |
- `last.ckpt`: same as e004
|
| 29 |
-
- 2.0 GB each, weights-only
|
| 30 |
- `inference_vis/`: 8-sample qualitative panels (RGB | GT | PPD | LPD | LPD-variance)
|
| 31 |
- `extract_archives.sh`: extracts the dataset archives back into the layout the code expects
|
| 32 |
|
| 33 |
-
##
|
| 34 |
|
| 35 |
```
|
| 36 |
-
Backbone: PPD
|
| 37 |
Trainable: 16 M (sparse-prompt encoder + gate)
|
| 38 |
Resolution: 1024 × 768
|
| 39 |
Batch: 18 (~133 GB peak on a single H200)
|
|
@@ -46,6 +57,9 @@ epoch 0 → 4 loss: 0.0186 → 0.0177 (-4.8%)
|
|
| 46 |
|
| 47 |
The 5K steps shown here are a partial-training demo; paper-scale training would be expected to lower the loss further.
|
| 48 |
|
| 49 |
## Verification
|
| 50 |
|
| 51 |
```bash
|
|
@@ -60,14 +74,19 @@ python -m ppd.lpd.tests.verify_paper # 30 paper claims, all pass
|
|
| 60 |
|
| 61 |
```bash
|
| 62 |
# Pretrained inputs the code expects
|
| 63 |
-
ln -sf <ppd.pth> code/checkpoints/ppd.pth
|
| 64 |
ln -sf <depth_anything_v2_vitl.pth> code/checkpoints/depth_anything_v2_vitl.pth
|
| 65 |
|
| 66 |
-
# Hypersim 512² pretrain
|
| 67 |
bash code/train_lpd.sh
|
| 68 |
|
| 69 |
-
# 5-dataset 1024×768 fine-tune
|
| 70 |
-
python code/main.py --cfg_file code/ppd/configs/lpd_run5d_10k.yaml
|
| 71 |
|
| 72 |
# Inference comparison (PPD vs LPD)
|
| 73 |
python code/experiments/eval_lpd_vs_ppd.py
|
|
@@ -78,11 +97,9 @@ python code/experiments/eval_lpd_vs_ppd.py
|
|
| 78 |
Everything is hosted as **archives** in the dataset repo to keep file counts low and upload bandwidth high.
|
| 79 |
|
| 80 |
```bash
|
| 81 |
-
# Fetch the data repo (eval sets stay un-archived; only train/* is archive form)
|
| 82 |
hf download chenming-wu/LiDAR-Perfect-Depth-Datasets --repo-type dataset \
|
| 83 |
--local-dir /mnt/sig/datasets
|
| 84 |
|
| 85 |
-
# Run the extractor (un-tars/un-zips everything under datasets/train/)
|
| 86 |
bash code/extract_archives.sh /mnt/sig/datasets /mnt/sig/datasets/archives
|
| 87 |
```
|
| 88 |
|
|
|
|
| 17 |
|
| 18 |
| Repo | Contents | Size |
|
| 19 |
|---|---|---|
|
| 20 |
+
| `chenming-wu/LiDAR-Perfect-Depth` (this) | code + 6 LPD-DA2 checkpoints + inference vis + extraction helper | 12.5 GB |
|
| 21 |
+
| [`chenming-wu/LiDAR-Perfect-Depth-Datasets`](https://huggingface.co/datasets/chenming-wu/LiDAR-Perfect-Depth-Datasets) | extracted eval sets + training-set archives + PPD/DA-V2/RAFT weights | ~991 GB |
|
| 22 |
|
| 23 |
## What's in this repo
|
| 24 |
|
| 25 |
- `code/`: full LPD codebase. New modules under `ppd/lpd/`; updated configs and adapter loaders.
|
| 26 |
- `checkpoints/`
|
| 27 |
+
- `e000-s001000.ckpt` … `e004-s005000.ckpt`: per-epoch fine-tuned weights (DA2 backbone, 5K steps)
|
| 28 |
- `last.ckpt`: same as e004
|
| 29 |
+
- 2.0 GB each, weights-only
|
| 30 |
- `inference_vis/`: 8-sample qualitative panels (RGB | GT | PPD | LPD | LPD-variance)
|
| 31 |
- `extract_archives.sh`: extracts the dataset archives back into the layout the code expects
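
The per-epoch checkpoint names listed above appear to encode an epoch index and a step count (`e004-s005000.ckpt` would be epoch 4, step 5000). A minimal sketch of reading that convention, assuming the `e<epoch>-s<step>.ckpt` pattern holds for every per-epoch file; `parse_checkpoint_name` and `latest_checkpoint` are hypothetical helpers, not functions from the LPD codebase:

```python
import re

# Assumed naming pattern, inferred from the two file names shown in this README.
CKPT_PATTERN = re.compile(r"^e(\d+)-s(\d+)\.ckpt$")


def parse_checkpoint_name(name):
    """Return (epoch, step) for names like 'e004-s005000.ckpt', else None."""
    match = CKPT_PATTERN.match(name)
    if match is None:
        return None  # e.g. 'last.ckpt' carries no epoch/step information
    return int(match.group(1)), int(match.group(2))


def latest_checkpoint(names):
    """Pick the numbered checkpoint with the highest (epoch, step) pair."""
    numbered = []
    for name in names:
        key = parse_checkpoint_name(name)
        if key is not None:
            numbered.append((key, name))
    return max(numbered)[1] if numbered else None
```

Since `last.ckpt` is described as identical to e004, a loader could fall back to `latest_checkpoint(...)` when `last.ckpt` is absent.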
|
| 32 |
|
| 33 |
+
## Backbone options
|
| 34 |
+
|
| 35 |
+
PPD (and therefore LPD) supports two semantic-prompt backbones; once the matching pretrained weights are in place, switching between them is a single config change.
|
| 36 |
+
|
| 37 |
+
| Backbone | Trainable params | PPD ckpt | Config |
|
| 38 |
+
|---|---|---|---|
|
| 39 |
+
| **DA2** (Depth-Anything-V2 ViT-L) | 16.3 M | [`gangweix/Pixel-Perfect-Depth/ppd.pth`](https://huggingface.co/gangweix/Pixel-Perfect-Depth/resolve/main/ppd.pth) | `code/ppd/configs/lpd_run5d_10k.yaml` |
|
| 40 |
+
| **MoGe2** | 16.3 M | [Google Drive](https://drive.google.com/file/d/1tabmcsbRVDKDfmO4KU1vOjurzN-wp0HV/view?usp=sharing) | `code/ppd/configs/lpd_run5d_moge2.yaml` |
|
| 41 |
+
|
| 42 |
+
The official PPD MoGe2 release reports a 20-30% improvement over DA2 on zero-shot benchmarks; the LPD prompt branch is identical for either backbone.
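
To make the "single config change" concrete, here is a small sketch that maps each backbone to its config file and to the pretrained files this README symlinks under `code/checkpoints/`. The paths are copied from this README; the `BACKBONES` table and both function names are hypothetical helpers for illustration, not part of the LPD codebase:

```python
from pathlib import Path

# Paths come from this README; the dict layout itself is an assumption.
BACKBONES = {
    "da2": {
        "config": "code/ppd/configs/lpd_run5d_10k.yaml",
        "weights": [
            "code/checkpoints/ppd.pth",
            "code/checkpoints/depth_anything_v2_vitl.pth",
        ],
    },
    "moge2": {
        "config": "code/ppd/configs/lpd_run5d_moge2.yaml",
        "weights": [
            "code/checkpoints/ppd_moge2.pth",
            "code/checkpoints/moge2.pt",
        ],
    },
}


def missing_weights(backbone, root="."):
    """List expected weight files (or symlink targets) absent under `root`."""
    spec = BACKBONES[backbone]
    return [w for w in spec["weights"] if not (Path(root) / w).exists()]


def finetune_command(backbone):
    """Build the fine-tune command line for the chosen backbone."""
    return ["python", "code/main.py", "--cfg_file", BACKBONES[backbone]["config"]]
```

Calling `missing_weights("moge2")` from the repo root before launching `finetune_command("moge2")` would surface a broken or absent symlink early, since `Path.exists()` follows symlinks.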
|
| 43 |
+
|
| 44 |
+
## Training run summary (DA2 backbone, 5K steps)
|
| 45 |
|
| 46 |
```
|
| 47 |
+
Backbone: PPD-DA2, 820 M, frozen
|
| 48 |
Trainable: 16 M (sparse-prompt encoder + gate)
|
| 49 |
Resolution: 1024 × 768
|
| 50 |
Batch: 18 (~133 GB peak on a single H200)
|
|
|
|
| 57 |
|
| 58 |
The 5K steps shown here are a partial-training demo; paper-scale training would be expected to lower the loss further.
|
| 59 |
|
| 60 |
+
For the MoGe2 backbone, we verified that the full forward + backward + checkpointing pipeline works
|
| 61 |
+
(`lpd_run5d_moge2.yaml`); training to convergence is left for a multi-GPU run.
|
| 62 |
+
|
| 63 |
## Verification
|
| 64 |
|
| 65 |
```bash
|
|
|
|
| 74 |
|
| 75 |
```bash
|
| 76 |
# Pretrained inputs the code expects
|
| 77 |
+
ln -sf <ppd.pth> code/checkpoints/ppd.pth # DA2
|
| 78 |
ln -sf <depth_anything_v2_vitl.pth> code/checkpoints/depth_anything_v2_vitl.pth
|
| 79 |
+
# OR for MoGe2:
|
| 80 |
+
ln -sf <ppd_moge2.pth> code/checkpoints/ppd_moge2.pth
|
| 81 |
+
ln -sf <moge2.pt> code/checkpoints/moge2.pt
|
| 82 |
|
| 83 |
+
# Hypersim 512² pretrain (DA2)
|
| 84 |
bash code/train_lpd.sh
|
| 85 |
|
| 86 |
+
# 5-dataset 1024×768 fine-tune
|
| 87 |
+
python code/main.py --cfg_file code/ppd/configs/lpd_run5d_10k.yaml # DA2
|
| 88 |
+
# OR
|
| 89 |
+
python code/main.py --cfg_file code/ppd/configs/lpd_run5d_moge2.yaml # MoGe2
|
| 90 |
|
| 91 |
# Inference comparison (PPD vs LPD)
|
| 92 |
python code/experiments/eval_lpd_vs_ppd.py
|
|
|
|
| 97 |
Everything is hosted as **archives** in the dataset repo to keep file counts low and upload bandwidth high.
|
| 98 |
|
| 99 |
```bash
|
|
|
|
| 100 |
hf download chenming-wu/LiDAR-Perfect-Depth-Datasets --repo-type dataset \
|
| 101 |
--local-dir /mnt/sig/datasets
|
| 102 |
|
|
|
|
| 103 |
bash code/extract_archives.sh /mnt/sig/datasets /mnt/sig/datasets/archives
|
| 104 |
```
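
For readers who want to see the idea behind the extraction step without reading the shell script, the following is a rough Python sketch of unpacking every archive found under a directory. It is not a port of `extract_archives.sh`, whose exact behavior and target layout are not described here; `extract_all` and its arguments are hypothetical, and only standard-library `shutil` calls are used:

```python
import shutil
from pathlib import Path


def known_archive_suffixes():
    """Collect the file extensions that shutil can unpack (.zip, .tar.gz, ...)."""
    suffixes = []
    for _name, extensions, _description in shutil.get_unpack_formats():
        suffixes.extend(extensions)
    return tuple(suffixes)


def extract_all(archive_dir, dest_dir):
    """Unpack every recognized archive under `archive_dir` into `dest_dir`.

    Returns the names of the archives that were unpacked, in sorted order.
    """
    suffixes = known_archive_suffixes()
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    # sorted() materializes the listing before any files are written.
    for path in sorted(Path(archive_dir).rglob("*")):
        if path.is_file() and path.name.endswith(suffixes):
            shutil.unpack_archive(str(path), str(dest))
            extracted.append(path.name)
    return extracted
```

Files with unrecognized extensions are skipped rather than treated as errors, which keeps the sketch safe to run on a directory that mixes archives with other files.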
|
| 105 |
|