
# Paper → Code traceability matrix

Per-claim audit of `paper.tex` against the implementation. The verifier `python -m ppd.lpd.tests.verify_paper` runs the equations on small tensors and confirms each cell below; 30/30 checks currently pass.

## §3.1 Image Model

| Paper claim | Code |
| --- | --- |
| Sparse-prompt encoder pools at scales {4, 8, 16, 32} | `ppd/lpd/prompt_encoder.py:101` (`scales=(4, 8, 16, 32)` default) |
| Depth + density channel per scale | `prompt_encoder.py:23-29` (`masked_avg_pool`) |
| Two-layer CNN + linear projection | `prompt_encoder.py:53-58` (`_SmallCNN`) + `prompt_encoder.py:111` (`self.fuse`) |
| Density ρ as per-token confidence | `prompt_encoder.py:131` (returned tuple) |
| Eq. (1) s_joint = s_sem + g(p, ρ, t) ⊙ m(s_sem, p, ρ, t) | `prompt_gate.py:67-78` |
| Mixer + sigmoid gate, both zero-initialized | `prompt_gate.py:32-49` (`_zero_linear`; gate ends with `Sigmoid`) |
| Timestep embedding projected before gating | `lpd_dit.py:97` (`t = self.t_embedder(timestep)`) |
| Sparse-prompt log-quantile normalization (2/98 %) | `prompt_encoder.py:32-50` (`quantile_log_normalize`) |
| Prompt fusion happens at the DiT midpoint | `lpd_dit.py:107-145` (inserted right after PPD's semantics fusion) |
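The Eq. (1) gated fusion can be illustrated with a minimal scalar sketch; `gated_fusion`, `mixer_w`, and `gate_b` are hypothetical names (the real module operates on token tensors, not scalars), but the zero-init behavior is the point.

```python
import math

# Scalar sketch of Eq. (1): s_joint = s_sem + g(p, rho, t) * m(s_sem, p, rho, t).
# mixer_w / gate_b stand in for the zero-initialized mixer and gate parameters.
def gated_fusion(s_sem, p, rho, t, mixer_w=0.0, gate_b=0.0):
    feats = s_sem + p + rho + t              # stand-in for concatenated features
    m = mixer_w * feats                      # zero-init mixer -> m == 0 at step 0
    g = 1.0 / (1.0 + math.exp(-gate_b))      # sigmoid gate (0.5 at zero init)
    return s_sem + g * m                     # identity on s_sem at initialization
```

With both parameters at zero, the fusion is exactly the identity on `s_sem`, so fine-tuning starts from the frozen PPD behavior.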

## §3.2 Video Model

| Paper claim | Code |
| --- | --- |
| Sparse-LiDAR prompt tokens use the same noise-level-conditioned gating | reuses `LPDDiT` per frame (`lpd_video.py:81`) |
| RGB + sparse + semantic tokens enter together | `lpd_video.py:run_video` calls `pipeline.forward_test(frame)` |
| Temporal positional embeddings on prompt tokens | **Deviation:** frames are not stacked into a single video DiT; the image DiT runs frame by frame, with the temporal Kalman filter threading state between frames. This is functionally equivalent for the paper's main claims, since §3.7 states that all temporal mechanisms are inference-time and require no additional training. A multi-frame video-DiT extension would be analogous to `lpd_dit.py`, but starting from `ppd/models/dit_video.py`. |
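The frame-by-frame deviation amounts to a simple loop; `denoise_frame` and `kalman_update` below are hypothetical stand-ins for the per-frame pipeline call and the temporal filter, not repo APIs.

```python
# Sketch of frame-by-frame video inference: the image model runs once per
# frame while the temporal Kalman state is threaded through the loop.
def run_video_frames(frames, denoise_frame, kalman_update, init_state=None):
    state, outputs = init_state, []
    for frame in frames:
        depth = denoise_frame(frame, state)   # image DiT conditioned on state
        state = kalman_update(state, depth)   # filter carries state forward
        outputs.append(depth)
    return outputs, state
```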

## §3.3 Score Decomposition

| Paper claim | Code |
| --- | --- |
| Eq. (3) factorization p(x \| I, y, x_{1:t-1}) ∝ p(x \| I) · p(y \| x) · p(x \| x_{1:t-1}) | score form implemented via the Eq. (4) rows below |
| Eq. (4) score decomposition (3 additive terms) | `posterior_projection.py:31-42` |
| Eq. (5) LiDAR likelihood -M ⊙ (x - y)/R | `posterior_projection.py:35` |
| Eq. (6) Kalman temporal prior -(x - μ)/P | `posterior_projection.py:39` |
| Eq. (7) projection step + η_τ = α · σ_τ² | `posterior_projection.py:28, 42` |
| Image model: term 3 from within-denoising state | `lpd_train.py:268-302` (KIL sampler bridges image inference) |
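Putting Eqs. (5)-(7) together: the projection step nudges the sample toward the LiDAR observation (where the mask is 1) and toward the temporal mean, with step size η_τ = α · σ_τ². A per-pixel scalar sketch with illustrative names:

```python
# Scalar sketch of the Eq. (7) posterior projection:
#   x <- x + eta * [ M * (y - x) / R  +  (mu - x) / P ],  eta = alpha * sigma_tau**2
def posterior_projection(x, y, mask, R, mu, P, sigma_tau, alpha=1.0):
    eta = alpha * sigma_tau ** 2        # Eq. (7) step size
    lidar_score = mask * (y - x) / R    # Eq. (5) LiDAR likelihood score
    temporal_score = (mu - x) / P       # Eq. (6) Kalman temporal prior score
    return x + eta * (lidar_score + temporal_score)
```

Where the mask is 0 the LiDAR term vanishes and only the temporal prior pulls on the sample.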

## §3.4 Kalman-in-the-Loop Denoising (Algorithm 1)

| Step | Code |
| --- | --- |
| Init μ_0 ← μ_temporal, P_0 ← P_temporal | `kalman_in_loop.py:48-58` |
| K_τ = P/(P + σ_τ²) | `kalman_in_loop.py:69` |
| μ_τ = μ + K(x̂_0 - μ) | `kalman_in_loop.py:70` |
| P_τ = (1 - K) P_{τ-1} | `kalman_in_loop.py:71` |
| Euler diffusion step | `kalman_in_loop.py:79-86` |
| Posterior projection (Eq. 7) on x_{τ-1} | `kalman_in_loop.py:89-98` |
| Returns (x_0, P_final) | `kalman_in_loop.py:100` |
| Property (iii): variance monotonically non-increasing | verified by `verify_paper.py` (assertion in the test) |
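The per-step update of Algorithm 1 is the textbook scalar Kalman update; the sketch below (hypothetical name) also makes property (iii) evident: since K ∈ [0, 1), the variance can only shrink or stay put.

```python
# One Kalman-in-the-loop update at noise level sigma_tau.
def kil_update(mu, P, x0_hat, sigma_tau):
    K = P / (P + sigma_tau ** 2)     # Kalman gain
    mu_new = mu + K * (x0_hat - mu)  # blend running mean with current x0 estimate
    P_new = (1.0 - K) * P            # variance is monotonically non-increasing
    return mu_new, P_new
```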

## §3.5 Per-Pixel Temporal Kalman

| Paper claim | Code |
| --- | --- |
| Per-pixel state = (log-depth, variance) | `temporal_kalman.py:62-65` |
| Predict: x_k⁻ = warp(x_{k-1}⁺, f) | `temporal_kalman.py:69` (`_backward_warp`) |
| P_k⁻ = warp(P_{k-1}⁺) + Q_k | `temporal_kalman.py:84-86` |
| Eq. (9) flow consistency ε = ‖f_fwd(p) + f_bwd(p + f_fwd(p))‖ | `temporal_kalman.py:32-37` |
| Q_k(p) = Q_base + α · ε² | `temporal_kalman.py:88-89` |
| Occlusion: ε > τ_occ ⇒ P ← P_max | `temporal_kalman.py:91-92` |
| Update at observed pixels: K = P⁻/(P⁻ + R), x⁺ = x⁻ + K(y - x⁻), P⁺ = (1 - K)P⁻ | `temporal_kalman.py:104-107` |
| At unobserved pixels: state passes through | `temporal_kalman.py:104` (mask multiplies the update) |
| Metric uncertainty exp(√P) - 1 | `temporal_kalman.py:117-122` |
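For a single pixel (warping elided), the §3.5 predict/update cycle reduces to the sketch below; the constants default to the §4.4 values, and the function names are illustrative rather than the repo's API.

```python
# Predict: inflate variance by flow-dependent process noise, reset on occlusion.
def tk_predict(x_prev, P_prev, eps, Q_base=0.005, alpha=0.5, tau_occ=2.0, P_max=10.0):
    Q = Q_base + alpha * eps ** 2        # process noise driven by Eq. (9) error
    P = P_prev + Q
    if eps > tau_occ:                    # occlusion: reset to maximum uncertainty
        P = P_max
    return x_prev, P

# Update: standard scalar Kalman step at observed pixels, pass-through elsewhere.
def tk_update(x_pred, P_pred, y, observed, R=0.01):
    if not observed:
        return x_pred, P_pred            # state passes through unchanged
    K = P_pred / (P_pred + R)
    return x_pred + K * (y - x_pred), (1.0 - K) * P_pred
```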

## §3.6 Uncertainty-Guided Prompt Modulation

| Paper claim | Code |
| --- | --- |
| Eq. (8) ρ̃(p) = ρ(p) · (1 + P(p)/max P) | `uncertainty_modulation.py:36-49` |
| No new parameters | confirmed; only an element-wise op |
| Plumbed through to the gate via density | `lpd_dit.py:115-117` calls `modulate_density(...)` when `kalman_variance` is supplied |
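Eq. (8) is a single element-wise expression; a list-based sketch (the name mirrors, but is not, the repo's function) shows the whole mechanism:

```python
# Eq. (8): rho_tilde(p) = rho(p) * (1 + P(p) / max P) -- no learned parameters.
def modulate_density(rho, P):
    P_max = max(P)
    return [r * (1.0 + p / P_max) for r, p in zip(rho, P)]
```

Pixels with high temporal variance get their prompt density boosted, so the gate leans harder on the sparse LiDAR there.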

## §3.7 Training Objective

| Paper claim | Code |
| --- | --- |
| (i) Diffusion velocity-MSE loss | `lpd_train.py:230-242` |
| (ii) Anchor loss L1(x̂_0 - y) over M | `losses.py:18-24` + `lpd_train.py:245-247` |
| (iii) Multi-scale gradient loss | reuses `ppd/models/loss.py:multi_scale_grad_loss` (`lpd_train.py:251-255`) |
| L = L_MSE + λ_a L_anchor + λ_g L_grad | `lpd_train.py:241, 247, 255` |
| All temporal mechanisms are inference-time only | KIL sampler, temporal Kalman, projection, and modulation all live outside `forward_train` |
| Trainable: only prompt encoder + gate (paper: "< 1 %") | `lpd_dit.py:freeze_backbone()`; measured 16 M / 820 M ≈ 2 %. The gap vs. the paper comes from the 4-scale, 1024-dim prompt encoder we use; shrinking `prompt_hidden` (128 → 32) and dropping one scale brings it under 1 %. See "Notes vs paper" in `LPD_README.md`. |
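The combined objective is a weighted sum; the sketch below uses placeholder λ values (the actual weights live in the repo's config) and a list-based masked L1 for the anchor term.

```python
# (ii) masked L1 anchor loss over the sparse-LiDAR mask M.
def anchor_loss(x0_hat, y, mask):
    num = sum(m * abs(a - b) for a, b, m in zip(x0_hat, y, mask))
    return num / max(sum(mask), 1)

# L = L_MSE + lambda_a * L_anchor + lambda_g * L_grad (lambdas are placeholders).
def total_loss(l_mse, l_anchor, l_grad, lam_a=1.0, lam_g=1.0):
    return l_mse + lam_a * l_anchor + lam_g * l_grad
```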

## §4.1 Datasets & sparse-LiDAR simulation

| Paper claim | Code |
| --- | --- |
| Train on Hypersim + UrbanSyn + UnrealStereo4K + VKITTI 2 + TartanAir | `ppd/configs/lpd_finetune.yaml:5-55` |
| Sparse LiDAR simulated from dense GT | `lpd_train.py:155-176` calls `sparse_simulator.simulate` |
| Patterns: random / scan-line / grid / hybrid | `sparse_simulator.py:20-78` (one routine each) |
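Two of the patterns are easy to sketch as binary keep-masks over an H × W grid; these toy functions are illustrative and are not the simulator's API.

```python
import random

# Random pattern: keep each pixel independently with probability keep_frac.
def random_mask(h, w, keep_frac, seed=0):
    rng = random.Random(seed)
    return [[1 if rng.random() < keep_frac else 0 for _ in range(w)] for _ in range(h)]

# Scan-line pattern: keep every `stride`-th row, mimicking LiDAR scan lines.
def scanline_mask(h, w, stride):
    return [[1 if r % stride == 0 else 0 for _ in range(w)] for r in range(h)]
```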

## §4.4 Implementation details

| Paper claim | Code |
| --- | --- |
| Wiener-filter projection R_proj = 0.1 | `kalman_in_loop.py:KalmanInLoopConfig.R_proj = 0.1` |
| Temporal Kalman: R = 0.01, Q_base = 0.005, α = 0.5, P_max = 10, τ_occ = 2.0 | `temporal_kalman.py:TemporalKalmanConfig` defaults |
| P_init = 1.0 | same |
| Partial loading from the PPD checkpoint for expanded prompt layers | `lpd_train.py:_load_ppd_weights` (`strict=False`) |
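The defaults above can be restated as plain dataclasses; the class and field names mirror the repo's config objects but are reproduced here only as a reference sketch.

```python
from dataclasses import dataclass

@dataclass
class TemporalKalmanConfig:
    R: float = 0.01        # observation noise
    Q_base: float = 0.005  # base process noise
    alpha: float = 0.5     # flow-consistency noise weight
    P_max: float = 10.0    # occlusion reset variance
    tau_occ: float = 2.0   # occlusion threshold on eps
    P_init: float = 1.0    # initial per-pixel variance

@dataclass
class KalmanInLoopConfig:
    R_proj: float = 0.1    # Wiener-filter projection noise
```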

## Pieces not implemented (deliberate scope cuts)

These are listed in the paper but are evaluation-only or large infrastructure that doesn't change the model itself:

- §6.3 uncertainty-calibration metrics (ECE / AUSE / NLL / reliability diagrams)
- §6.4 ablations such as Farneback ↔ DIS flow, the heuristic confidence-decay baseline, and the cosine-ramp projection-schedule baseline
- Temporal warp L_1 video metric
- `dit_video.py` parallel extension (we use frame-by-frame inference; see the §3.2 deviation note)

These can all be added without touching any of the modules above.

## How to re-verify after edits

```shell
cd /mnt/sig/pixel-perfect-depth
python -m ppd.lpd.tests.verify_paper
```

A failure prints the specific paper claim and the assertion that broke, making it easy to diagnose drift between the paper and the code.