---
title: Depth Estimation Compare Demo
emoji: 👀
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 6.0.0
app_file: app.py
pinned: false
---

# Depth Estimation Comparison Demo

A Gradio interface for comparing **Depth Anything v1**, **Depth Anything v2**, **Depth Anything v3 (AnySize)**, and **Pixel-Perfect Depth (PPD)** on the same image. Switch between side-by-side layouts, a slider overlay, single-model inspection, or a dedicated v3 tab to understand how different pipelines perceive scene geometry. Two entrypoints are provided:

- `app_local.py` – full-featured local runner with minimal memory constraints.
- `app.py` – ZeroGPU-aware build tuned for HuggingFace Spaces with aggressive cache management.

## 🚀 Highlights
- **Four interactive experiences**: draggable slider, labeled side-by-side comparison, original-vs-depth slider, and a Depth Anything v3 tab with RGB vs depth visualization + metadata.
- **Multi-family depth models**: run ViT variants from Depth Anything v1/v2/v3 alongside Pixel-Perfect Depth with MoGe metric alignment.
- **ZeroGPU aware**: `app.py` performs on-demand loading, cache clearing, and CUDA cleanup to stay within HuggingFace Spaces limits, while `app_local.py` keeps models warm for faster iteration.
- **Curated examples**: reusable demo images sourced from each model family (`assets/examples`, `Depth-Anything*/assets/examples`, `Depth-Anything-3-anysize/assets/examples`, `Pixel-Perfect-Depth/assets/examples`).

## 🔍 Supported Pipelines
- **Depth Anything v1** (`LiheYoung/depth_anything_*`): ViT-S/B/L with fast transformer backbones and colorized outputs via `Spectral_r` colormap.
- **Depth Anything v2** (`Depth-Anything-V2/checkpoints/*.pth` or HF Hub mirrors): ViT-Small/Base/Large with configurable feature channels and improved edge handling.
- **Depth Anything v3 (AnySize)** (`depth-anything/DA3*` via bundled AnySize fork): Nested, giant, large, base, small, mono, and metric variants with native-resolution inference and automatic padding/cropping.
- **Pixel-Perfect Depth**: Diffusion-based relative depth refined by the **MoGe** metric surface model and RANSAC alignment to recover metric depth; customizable denoising steps.
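
The RANSAC alignment mentioned for Pixel-Perfect Depth can be illustrated with a toy scale-and-shift fit. This is a minimal sketch of the general technique, not the repository's `recover_metric_depth_ransac()`: the name `fit_scale_shift_ransac`, its parameters, and the linear model `metric ≈ scale * relative + shift` are assumptions made for illustration, and it works on flat lists of depth samples rather than full images.

```python
import random


def least_squares_line(xs, ys):
    """Closed-form least-squares fit of y = scale * x + shift."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("cannot fit a line to constant x values")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    scale = cov_xy / var_x
    return scale, mean_y - scale * mean_x


def fit_scale_shift_ransac(relative, metric, iterations=200, threshold=0.05, seed=0):
    """Hypothetical helper: robustly fit metric ~= scale * relative + shift."""
    rng = random.Random(seed)
    pairs = list(zip(relative, metric))
    best_inliers = []
    for _ in range(iterations):
        # Fit a candidate line through two randomly chosen samples.
        (x1, y1), (x2, y2) = rng.sample(pairs, 2)
        if x1 == x2:
            continue  # degenerate sample, no unique line
        scale = (y2 - y1) / (x2 - x1)
        shift = y1 - scale * x1
        # Keep the candidate that agrees with the most samples.
        inliers = [(x, y) for x, y in pairs if abs(scale * x + shift - y) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("RANSAC found no consensus set")
    # Refit on the consensus set only, so outliers do not bias the result.
    xs, ys = zip(*best_inliers)
    return least_squares_line(xs, ys)
```

Calling `fit_scale_shift_ransac(relative, metric)` returns a `(scale, shift)` pair that can then be applied to every relative depth value. The two-point sampling keeps each iteration cheap, and the final least-squares refit uses only the samples that agreed with the best candidate.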

## 🖥️ App Experience
- **Slider Comparison**: drag between any two predictions with automatically labeled overlays.
- **Method Comparison**: view models side-by-side with synchronized layout and captions rendered in OpenCV.
- **Single Model**: inspect the RGB input versus one model output using the Gradio `ImageSlider` component.

## 📦 Installation & Setup

### Local Development
1. **Clone & enter**:
   ```bash
   git clone <repository-url>
   cd Depth-Estimation-Compare-demo
   ```
2. **Install dependencies** (includes `gradio`, `torch`, `gradio_imageslider`, `open3d`, `scikit-learn`, and MoGe utilities):
   ```bash
   pip install -r requirements.txt
   ```
3. **Install the AnySize fork** (required for Depth Anything v3 tab):
   ```bash
   pip install -e "Depth-Anything-3-anysize/.[all]"
   ```
4. **Model assets**:
   - Depth Anything v1 checkpoints stream automatically from the HuggingFace Hub.
   - Download Depth Anything v2 weights into `Depth-Anything-V2/checkpoints/` if they are not already present (`depth_anything_v2_vits.pth`, `depth_anything_v2_vitb.pth`, `depth_anything_v2_vitl.pth`).
   - Depth Anything v3 models download via the bundled AnySize API from `depth-anything/*` repositories at inference time; no manual checkpoints required.
   - Pixel-Perfect Depth pulls the diffusion checkpoint (`ppd.pth`) from `gangweix/Pixel-Perfect-Depth` on first use and loads MoGe weights (`Ruicheng/moge-2-vitl-normal`).
5. **Run the app**:
   ```bash
   python app_local.py   # Local UI with v3 tab and warm caches
   python app.py         # ZeroGPU-ready launch script (loads models on demand)
   ```
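
Before launching, it can help to confirm that the Depth Anything v2 weights listed in step 4 are in place. The sketch below uses only the standard library; the helper name `missing_v2_checkpoints` is hypothetical, while the directory and file names are the ones given in the setup steps.

```python
from pathlib import Path

# Directory and file names as listed in the setup steps.
V2_CHECKPOINT_DIR = Path("Depth-Anything-V2/checkpoints")
V2_CHECKPOINT_FILES = (
    "depth_anything_v2_vits.pth",
    "depth_anything_v2_vitb.pth",
    "depth_anything_v2_vitl.pth",
)


def missing_v2_checkpoints(checkpoint_dir=V2_CHECKPOINT_DIR):
    """Return the expected v2 weight files that are not present on disk."""
    checkpoint_dir = Path(checkpoint_dir)
    return [name for name in V2_CHECKPOINT_FILES if not (checkpoint_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_v2_checkpoints()
    if missing:
        print("Missing Depth Anything v2 weights:", ", ".join(missing))
    else:
        print("All Depth Anything v2 weights found.")
```

Run it from the repository root so the relative checkpoint path resolves correctly.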

### HuggingFace Spaces (ZeroGPU)
1. Push the repository contents to a Gradio Space.
2. Select the **ZeroGPU** hardware preset.
3. The app downloads required checkpoints (Depth Anything v1/v2/v3, PPD, MoGe) on demand and aggressively frees memory via `clear_model_cache()` between requests.

## 📁 Project Structure
```
Depth-Estimation-Compare-demo/
├── app.py                        # ZeroGPU deployment entrypoint (includes v3 tab)
├── app_local.py                  # Local-friendly launch script (full feature set)
├── requirements.txt              # Python dependencies (Gradio, Torch, PPD stack)
├── assets/
│   └── examples/                 # Shared demo imagery
├── Depth-Anything/               # Depth Anything v1 implementation + utilities
├── Depth-Anything-V2/            # Depth Anything v2 implementation & checkpoints
├── Depth-Anything-3-anysize/     # Bundled AnySize fork powering Depth Anything v3 tab
│   ├── app.py                    # Standalone AnySize Gradio demo (optional)
│   ├── depth3_anysize.py         # Scripted inference example
│   ├── pyproject.toml            # Editable install metadata
│   ├── requirements.txt          # AnySize-specific dependencies
│   └── src/depth_anything_3/     # AnySize API, configs, and model code
├── Pixel-Perfect-Depth/          # Pixel-Perfect Depth diffusion + MoGe helpers
└── README.md                     # You are here
```

## ⚙️ Configuration Notes
- Model dropdown labels come from `V1_MODEL_CONFIGS`, `V2_MODEL_CONFIGS`, and `DA3_MODEL_SOURCES` plus the PPD entry in both apps.
- `clear_model_cache()` resets every model family (v1/v2/v3/PPD) and flushes CUDA to respect ZeroGPU constraints in `app.py`.
- Depth Anything v3 inference leverages the AnySize API (`process_res=None`, `process_res_method="keep"`) to preserve native resolution and returns processed RGB/depth pairs.
- Pixel-Perfect Depth inference aligns relative depth to metric scale through `recover_metric_depth_ransac()` for consistent visualization.
- Depth visualizations use a normalized `Spectral_r` colormap; PPD uses a dedicated matplotlib colormap for metric maps.

## 📊 Performance Expectations
- **Depth Anything v1**: ViT-S ~1–2 s, ViT-B ~2–4 s, ViT-L ~4–8 s (image dependent).
- **Depth Anything v2**: similar to v1 with improved sharpness; HF downloads add one-time startup overhead.
- **Depth Anything v3**: nested/giant models are heavier (expect longer cold starts), while base/small options are close to v2 latency when running at native resolution.
- **Pixel-Perfect Depth**: diffusion + metric refinement typically takes longer (10–20 denoise steps) but returns metrically-aligned depth suitable for downstream 3D tasks.
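
These figures depend on hardware and image size. A small timing harness along the following lines can measure latency on your own setup; `time_inference` and the `predict` callable are hypothetical names, not part of either app.

```python
import statistics
import time


def time_inference(predict, image, warmup=1, runs=3):
    """Return the median wall-clock latency in seconds of predict(image)."""
    for _ in range(warmup):
        predict(image)  # warm-up calls absorb one-time costs such as model loading
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(image)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

For GPU models, `predict` should return only once the result is available on the CPU (or after a call to `torch.cuda.synchronize()`); otherwise the measurement may cover little more than kernel launch time.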

## 🎯 Usage Tips
- Mix-and-match any two models in comparison tabs to highlight qualitative differences.
- Use the Single Model tab to check PPD metric depth against the RGB input.
- Leverage the provided examples to benchmark indoor/outdoor, lighting extremes, and complex geometry scenarios before running custom images.

## 🤝 Contributing
Enhancements are welcome: new model backends, visualization modes, or memory optimizations are especially valuable for ZeroGPU deployments. Please follow the coding style in `app.py` and keep the documentation in sync with new capabilities.

## 📚 References
- [Depth Anything v1](https://github.com/LiheYoung/Depth-Anything)
- [Depth Anything v2](https://github.com/DepthAnything/Depth-Anything-V2)
- [Pixel-Perfect Depth](https://github.com/gangweix/pixel-perfect-depth)
- [MoGe](https://huggingface.co/Ruicheng/moge-2-vitl-normal)
- [Depth Anything 3 AnySize Fork](https://github.com/ByteDance-Seed/Depth-Anything-3) (see bundled `Depth-Anything-3-anysize` directory)

## 📄 License
- Depth Anything v1: MIT License
- Depth Anything v2: Apache 2.0 License
- Pixel-Perfect Depth: see upstream repository for licensing
- Demo scaffolding in this repo: MIT License (follow individual component terms)

---

Built as a hands-on playground for exploring modern monocular depth estimators. Adjust tabs, compare outputs, and plug results into your 3D workflows.