---
title: Depth Estimation Compare Demo
emoji: 👀
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 6.0.0
app_file: app.py
pinned: false
---
# Depth Estimation Comparison Demo
A Gradio interface for comparing **Depth Anything v1**, **Depth Anything v2**, **Depth Anything v3 (AnySize)**, and **Pixel-Perfect Depth (PPD)** on the same image. Switch among side-by-side layouts, a slider overlay, single-model inspection, and a dedicated v3 tab to see how the different pipelines perceive scene geometry. Two entrypoints are provided:
- `app_local.py` – full-featured local runner with minimal memory constraints.
- `app.py` – ZeroGPU-aware build tuned for HuggingFace Spaces with aggressive cache management.
## 🚀 Highlights
- **Four interactive experiences**: draggable slider, labeled side-by-side comparison, original-vs-depth slider, and a Depth Anything v3 tab with RGB vs depth visualization + metadata.
- **Multi-family depth models**: run ViT variants from Depth Anything v1/v2/v3 alongside Pixel-Perfect Depth with MoGe metric alignment.
- **ZeroGPU aware**: `app.py` performs on-demand loading, cache clearing, and CUDA cleanup to stay within HuggingFace Spaces limits, while `app_local.py` keeps models warm for faster iteration.
- **Curated examples**: reusable demo images sourced from each model family (`assets/examples`, `Depth-Anything*/assets/examples`, `Depth-Anything-3-anysize/assets/examples`, `Pixel-Perfect-Depth/assets/examples`).
## 🔍 Supported Pipelines
- **Depth Anything v1** (`LiheYoung/depth_anything_*`): ViT-S/B/L with fast transformer backbones and colorized outputs via the `Spectral_r` colormap.
- **Depth Anything v2** (`Depth-Anything-V2/checkpoints/*.pth` or HF Hub mirrors): ViT-Small/Base/Large with configurable feature channels and improved edge handling.
- **Depth Anything v3 (AnySize)** (`depth-anything/DA3*` via bundled AnySize fork): Nested, giant, large, base, small, mono, and metric variants with native-resolution inference and automatic padding/cropping.
- **Pixel-Perfect Depth**: Diffusion-based relative depth refined by the **MoGe** metric surface model and RANSAC alignment to recover metric depth; customizable denoising steps.
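The metric-alignment step mentioned in the PPD bullet above can be sketched in a few lines: fit `metric ≈ s * relative + t` robustly by sampling point pairs and keeping the fit with the most inliers. This is an illustrative NumPy sketch, not the repo's actual `recover_metric_depth_ransac()` implementation; all names here are hypothetical.

```python
import numpy as np

def align_scale_shift_ransac(rel_depth, metric_depth, iters=100, thresh=0.05, seed=0):
    """Robustly fit metric ~= s * rel + t: sample point pairs, keep the fit with
    the most inliers, then refine s and t by least squares on the inlier set."""
    rng = np.random.default_rng(seed)
    rel, met = rel_depth.ravel(), metric_depth.ravel()
    best_inliers = np.zeros(rel.size, dtype=bool)
    for _ in range(iters):
        i, j = rng.choice(rel.size, size=2, replace=False)
        if rel[i] == rel[j]:
            continue  # degenerate pair, cannot solve for scale
        s = (met[i] - met[j]) / (rel[i] - rel[j])
        t = met[i] - s * rel[i]
        inliers = np.abs(s * rel + t - met) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Least-squares refinement on the inliers only
    A = np.stack([rel[best_inliers], np.ones(best_inliers.sum())], axis=1)
    s, t = np.linalg.lstsq(A, met[best_inliers], rcond=None)[0]
    return s * rel_depth + t, (s, t)

# Synthetic check: relative depth is a scaled/shifted copy of metric depth,
# with a few injected outliers that a plain least-squares fit would absorb.
metric = np.linspace(1.0, 5.0, 200)
rel = (metric - 0.5) / 2.0          # so metric = 2 * rel + 0.5
rel[::50] += 3.0                    # inject outliers
aligned, (s, t) = align_scale_shift_ransac(rel, metric)
```

The recovered `(s, t)` should land on the ground-truth `(2.0, 0.5)` despite the outliers, which a non-robust fit would not.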
## 🖥️ App Experience
- **Slider Comparison**: drag between any two predictions with automatically labeled overlays.
- **Method Comparison**: view models side-by-side with synchronized layout and captions rendered in OpenCV.
- **Single Model**: inspect the RGB input versus one model output using the Gradio `ImageSlider` component.
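The slider views above are, at heart, column-wise composites of two equally sized renderings. The actual widget is the `gradio_imageslider` `ImageSlider` component; this NumPy sketch only illustrates the blend it performs, with hypothetical names:

```python
import numpy as np

def slider_composite(left_img, right_img, position):
    """Combine two same-shaped images at a horizontal split in [0, 1]:
    columns left of the split come from left_img, the rest from right_img."""
    assert left_img.shape == right_img.shape
    split = int(round(position * left_img.shape[1]))
    out = right_img.copy()
    out[:, :split] = left_img[:, :split]
    return out

# Dummy stand-ins: an all-black RGB frame vs. an all-white depth visualization
rgb = np.zeros((4, 10, 3), dtype=np.uint8)
depth_vis = np.full((4, 10, 3), 255, dtype=np.uint8)
half = slider_composite(rgb, depth_vis, 0.5)  # left half black, right half white
```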
## 📦 Installation & Setup
### Local Development
1. **Clone & enter**:
```bash
git clone <repository-url>
cd Depth-Estimation-Compare-demo
```
2. **Install dependencies** (includes `gradio`, `torch`, `gradio_imageslider`, `open3d`, `scikit-learn`, and MoGe utilities):
```bash
pip install -r requirements.txt
```
3. **Install the AnySize fork** (required for Depth Anything v3 tab):
```bash
pip install -e "Depth-Anything-3-anysize/.[all]"
```
4. **Model assets**:
- Depth Anything v1 checkpoints stream automatically from the HuggingFace Hub.
- Download Depth Anything v2 weights into `Depth-Anything-V2/checkpoints/` if they are not already present (`depth_anything_v2_vits.pth`, `depth_anything_v2_vitb.pth`, `depth_anything_v2_vitl.pth`).
- Depth Anything v3 models download via the bundled AnySize API from `depth-anything/*` repositories at inference time; no manual checkpoints required.
- Pixel-Perfect Depth pulls the diffusion checkpoint (`ppd.pth`) from `gangweix/Pixel-Perfect-Depth` on first use and loads MoGe weights (`Ruicheng/moge-2-vitl-normal`).
5. **Run the app**:
```bash
python app_local.py # Local UI with v3 tab and warm caches
python app.py # ZeroGPU-ready launch script (loads models on demand)
```
### HuggingFace Spaces (ZeroGPU)
1. Push the repository contents to a Gradio Space.
2. Select the **ZeroGPU** hardware preset.
3. The app downloads required checkpoints (Depth Anything v1/v2/v3, PPD, MoGe) on demand and aggressively frees memory via `clear_model_cache()` between requests.
## 📁 Project Structure
```
Depth-Estimation-Compare-demo/
├── app.py                     # ZeroGPU deployment entrypoint (includes v3 tab)
├── app_local.py               # Local-friendly launch script (full feature set)
├── requirements.txt           # Python dependencies (Gradio, Torch, PPD stack)
├── assets/
│   └── examples/              # Shared demo imagery
├── Depth-Anything/            # Depth Anything v1 implementation + utilities
├── Depth-Anything-V2/         # Depth Anything v2 implementation & checkpoints
├── Depth-Anything-3-anysize/  # Bundled AnySize fork powering Depth Anything v3 tab
│   ├── app.py                 # Standalone AnySize Gradio demo (optional)
│   ├── depth3_anysize.py      # Scripted inference example
│   ├── pyproject.toml         # Editable install metadata
│   ├── requirements.txt       # AnySize-specific dependencies
│   └── src/depth_anything_3/  # AnySize API, configs, and model code
├── Pixel-Perfect-Depth/       # Pixel-Perfect Depth diffusion + MoGe helpers
└── README.md                  # You are here
```
## ⚙️ Configuration Notes
- Model dropdown labels come from `V1_MODEL_CONFIGS`, `V2_MODEL_CONFIGS`, and `DA3_MODEL_SOURCES` plus the PPD entry in both apps.
- `clear_model_cache()` resets every model family (v1/v2/v3/PPD) and flushes CUDA to respect ZeroGPU constraints in `app.py`.
- Depth Anything v3 inference leverages the AnySize API (`process_res=None`, `process_res_method="keep"`) to preserve native resolution and returns processed RGB/depth pairs.
- Pixel-Perfect Depth inference aligns relative depth to metric scale through `recover_metric_depth_ransac()` for consistent visualization.
- Depth visualizations use a normalized `Spectral_r` colormap; PPD uses a dedicated matplotlib colormap for metric maps.
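The `Spectral_r` visualization in the last note boils down to min-max normalizing the raw depth before applying the colormap. A minimal NumPy-only sketch of that normalization (the app does the final color mapping through matplotlib's `Spectral_r`; the function name here is illustrative):

```python
import numpy as np

def normalize_depth(depth, eps=1e-8):
    """Min-max normalize a raw depth map to uint8 [0, 255] for colormapping."""
    d = depth.astype(np.float64)
    rng = d.max() - d.min()
    if rng < eps:  # flat map: avoid dividing by ~zero
        return np.zeros(d.shape, dtype=np.uint8)
    d = (d - d.min()) / rng
    return (d * 255.0).astype(np.uint8)

depth = np.array([[0.5, 1.0], [2.0, 4.5]], dtype=np.float32)
vis = normalize_depth(depth)
# `vis` is then passed through the Spectral_r colormap to produce the overlay.
```

Because each model family outputs depth on a different (relative or metric) scale, this per-image normalization is what makes the side-by-side visual comparisons meaningful.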
## 📊 Performance Expectations
- **Depth Anything v1**: ViT-S ~1–2 s, ViT-B ~2–4 s, ViT-L ~4–8 s (image dependent).
- **Depth Anything v2**: similar to v1 with improved sharpness; HF downloads add one-time startup overhead.
- **Depth Anything v3**: nested/giant models are heavier (expect longer cold starts), while base/small options are close to v2 latency when running at native resolution.
- **Pixel-Perfect Depth**: diffusion + metric refinement typically takes longer (10–20 denoising steps) but returns metrically aligned depth suitable for downstream 3D tasks.
## 🎯 Usage Tips
- Mix-and-match any two models in comparison tabs to highlight qualitative differences.
- Use the Single Model tab to compare PPD's metric depth directly against the RGB input.
- Leverage the provided examples to benchmark indoor/outdoor, lighting extremes, and complex geometry scenarios before running custom images.
## 🤝 Contributing
Enhancements are welcome: new model backends, visualization modes, or memory optimizations are especially valuable for ZeroGPU deployments. Please follow the coding style in `app.py` and keep documentation in sync with new capabilities.
## 📚 References
- [Depth Anything v1](https://github.com/LiheYoung/Depth-Anything)
- [Depth Anything v2](https://github.com/DepthAnything/Depth-Anything-V2)
- [Pixel-Perfect Depth](https://github.com/gangweix/pixel-perfect-depth)
- [MoGe](https://huggingface.co/Ruicheng/moge-2-vitl-normal)
- [Depth Anything 3 AnySize Fork](https://github.com/ByteDance-Seed/Depth-Anything-3) (see bundled `Depth-Anything-3-anysize` directory)
## 📄 License
- Depth Anything v1: MIT License
- Depth Anything v2: Apache 2.0 License
- Pixel-Perfect Depth: see upstream repository for licensing
- Demo scaffolding in this repo: MIT License (follow individual component terms)
---
Built as a hands-on playground for exploring modern monocular depth estimators. Adjust tabs, compare outputs, and plug results into your 3D workflows.