---
license: apache-2.0
---
# Depth Anything V2 Estimator Block
A custom [Modular Diffusers](https://huggingface.co/docs/diffusers/modular_diffusers/overview) block for monocular depth estimation using [Depth Anything V2](https://huggingface.co/depth-anything/Depth-Anything-V2-Large-hf). Supports both images and videos.
## Features
- **Relative depth estimation** using Depth Anything V2 (Large variant, 335M params)
- **Image and video** input support
- **Grayscale or turbo colormap** visualization
## Installation
```bash
# Using uv
uv sync
# Using pip
pip install -r requirements.txt
```
## Quick Start
### Load the block
```python
from diffusers import ModularPipelineBlocks
import torch
blocks = ModularPipelineBlocks.from_pretrained(
"your-username/depth-anything-v2-estimator", # or local path "."
trust_remote_code=True,
)
pipeline = blocks.init_pipeline()
pipeline.load_components(torch_dtype=torch.float16)
pipeline.to("cuda")
```
### Single image - grayscale depth
```python
from PIL import Image
image = Image.open("photo.jpg")
output = pipeline(image=image)
# Save depth map
output.depth_image.save("photo_depth.png")
# Access raw relative depth tensor
print(output.predicted_depth.shape) # (H, W)
```
### Single image - turbo colormap
```python
output = pipeline(image=image, colormap="turbo")
output.depth_image.save("photo_depth_turbo.png")
```
### Video - grayscale depth
```python
from block import save_video
output = pipeline(video_path="input.mp4", colormap="grayscale")
save_video(output.depth_frames, output.fps, "output_depth.mp4")
```
### Video - turbo colormap
```python
output = pipeline(video_path="input.mp4", colormap="turbo")
save_video(output.depth_frames, output.fps, "output_depth_turbo.mp4")
```
## Inputs
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `image` | `PIL.Image` | - | Image to estimate depth for |
| `video_path` | `str` | - | Path to input video. When provided, `image` is ignored |
| `colormap` | `str` | `"grayscale"` | Visualization style: `"grayscale"` or `"turbo"` (color-mapped) |
## Outputs
### Image mode
| Output | Type | Description |
|--------|------|-------------|
| `depth_image` | `PIL.Image` | Normalized depth visualization |
| `predicted_depth` | `torch.Tensor` | Raw relative depth (H x W) |
### Video mode
| Output | Type | Description |
|--------|------|-------------|
| `depth_frames` | `List[PIL.Image]` | Per-frame depth visualizations |
| `fps` | `float` | Source video frame rate |
## Depth Normalization
Depth values are min-max normalized and inverted so that bright areas represent nearby surfaces and dark areas represent distant ones.
- **Bright = close**, **dark = far** (grayscale)
- **Warm (red/yellow) = close**, **cool (blue) = far** (turbo)
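The normalization described above can be sketched in a few lines of NumPy. This is an illustration only, not the block's actual code: the helper name `normalize_depth`, the `invert` flag, and the assumption that the smallest raw value should map to the brightest pixel are all hypothetical choices made for the example.

```python
import numpy as np


def normalize_depth(depth: np.ndarray, invert: bool = True) -> np.ndarray:
    """Min-max normalize a 2D depth map and return an 8-bit grayscale array.

    Hypothetical helper for illustration; the block's own implementation may differ.
    """
    depth = depth.astype(np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    span = d_max - d_min

    if span == 0.0:
        # A constant map has no contrast; avoid dividing by zero.
        norm = np.zeros_like(depth)
    else:
        # Scale values into [0, 1].
        norm = (depth - d_min) / span

    if invert:
        # Flip the scale so the smallest raw value becomes the brightest pixel.
        norm = 1.0 - norm

    # Convert to 0..255 for display as a grayscale image.
    return (norm * 255.0).round().astype(np.uint8)


# Tiny synthetic example (assumed convention: larger raw value = farther away).
raw = np.array([[1.0, 2.0], [3.0, 5.0]])
gray = normalize_depth(raw)
```

With `invert=True`, the pixel holding the minimum raw value comes out as 255 and the pixel holding the maximum comes out as 0. Because the scaling is per-image, brightness values are relative to each input and are not comparable across images or video frames.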
## Model Variants
The block defaults to `depth-anything/Depth-Anything-V2-Large-hf`. All available variants:
| Variant | Model ID | Params |
|---------|----------|--------|
| Small | `depth-anything/Depth-Anything-V2-Small-hf` | 24.8M |
| Base | `depth-anything/Depth-Anything-V2-Base-hf` | 97.5M |
| **Large** (default) | `depth-anything/Depth-Anything-V2-Large-hf` | 335M |
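This README does not describe how to swap the variant inside the block, so the following is only a small lookup sketch. The `VARIANTS` dictionary and `resolve_model_id` function are hypothetical names introduced for this example; the model IDs are copied from the table above.

```python
# Model IDs taken from the variants table; the mapping itself is illustrative.
VARIANTS = {
    "small": "depth-anything/Depth-Anything-V2-Small-hf",
    "base": "depth-anything/Depth-Anything-V2-Base-hf",
    "large": "depth-anything/Depth-Anything-V2-Large-hf",
}


def resolve_model_id(variant: str = "large") -> str:
    """Return the Hub model ID for a variant name (case-insensitive)."""
    key = variant.strip().lower()
    if key not in VARIANTS:
        valid = ", ".join(sorted(VARIANTS))
        raise ValueError(f"Unknown variant {variant!r}; expected one of: {valid}")
    return VARIANTS[key]


model_id = resolve_model_id("Small")
```

Outside this block, the `-hf` checkpoints can typically be loaded directly with `transformers.AutoModelForDepthEstimation.from_pretrained(model_id)`. Smaller variants trade some accuracy for lower memory use and faster inference.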