---
license: apache-2.0
---

# Depth Anything V2 Estimator Block

A custom [Modular Diffusers](https://huggingface.co/docs/diffusers/modular_diffusers/overview) block for monocular depth estimation using [Depth Anything V2](https://huggingface.co/depth-anything/Depth-Anything-V2-Large-hf). Supports both images and videos.

## Features

- **Relative depth estimation** using Depth Anything V2 (Large variant, 335M params)
- **Image and video** input support
- **Grayscale or turbo colormap** visualization

## Installation

```bash
# Using uv
uv sync

# Using pip
pip install -r requirements.txt
```

## Quick Start

### Load the block

```python
import torch
from diffusers import ModularPipelineBlocks

# trust_remote_code=True lets diffusers run the custom block code stored in the repository
blocks = ModularPipelineBlocks.from_pretrained(
    "your-username/depth-anything-v2-estimator",  # or local path "."
    trust_remote_code=True,
)
pipeline = blocks.init_pipeline()
pipeline.load_components(torch_dtype=torch.float16)  # load the weights in half precision
pipeline.to("cuda")
```

### Single image - grayscale depth

```python
from PIL import Image

image = Image.open("photo.jpg")
output = pipeline(image=image)

# Save depth map
output.depth_image.save("photo_depth.png")

# Access raw relative depth tensor
print(output.predicted_depth.shape)  # (H, W)
```

### Single image - turbo colormap

```python
output = pipeline(image=image, colormap="turbo")
output.depth_image.save("photo_depth_turbo.png")
```

### Video - grayscale depth

```python
# save_video comes from this repository's block module, so block.py must be importable
from block import save_video

output = pipeline(video_path="input.mp4", colormap="grayscale")
save_video(output.depth_frames, output.fps, "output_depth.mp4")
```

### Video - turbo colormap

```python
from block import save_video

output = pipeline(video_path="input.mp4", colormap="turbo")
save_video(output.depth_frames, output.fps, "output_depth_turbo.mp4")
```

## Inputs

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `image` | `PIL.Image` | - | Image to estimate depth for |
| `video_path` | `str` | - | Path to the input video. When provided, `image` is ignored |
| `colormap` | `str` | `"grayscale"` | Visualization style: `"grayscale"` or `"turbo"` |

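The precedence in the table can be sketched as a small dispatch helper. `resolve_mode` is a hypothetical name used only for illustration and is not part of the block, whose actual validation may differ. The sketch assumes that a provided `video_path` takes priority over `image` and that `colormap` must be one of the two documented values.

```python
from typing import Optional

# The two colormap values documented in the Inputs table.
VALID_COLORMAPS = ("grayscale", "turbo")


def resolve_mode(image: Optional[object] = None,
                 video_path: Optional[str] = None,
                 colormap: str = "grayscale") -> str:
    """Return "video" or "image" following the documented precedence.

    Hypothetical helper: the real block may validate its inputs differently.
    """
    if colormap not in VALID_COLORMAPS:
        raise ValueError(
            f"colormap must be one of {VALID_COLORMAPS}, got {colormap!r}"
        )
    if video_path is not None:
        # A video path takes priority; any image passed alongside is ignored.
        return "video"
    if image is not None:
        return "image"
    raise ValueError("provide either image or video_path")
```

Checking the mode up front like this makes it clear which output fields to expect: `depth_image` and `predicted_depth` for images, `depth_frames` and `fps` for videos.
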
## Outputs

### Image mode

| Output | Type | Description |
|--------|------|-------------|
| `depth_image` | `PIL.Image` | Normalized depth visualization |
| `predicted_depth` | `torch.Tensor` | Raw relative depth (H x W) |

### Video mode

| Output | Type | Description |
|--------|------|-------------|
| `depth_frames` | `List[PIL.Image]` | Per-frame depth visualizations |
| `fps` | `float` | Source video frame rate |

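Because video mode returns the frames and the source frame rate separately, a timestamp for each depth frame can be derived from the two. The helper below is a minimal sketch, not part of the block: `frame_timestamps` is a hypothetical name, and it assumes a constant frame rate and that `depth_frames` holds one entry per source frame.

```python
from typing import List


def frame_timestamps(num_frames: int, fps: float) -> List[float]:
    """Return each frame's start time in seconds, assuming a constant frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return [index / fps for index in range(num_frames)]
```

With a video-mode result, this could be called as `frame_timestamps(len(output.depth_frames), output.fps)`.
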
## Depth Normalization

Depth values are min-max normalized and inverted so that nearby surfaces map to the high end of the selected colormap and distant surfaces map to the low end:

- **Bright = close**, **dark = far** (grayscale)
- **Warm (red/yellow) = close**, **cool (blue) = far** (turbo)

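The min-max step can be illustrated with a short sketch. This is not the block's implementation: `normalize_depth` is a hypothetical helper that works on plain nested lists so it runs without dependencies, whereas the block operates on tensors. Whether `invert` is needed depends on whether the raw values grow with distance or with proximity, so the sketch leaves that choice to the caller.

```python
from typing import List


def normalize_depth(depth: List[List[float]],
                    invert: bool = False) -> List[List[int]]:
    """Min-max normalize a 2D depth map to 8-bit gray levels (0-255)."""
    flat = [value for row in depth for value in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    result = []
    for row in depth:
        out_row = []
        for value in row:
            # A constant map has no range; scale it to 0 to avoid dividing by zero.
            scaled = (value - lo) / span if span > 0 else 0.0
            if invert:
                # Flip the scale so the smallest raw value becomes the brightest.
                scaled = 1.0 - scaled
            out_row.append(round(scaled * 255))
        result.append(out_row)
    return result
```

The resulting gray levels can be used directly as a grayscale image or passed through a colormap lookup such as turbo.
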
## Model Variants

The block defaults to `depth-anything/Depth-Anything-V2-Large-hf`. The available variants are:

| Variant | Model ID | Params |
|---------|----------|--------|
| Small | `depth-anything/Depth-Anything-V2-Small-hf` | 24.8M |
| Base | `depth-anything/Depth-Anything-V2-Base-hf` | 97.5M |
| **Large** (default) | `depth-anything/Depth-Anything-V2-Large-hf` | 335M |

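For scripts that pick a variant by name, the table can be kept as a small lookup. This is a sketch only: `MODEL_VARIANTS` and `model_id_for` are hypothetical names, and how the block is pointed at a different checkpoint is not covered in this README.

```python
# Model IDs copied from the Model Variants table.
MODEL_VARIANTS = {
    "small": "depth-anything/Depth-Anything-V2-Small-hf",
    "base": "depth-anything/Depth-Anything-V2-Base-hf",
    "large": "depth-anything/Depth-Anything-V2-Large-hf",
}


def model_id_for(variant: str = "large") -> str:
    """Resolve a variant name (case-insensitive) to its Hugging Face model ID."""
    key = variant.strip().lower()
    if key not in MODEL_VARIANTS:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(MODEL_VARIANTS)}"
        )
    return MODEL_VARIANTS[key]
```
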