---
library_name: diffusers
license: apache-2.0
tags:
- modular-diffusers
- diffusers
- depth-estimation
---
# Depth Pro Estimator Block

A custom [Modular Diffusers](https://huggingface.co/docs/diffusers/modular_diffusers/overview) block for monocular depth estimation using Apple's [Depth Pro](https://huggingface.co/apple/DepthPro-hf) model. Supports both images and videos.

## Features

- **Metric depth estimation** in real-world meters using Depth Pro
- **Image and video** input support
- **Grayscale or turbo colormap** visualization
- **Inverse depth normalization** (following Apple's reference implementation) for robust handling of outdoor/sky scenes

## Installation

```bash
# Using uv
uv sync

# Using pip
pip install -r requirements.txt
```

## Quick Start

### Load the block

```python
import torch

from diffusers import ModularPipelineBlocks

blocks = ModularPipelineBlocks.from_pretrained(
    "your-username/depth-pro-estimator",  # or local path "."
    trust_remote_code=True,
)
pipeline = blocks.init_pipeline()
pipeline.load_components(torch_dtype=torch.float16)
pipeline.to("cuda")
```

### Single image - grayscale depth

```python
from PIL import Image

image = Image.open("photo.jpg")
output = pipeline(image=image)

# Save depth map
output.depth_image.save("photo_depth.png")

# Access raw metric depth tensor (in meters)
print(output.predicted_depth.shape)  # (H, W)
print(output.field_of_view)          # estimated FOV
print(output.focal_length)           # estimated focal length
```

### Single image - turbo colormap

```python
output = pipeline(image=image, colormap="turbo")
output.depth_image.save("photo_depth_turbo.png")
```

### Video - grayscale depth

```python
from block import save_video

output = pipeline(video_path="input.mp4", colormap="grayscale")
save_video(output.depth_frames, output.fps, "output_depth.mp4")
```

### Video - turbo colormap

```python
output = pipeline(video_path="input.mp4", colormap="turbo")
save_video(output.depth_frames, output.fps, "output_depth_turbo.mp4")
```

## Inputs

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `image` | `PIL.Image` | - | Image to estimate depth for |
| `video_path` | `str` | - | Path to input video. When provided, `image` is ignored |
| `colormap` | `str` | `"grayscale"` | `"grayscale"` or `"turbo"` (colormapped) |

## Outputs

### Image mode

| Output | Type | Description |
|--------|------|-------------|
| `depth_image` | `PIL.Image` | Normalized depth visualization |
| `predicted_depth` | `torch.Tensor` | Raw metric depth in meters (H x W) |
| `field_of_view` | `float` | Estimated horizontal FOV |
| `focal_length` | `float` | Estimated focal length |

### Video mode

| Output | Type | Description |
|--------|------|-------------|
| `depth_frames` | `List[PIL.Image]` | Per-frame depth visualizations |
| `fps` | `float` | Source video frame rate |

## Depth Normalization

Depth visualization uses inverse depth clipped to [0.1m, 250m], following [Apple's reference implementation](https://github.com/apple/ml-depth-pro). This prevents sky/infinity values (clamped at 10,000m by the model) from crushing near-field detail into a binary mask.

- **Bright = close**, **dark = far** (grayscale)
- **Warm (red/yellow) = close**, **cool (blue) = far** (turbo)
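The mapping above can be sketched as follows. Note that `visualize_depth` is an illustrative helper written for this card, not the block's actual API; it just applies the normalization this section describes (inverse depth clipped to [0.1m, 250m], then scaled to [0, 1]):

```python
import numpy as np
from PIL import Image


def visualize_depth(depth_m: np.ndarray, colormap: str = "grayscale") -> Image.Image:
    """Map metric depth (meters) to an 8-bit image via clipped inverse depth.

    Illustrative sketch of the normalization; not the block's real code.
    """
    # Clip to [0.1 m, 250 m] so sky pixels (clamped at 10,000 m by the
    # model) don't compress near-field detail into a few gray levels.
    inv = 1.0 / np.clip(depth_m, 0.1, 250.0)
    inv_min, inv_max = 1.0 / 250.0, 1.0 / 0.1
    norm = (inv - inv_min) / (inv_max - inv_min)  # 1.0 = close, 0.0 = far

    if colormap == "turbo":
        from matplotlib import colormaps

        rgba = colormaps["turbo"](norm)  # warm = close, cool = far
        return Image.fromarray((rgba[..., :3] * 255).astype(np.uint8))

    # Grayscale: bright = close, dark = far
    return Image.fromarray((norm * 255).astype(np.uint8), mode="L")
```

Without the inverse-depth step, a single sky pixel at 10,000m would stretch the linear range so far that everything within a few meters collapses to nearly the same gray value.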
|