---
library_name: diffusers
license: apache-2.0
tags:
- modular-diffusers
- diffusers
- depth-estimation
---
# Depth Pro Estimator Block
A custom [Modular Diffusers](https://huggingface.co/docs/diffusers/modular_diffusers/overview) block for monocular depth estimation using Apple's [Depth Pro](https://huggingface.co/apple/DepthPro-hf) model. Supports both images and videos.
## Features
- **Metric depth estimation** in real-world meters using Depth Pro
- **Image and video** input support
- **Grayscale or turbo colormap** visualization
- Inverse depth normalization (following Apple's reference implementation) for robust handling of outdoor/sky scenes
## Installation
```bash
# Using uv
uv sync

# Using pip
pip install -r requirements.txt
```
## Quick Start
### Load the block
```python
from diffusers import ModularPipelineBlocks
import torch

blocks = ModularPipelineBlocks.from_pretrained(
    "your-username/depth-pro-estimator",  # or a local path such as "."
    trust_remote_code=True,
)
pipeline = blocks.init_pipeline()
pipeline.load_components(torch_dtype=torch.float16)  # downloads/loads the Depth Pro weights
pipeline.to("cuda")
```
### Single image - grayscale depth
```python
from PIL import Image

image = Image.open("photo.jpg")
output = pipeline(image=image)

# Save depth map
output.depth_image.save("photo_depth.png")

# Access raw metric depth tensor (in meters)
print(output.predicted_depth.shape)  # (H, W)
print(output.field_of_view)  # estimated FOV
print(output.focal_length)  # estimated focal length
```
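Since `depth_image` is only a normalized visualization, you can persist the metric tensor losslessly for downstream use. A minimal sketch, assuming the tensor comes back on the GPU in half precision:

```python
import numpy as np

# Save the raw metric depth (the PNG above is only a normalized visualization).
depth_m = output.predicted_depth.float().cpu().numpy()  # (H, W) float32, meters
np.save("photo_depth_m.npy", depth_m)
```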
### Single image - turbo colormap
```python
output = pipeline(image=image, colormap="turbo")
output.depth_image.save("photo_depth_turbo.png")
```
### Video - grayscale depth
```python
from block import save_video  # helper bundled with this block's repo

output = pipeline(video_path="input.mp4", colormap="grayscale")
save_video(output.depth_frames, output.fps, "output_depth.mp4")
```
### Video - turbo colormap
```python
output = pipeline(video_path="input.mp4", colormap="turbo")
save_video(output.depth_frames, output.fps, "output_depth_turbo.mp4")
```
## Inputs
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `image` | `PIL.Image` | - | Image to estimate depth for |
| `video_path` | `str` | - | Path to input video. When provided, `image` is ignored |
| `colormap` | `str` | `"grayscale"` | `"grayscale"` or `"turbo"` (colormapped) |
## Outputs
### Image mode
| Output | Type | Description |
|--------|------|-------------|
| `depth_image` | `PIL.Image` | Normalized depth visualization |
| `predicted_depth` | `torch.Tensor` | Raw metric depth in meters, shape `(H, W)` |
| `field_of_view` | `float` | Estimated horizontal FOV |
| `focal_length` | `float` | Estimated focal length |
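
Under a pinhole-camera model the two estimates are related by `f = W / (2 · tan(FOV / 2))` with `f` in pixels. A quick cross-check sketch, assuming `field_of_view` is reported in degrees (verify the unit conventions against your `transformers` version):

```python
import math

# Recover a focal length in pixels from the estimated horizontal FOV
# (assumes FOV in degrees; the units of `focal_length` depend on the model's convention).
fov_rad = math.radians(float(output.field_of_view))
focal_px = image.width / (2.0 * math.tan(fov_rad / 2.0))
print(f"focal from FOV: {focal_px:.1f} px, model estimate: {float(output.focal_length):.1f}")
```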
### Video mode
| Output | Type | Description |
|--------|------|-------------|
| `depth_frames` | `List[PIL.Image]` | Per-frame depth visualizations |
| `fps` | `float` | Source video frame rate |
## Depth Normalization
Depth visualization uses inverse depth clipped to [0.1 m, 250 m], following [Apple's reference implementation](https://github.com/apple/ml-depth-pro). Without the clip, sky/infinity values (clamped at 10,000 m by the model) would dominate the normalization range and crush all near-field detail into a near-binary mask.
- **Bright = close**, **dark = far** (grayscale)
- **Warm (red/yellow) = close**, **cool (blue) = far** (turbo)
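
A minimal sketch of this normalization, written against the image-mode outputs above (it mirrors the description here, not the block's exact code; the turbo branch assumes matplotlib is installed):

```python
import numpy as np
from PIL import Image

# Inverse depth, clipped to [0.1 m, 250 m] before normalization, so sky pixels
# (clamped to 10,000 m by the model) don't dominate the range.
depth_m = output.predicted_depth.float().cpu().numpy()
inv = 1.0 / np.clip(depth_m, 0.1, 250.0)
inv_min, inv_max = 1.0 / 250.0, 1.0 / 0.1
norm = (inv - inv_min) / (inv_max - inv_min)  # 1.0 = close, 0.0 = far

# Grayscale: bright = close, dark = far.
Image.fromarray((norm * 255.0).astype(np.uint8)).save("depth_gray.png")

# Turbo: warm = close, cool = far.
from matplotlib import colormaps
rgba = colormaps["turbo"](norm)  # (H, W, 4) floats in [0, 1]
Image.fromarray((rgba[..., :3] * 255.0).astype(np.uint8)).save("depth_turbo.png")
```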