OzzyGT/depth_anything_custom_block


	---
	license: apache-2.0
	---

	# Depth Anything V2 Estimator Block

	A custom [Modular Diffusers](https://huggingface.co/docs/diffusers/modular_diffusers/overview) block for monocular depth estimation using [Depth Anything V2](https://huggingface.co/depth-anything/Depth-Anything-V2-Large-hf). Supports both images and videos.

	## Features

	- Relative depth estimation using Depth Anything V2 (Large variant, 335M params)
	- Image and video input support
	- Grayscale or turbo colormap visualization

	## Installation

	```bash
	# Using uv
	uv sync

	# Using pip
	pip install -r requirements.txt
	```

	## Quick Start

	### Load the block

	```python
	from diffusers import ModularPipelineBlocks
	import torch

	blocks = ModularPipelineBlocks.from_pretrained(
	    "your-username/depth-anything-v2-estimator",  # or local path "."
	    trust_remote_code=True,
	)
	pipeline = blocks.init_pipeline()
	pipeline.load_components(torch_dtype=torch.float16)
	pipeline.to("cuda")
	```

	### Single image - grayscale depth

	```python
	from PIL import Image

	image = Image.open("photo.jpg")
	output = pipeline(image=image)

	# Save depth map
	output.depth_image.save("photo_depth.png")

	# Access raw relative depth tensor
	print(output.predicted_depth.shape) # (H, W)
	```

	### Single image - turbo colormap

	```python
	output = pipeline(image=image, colormap="turbo")
	output.depth_image.save("photo_depth_turbo.png")
	```

	### Video - grayscale depth

	```python
	from block import save_video

	output = pipeline(video_path="input.mp4", colormap="grayscale")
	save_video(output.depth_frames, output.fps, "output_depth.mp4")
	```

	### Video - turbo colormap

	```python
	output = pipeline(video_path="input.mp4", colormap="turbo")
	save_video(output.depth_frames, output.fps, "output_depth_turbo.mp4")
	```

	## Inputs

	| Parameter | Type | Default | Description |
	|-----------|------|---------|-------------|
	| `image` | `PIL.Image` | - | Image to estimate depth for |
	| `video_path` | `str` | - | Path to input video. When provided, `image` is ignored |
	| `colormap` | `str` | `"grayscale"` | `"grayscale"` or `"turbo"` (colormapped) |
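
The precedence rule in the table (a `video_path` overrides `image`) can be sketched as a small dispatch helper. This is only an illustration: `resolve_mode` and `VALID_COLORMAPS` are hypothetical names that are not part of the block, and raising an error on a missing input or an unknown colormap is an assumption about reasonable behavior, not documented behavior.

```python
from typing import Any, Optional

# Colormap names documented in the Inputs table.
VALID_COLORMAPS = ("grayscale", "turbo")


def resolve_mode(
    image: Optional[Any] = None,
    video_path: Optional[str] = None,
    colormap: str = "grayscale",
) -> str:
    """Return "video" or "image" following the documented precedence.

    Hypothetical helper: the real block may validate its inputs differently.
    """
    if colormap not in VALID_COLORMAPS:
        raise ValueError(f"colormap must be one of {VALID_COLORMAPS}, got {colormap!r}")
    if video_path is not None:
        # A video path takes precedence; any image passed alongside is ignored.
        return "video"
    if image is not None:
        return "image"
    raise ValueError("provide either `image` or `video_path`")
```

Passing both inputs, as in `resolve_mode(image=img, video_path="input.mp4")`, selects video mode under this sketch.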

	## Outputs

	### Image mode

	| Output | Type | Description |
	|--------|------|-------------|
	| `depth_image` | `PIL.Image` | Normalized depth visualization |
	| `predicted_depth` | `torch.Tensor` | Raw relative depth (H x W) |

	### Video mode

	| Output | Type | Description |
	|--------|------|-------------|
	| `depth_frames` | `List[PIL.Image]` | Per-frame depth visualizations |
	| `fps` | `float` | Source video frame rate |
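
Because `fps` is returned alongside `depth_frames`, each depth frame can be mapped back to a time in the source video. The following is a minimal sketch, assuming the frames come back in order at a constant rate with none dropped (the README does not state this); `frame_timestamps` is a hypothetical helper, not part of the block.

```python
from typing import List


def frame_timestamps(num_frames: int, fps: float) -> List[float]:
    """Return the start time in seconds of each frame at a constant frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return [index / fps for index in range(num_frames)]


# Example: 4 frames at 2 fps start at 0.0, 0.5, 1.0 and 1.5 seconds.
print(frame_timestamps(4, 2.0))
```

With the video outputs above, this would be called as `frame_timestamps(len(output.depth_frames), output.fps)`.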

	## Depth Normalization

	Depth values are min-max normalized and inverted so that bright areas represent nearby surfaces and dark areas represent distant ones.

	- Bright = close, dark = far (grayscale)
	- Warm (red/yellow) = close, cool (blue) = far (turbo)
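
The normalization described above can be sketched with NumPy as follows. This is not the block's actual code: `depth_to_uint8` is a hypothetical helper, and it assumes the input values grow with distance, so that inversion makes near surfaces bright. Whether inversion is needed depends on the sign convention of the raw values; if larger raw values already mean closer, `invert=False` yields the same bright-is-close result.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray, invert: bool = True) -> np.ndarray:
    """Min-max normalize a 2D depth array to 8-bit grayscale.

    With invert=True the smallest input value maps to 255 (brightest).
    """
    depth = depth.astype(np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    if d_max - d_min < 1e-8:
        # Constant input: avoid dividing by zero and return a black image.
        return np.zeros(depth.shape, dtype=np.uint8)
    normalized = (depth - d_min) / (d_max - d_min)  # values in [0, 1]
    if invert:
        normalized = 1.0 - normalized
    return np.round(normalized * 255.0).astype(np.uint8)
```

A tensor such as `output.predicted_depth` would first need converting to a NumPy array (for example with `.float().cpu().numpy()`) before being passed to a helper like this.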

	## Model Variants

	The block defaults to `depth-anything/Depth-Anything-V2-Large-hf`. Other available variants:

	| Variant | Model ID | Params |
	|---------|----------|--------|
	| Small | `depth-anything/Depth-Anything-V2-Small-hf` | 24.8M |
	| Base | `depth-anything/Depth-Anything-V2-Base-hf` | 97.5M |
	| Large (default) | `depth-anything/Depth-Anything-V2-Large-hf` | 335M |
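
When memory is tight, a variant can be chosen by parameter budget. The sketch below only selects a model ID from the table; `DEPTH_ANYTHING_V2_VARIANTS` and `largest_variant_within` are hypothetical names, and this README does not document how a different model ID is passed to the block, so that step is left out.

```python
# Model IDs and parameter counts copied from the table above.
DEPTH_ANYTHING_V2_VARIANTS = {
    "small": ("depth-anything/Depth-Anything-V2-Small-hf", 24.8e6),
    "base": ("depth-anything/Depth-Anything-V2-Base-hf", 97.5e6),
    "large": ("depth-anything/Depth-Anything-V2-Large-hf", 335e6),
}


def largest_variant_within(max_params: float) -> str:
    """Return the model ID of the largest variant with at most max_params parameters."""
    candidates = [
        (params, model_id)
        for model_id, params in DEPTH_ANYTHING_V2_VARIANTS.values()
        if params <= max_params
    ]
    if not candidates:
        raise ValueError("no variant fits the given parameter budget")
    # Tuples compare by parameter count first, so max() picks the largest model.
    return max(candidates)[1]
```

For example, a budget of 100 million parameters selects the Base variant under this sketch.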