Performance-Optimized & Lightweight ONNX Version of DepthPro

This ONNX-based DepthPro model generates high-quality depth maps with minimal overhead. Depth values are encoded such that near points are bright and far points are dark, making the output directly usable for stereo and disparity-based applications without additional inversion or preprocessing. The model is optimized for efficient inference on standard hardware.

Key Features

  • Depth-only ONNX export: Significantly reduced model size while preserving full depth quality
  • Skips field-of-view calibration: Outputs raw predicted depth values without the post-processing step, avoiding normalization artifacts and computational overhead
  • Disparity-ready output: Compatible with stereo/disparity workflows out of the box; no conversion needed
  • FP16 weights: Optimized for GPU acceleration via DirectML for faster inference
  • Batch size 1: Benchmarks show single-image batches deliver optimal throughput; larger batches are slower
  • Opset 21: Uses modern ONNX operators for broader runtime optimization support
  • Aggressive graph optimization: Simplified model graph for reduced computation and faster loading
  • Fast inference: Minimal memory footprint and rapid depth map generation
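To illustrate what "disparity-ready" means in practice, here is a minimal NumPy sketch of naive stereo view synthesis driven directly by the model's output (higher value = closer = larger horizontal shift). `shift_view` and `max_shift` are illustrative names, not part of this model or any library; a production pipeline would also fill the disocclusion holes this forward warp leaves behind.

```python
import numpy as np

def shift_view(rgb, disparity, max_shift=24):
    """Synthesize a second-eye view by shifting each pixel horizontally
    in proportion to disparity (higher value = closer = larger shift).

    rgb:       (H, W, 3) uint8 source image
    disparity: (H, W) float values normalized to [0, 1]
    """
    h, w = disparity.shape
    shifts = np.round(disparity * max_shift).astype(np.int64)
    out = np.zeros_like(rgb)
    cols = np.arange(w)
    for y in range(h):
        # Near pixels move further; clip to stay inside the frame.
        x_new = np.clip(cols - shifts[y], 0, w - 1)
        out[y, x_new] = rgb[y, cols]  # last write wins on collisions
    return out
```

Because the model already outputs higher values for nearer points, the map can feed this kind of warp without the inversion step other depth models require.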

Technical Specifications

Property        Value
Input shape     (1, 3, 1536, 1536), NCHW
Input dtype     float16
Input range     [-1.0, 1.0] (normalized RGB)
Output shape    (1, 1536, 1536)
Output dtype    float16
Output values   Relative depth (higher = closer)
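The input spec above translates into a small preprocessing helper; this is a sketch (`preprocess` is an illustrative name) of the same uint8-to-float16 mapping the Quick Start performs inline.

```python
import numpy as np

def preprocess(rgb_u8):
    """Convert an (H, W, 3) uint8 RGB image into the tensor layout the
    spec table describes: float16, NCHW, values in [-1, 1]."""
    x = rgb_u8.astype(np.float32) / 127.5 - 1.0   # [0, 255] -> [-1, 1]
    x = x.transpose(2, 0, 1)[np.newaxis]          # HWC -> NCHW, batch of 1
    return x.astype(np.float16)
```

For this model, resize the image to 1536x1536 before calling the helper, since the input shape is fixed.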

Requirements

  • VRAM: ~5.2 GB
  • ONNX Runtime: 1.19.0 or higher
  • Python: 3.8 or higher

Quick Start

pip install onnxruntime-directml numpy opencv-python

import cv2
import numpy as np
import onnxruntime as ort

# Load model
session = ort.InferenceSession('depthpro_1536x1536_bs1_fp16_opset21_optimized.onnx', providers=['DmlExecutionProvider', 'CPUExecutionProvider'])
input_name, output_name = session.get_inputs()[0].name, session.get_outputs()[0].name

# Load & preprocess
img = cv2.cvtColor(cv2.imread('examples/sample1/source.jpg'), cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (1536, 1536))
img = img.astype(np.float32) / 127.5 - 1.0                   # scale [0, 255] -> [-1, 1]
img = img.astype(np.float16).transpose(2, 0, 1)[np.newaxis]  # HWC -> NCHW, batch of 1

# Inference
depth = session.run([output_name], {input_name: img})[0].squeeze().astype(np.float32)

# Clip extreme values and normalize
depth = np.clip(np.nan_to_num(depth, nan=0.0), -1e3, 1e3)
depth_norm = (depth - depth.min()) / max(depth.max() - depth.min(), 1e-6)

# Save 8-bit PNG for smaller size
cv2.imwrite('depth_frame_0001.png', (depth_norm * 255).round().astype(np.uint8))

# Save 16-bit TIFF for higher precision (IMWRITE_TIFF_COMPRESSION_DEFLATE requires a recent OpenCV build)
cv2.imwrite('depth_frame_0001.tif', (depth_norm * 65535).round().astype(np.uint16), [cv2.IMWRITE_TIFF_COMPRESSION, cv2.IMWRITE_TIFF_COMPRESSION_DEFLATE])

print('Depth maps saved')

Benchmark: Speed, Size & Depth Map Quality

Benchmarked on an AMD Radeon RX 7900 XTX using ONNX Runtime v1.23.0 with DirectML.

DepthPro-based models WITHOUT Post-Processing

Model                Throughput     Model Size
apple/DepthPro-hf    1.5 img/min    1.8 GB
Owl3D Precision V2   9.6 img/min    1.2 GB
This Model           75.7 img/min   1.2 GB

DepthPro-based models WITH Post-Processing

DepthPro's post-processing step calibrates depth values using field-of-view information and normalizes the output. This can cause severe artifacts:

  • Crushed contrast: Extreme outlier depth values (e.g., 10,000 m instead of the typical ~130 m maximum observed across various scenes) cause normalization to compress useful depth information into a narrow range, mapping most pixels to extreme near values
  • Inconsistent results: These artifacts appear unpredictably, especially with quantized models, but also with full-precision versions
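One common mitigation for the crushed-contrast problem described above is percentile-clipped normalization instead of raw min-max scaling. The sketch below (the function name and percentile defaults are illustrative choices, not part of the model) shows the idea: a handful of extreme outliers can no longer define the normalization range.

```python
import numpy as np

def robust_normalize(depth, lo_pct=1.0, hi_pct=99.0):
    """Normalize to [0, 1] after clipping at percentiles, so a few extreme
    outliers (e.g. a spurious 10,000 m reading in a ~130 m scene) cannot
    compress the useful depth values into a narrow range."""
    lo, hi = np.percentile(depth, [lo_pct, hi_pct])
    clipped = np.clip(depth, lo, hi)
    return (clipped - lo) / max(hi - lo, 1e-6)
```

The percentile bounds trade off outlier rejection against losing genuine near/far extremes, so they may need tuning per scene.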

The models below use post-processing and may exhibit these issues depending on the scene:

Model                                Throughput     Model Size
apple/DepthPro-hf                    1.5 img/min    1.8 GB
DepthPro-ONNX - model_fp16.onnx      69.4 img/min   1.8 GB
DepthPro-ONNX - model_q4f16.onnx     52.9 img/min   0.6 GB
DepthPro-ONNX - model.onnx           44.0 img/min   3.5 GB
DepthPro-ONNX - model_q4.onnx        33.3 img/min   0.7 GB
DepthPro-ONNX - model_quantized.onnx 17.3 img/min   0.9 GB
DepthPro-ONNX - model_uint8.onnx     17.3 img/min   0.9 GB
DepthPro-ONNX - model_int8.onnx      15.9 img/min   0.9 GB
DepthPro-ONNX - model_bnb4.onnx      1.3 img/min    0.6 GB

License / Usage

This ONNX version of DepthPro is licensed under the Apple Machine Learning Research Model License.

  • Use is restricted to non-commercial scientific research and academic development.
  • Redistribution is allowed only with this license included.
  • Do not use Apple's trademarks, logos, or name to promote derivative models.
  • Commercial use, product integration, or service deployment is not allowed.

Model tree for Jens-Duttke/DepthPro-ONNX-HighPerf

Base model: apple/DepthPro (this model is one of its quantized derivatives)