---
license: apple-ascl
license_url: ./LICENSE
library_name: onnxruntime
pipeline_tag: depth-estimation
tags:
- onnx
- depth-estimation
- apple
- fp16
- gpu
base_model:
- apple/DepthPro
---

# Performance-Optimized & Lightweight ONNX Version of DepthPro

This ONNX-based DepthPro model generates high-quality depth maps with minimal overhead. Depth values are encoded such that near points are bright and far points are dark, making the output directly usable for stereo and disparity-based applications without additional inversion or preprocessing. The model is optimized for efficient inference on standard hardware.

> [!TIP]
> **See it in action:** [Video Stereo Converter](https://github.com/jens-duttke/Video-Stereo-Converter) uses this model to convert 2D videos into immersive 3D stereoscopic content — with batch processing, resumable workflows, and smart disk management built in.

## Key Features

- **Depth-only ONNX export**: Significantly reduced model size while preserving full depth quality
- **Skips field-of-view calibration**: Outputs raw predicted depth values without the post-processing step, avoiding normalization artifacts and computational overhead
- **Disparity-ready output**: Compatible with stereo/disparity workflows out of the box - no conversion needed
- **FP16 weights**: Optimized for GPU acceleration via DirectML for faster inference
- **Batch size 1**: Benchmarks show single-image batches deliver optimal throughput; larger batches are slower
- **Opset 21**: Uses modern ONNX operators for broader runtime optimization support
- **Aggressive graph optimization**: Simplified model graph for reduced computation and faster loading
- **Fast inference**: Minimal memory footprint and rapid depth map generation

## Technical Specifications

| Property | Value |
|----------|-------|
| Input shape | `(1, 3, 1536, 1536)` NCHW |
| Input dtype | `float16` |
| Input range | `[-1.0, 1.0]` (normalized RGB) |
| Output shape | `(1, 1536, 1536)` |
| Output dtype | `float16` |
| Output range | Relative depth (higher = closer) |

## Requirements

- **VRAM**: ~5.2 GB
- **ONNX Runtime**: 1.19.0 or higher
- **Python**: 3.8 or higher

## Quick Start

```bash
pip install onnxruntime-directml numpy opencv-python
```

```python
import cv2
import numpy as np
import onnxruntime as ort

# Load model
session = ort.InferenceSession(
    'depthpro_1536x1536_bs1_fp16_opset21_optimized.onnx',
    providers=['DmlExecutionProvider', 'CPUExecutionProvider']
)
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

# Load & preprocess: BGR -> RGB, resize, scale to [-1, 1], NCHW, fp16
img = cv2.cvtColor(cv2.imread('examples/sample1/source.jpg'), cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (1536, 1536))
img = np.transpose(((img.astype(np.float32) / 127.5) - 1.0).astype(np.float16), (2, 0, 1))[np.newaxis]

# Inference
depth = session.run([output_name], {input_name: img})[0].squeeze().astype(np.float32)

# Clip extreme values and normalize to [0, 1]
depth = np.clip(np.nan_to_num(depth, nan=0.0), -1e3, 1e3)
depth_norm = (depth - depth.min()) / max(depth.max() - depth.min(), 1e-6)

# Save 8-bit PNG for smaller size
cv2.imwrite('depth_frame_0001.png', (depth_norm * 255).round().astype(np.uint8))

# Save 16-bit TIFF for higher precision
cv2.imwrite(
    'depth_frame_0001.tif',
    (depth_norm * 65535).round().astype(np.uint16),
    [cv2.IMWRITE_TIFF_COMPRESSION, cv2.IMWRITE_TIFF_COMPRESSION_DEFLATE]
)

print('Depth maps saved')
```

## Benchmark: Speed, Size & Depth Map Quality

Benchmarked on an AMD Radeon RX 7900 XTX using ONNX Runtime v1.23.0 with DirectML.

### DepthPro-based models WITHOUT Post-Processing
| Model | Throughput | Model Size |
|-------|------------|------------|
| apple/DepthPro-hf | 1.5 img/min | 1.8 GB |
| Owl3D Precision V2 | 9.6 img/min | 1.2 GB |
| This Model | 75.7 img/min | 1.2 GB |
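
Throughput figures like the ones above can be reproduced with a simple wall-clock loop. The sketch below is an illustration, not the benchmark script actually used: the `run_once` callable is a stand-in for `session.run(...)` from the Quick Start, so the timing helper can be shown (and tested) without the model file present.

```python
import time

def measure_throughput(run_once, warmup=3, iterations=10):
    """Return images per minute for a single-image inference callable.

    A few warm-up passes run first so one-time costs (graph
    compilation, allocator growth) do not skew the measurement.
    """
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    elapsed = time.perf_counter() - start
    return iterations * 60.0 / elapsed

# With the real model this would be:
#   run_once = lambda: session.run([output_name], {input_name: img})
# Here a short sleep stands in for inference so the sketch is self-contained.
ipm = measure_throughput(lambda: time.sleep(0.01))
print(f'{ipm:.1f} img/min')
```

Because DirectML compiles the graph on first use, the warm-up passes matter: without them the first iteration can dominate the measurement.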
### DepthPro-based models WITH Post-Processing

DepthPro's post-processing step calibrates depth values using field-of-view information and normalizes the output. This can cause severe artifacts:

- **Crushed contrast**: Extreme outlier depth values (e.g., 10,000 m instead of the typical ~130 m maximum observed across various scenes) cause normalization to compress useful depth information into a narrow range, mapping most pixels to extreme near values
- **Inconsistent results**: These artifacts appear unpredictably, especially with quantized models, but also with full-precision versions

The models below use post-processing and may exhibit these issues depending on the scene:
| Model | Throughput | Model Size |
|-------|------------|------------|
| apple/DepthPro-hf | 1.5 img/min | 1.8 GB |
| DepthPro-ONNX - model_fp16.onnx | 69.4 img/min | 1.8 GB |
| DepthPro-ONNX - model_q4f16.onnx | 52.9 img/min | 0.6 GB |
| DepthPro-ONNX - model.onnx | 44.0 img/min | 3.5 GB |
| DepthPro-ONNX - model_q4.onnx | 33.3 img/min | 0.7 GB |
| DepthPro-ONNX - model_quantized.onnx | 17.3 img/min | 0.9 GB |
| DepthPro-ONNX - model_uint8.onnx | 17.3 img/min | 0.9 GB |
| DepthPro-ONNX - model_int8.onnx | 15.9 img/min | 0.9 GB |
| DepthPro-ONNX - model_bnb4.onnx | 1.3 img/min | 0.6 GB |
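
If you do work with a post-processed model, the crushed-contrast failure mode described above can be mitigated on the consumer side with robust normalization: instead of scaling by the global min/max (which a single 10,000 m outlier dominates), clip to low/high percentiles before rescaling. This is a generic NumPy sketch, not part of any of the models listed; the percentile values are illustrative defaults.

```python
import numpy as np

def robust_normalize(depth, lo_pct=1.0, hi_pct=99.0):
    """Normalize a depth map to [0, 1], ignoring extreme outliers.

    Values below the lo_pct percentile or above the hi_pct percentile
    are clipped, so a handful of runaway depth values cannot compress
    the rest of the scene into a narrow range.
    """
    depth = np.nan_to_num(np.asarray(depth, dtype=np.float32), nan=0.0)
    lo, hi = np.percentile(depth, [lo_pct, hi_pct])
    depth = np.clip(depth, lo, hi)
    return (depth - lo) / max(hi - lo, 1e-6)

# Synthetic scene: plausible depths plus a few runaway outliers.
rng = np.random.default_rng(0)
scene = rng.uniform(1.0, 130.0, size=(64, 64)).astype(np.float32)
scene[0, :3] = 10_000.0

# Naive min/max scaling crushes nearly everything toward zero;
# percentile-based scaling preserves the scene's contrast.
naive = (scene - scene.min()) / (scene.max() - scene.min())
robust = robust_normalize(scene)
print(f'naive mean: {naive.mean():.3f}, robust mean: {robust.mean():.3f}')
```

The same idea applies per-frame in video pipelines, though temporally smoothed percentiles avoid flicker between frames.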
---

## License / Usage

This ONNX version of DepthPro is licensed under the Apple Machine Learning Research Model License.

- Use is restricted to non-commercial scientific research and academic development.
- Redistribution is allowed only with this license included.
- Do not use Apple's trademarks, logos, or name to promote derivative models.
- Commercial use, product integration, or service deployment is **not allowed**.