| | ---
|
| | license: apple-amlr
|
| | license_url: ./LICENSE
|
| | library_name: onnxruntime
|
| | pipeline_tag: depth-estimation
|
| | tags:
|
| | - onnx
|
| | - depth-estimation
|
| | - apple
|
| | - fp16
|
| | - gpu
|
| | base_model:
|
| | - apple/DepthPro
|
| | ---
|
| |
|
| | # Performance-Optimized & Lightweight ONNX Version of DepthPro
|
| |
|
| | This ONNX export of DepthPro generates high-quality depth maps with minimal overhead. Output values increase with proximity (near points are bright, far points are dark), so the result can be used directly in stereo and disparity-based applications without inversion or other preprocessing. The model uses FP16 weights and is tuned for GPU inference via DirectML.
|
| |
|
| | > [!TIP]
|
| | > **See it in action:** [Video Stereo Converter](https://github.com/jens-duttke/Video-Stereo-Converter) uses this model to convert 2D videos into immersive 3D stereoscopic content — with batch processing, resumable workflows, and smart disk management built in.
|
| |
|
| | ## Key Features
|
| |
|
| | - **Depth-only ONNX export**: Significantly reduced model size (1.2 GB, versus 1.8 GB for the FP16 export with post-processing listed in the benchmark) while preserving full depth quality
|
| | - **Skips field-of-view calibration**: Outputs raw predicted depth values without the post-processing step, avoiding normalization artifacts and computational overhead
|
| | - **Disparity-ready output**: Compatible with stereo/disparity workflows out of the box; no conversion needed
|
| | - **FP16 weights**: Optimized for GPU acceleration via DirectML for faster inference
|
| | - **Batch size 1**: Benchmarks show single-image batches deliver optimal throughput; larger batches are slower
|
| | - **Opset 21**: Uses modern ONNX operators for broader runtime optimization support
|
| | - **Aggressive graph optimization**: Simplified model graph for reduced computation and faster loading
|
| | - **Fast inference**: 75.7 img/min on an AMD Radeon RX 7900 XTX with about 5.2 GB of VRAM (see the benchmark section)
|
| |
|
| | ## Technical Specifications
|
| |
|
| | | Property | Value |
|
| | |----------|-------|
|
| | | Input shape | `(1, 3, 1536, 1536)` NCHW |
|
| | | Input dtype | `float16` |
|
| | | Input range | `[-1.0, 1.0]` (normalized RGB) |
|
| | | Output shape | `(1, 1536, 1536)` |
|
| | | Output dtype | `float16` |
|
| | | Output values | Relative, unnormalized depth (higher = closer) |
|
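As a quick sanity check on the shapes above, the size of the input and output buffers follows directly from the table. The sketch below only does that arithmetic. The helper name `tensor_bytes` is made up for illustration, and 2 bytes per element is the standard size of a `float16` value.

```python
from math import prod

FP16_BYTES = 2  # size of one float16 element


def tensor_bytes(shape, itemsize=FP16_BYTES):
    """Bytes needed for a dense tensor of the given shape (hypothetical helper)."""
    return prod(shape) * itemsize


input_bytes = tensor_bytes((1, 3, 1536, 1536))  # input shape from the table
output_bytes = tensor_bytes((1, 1536, 1536))    # output shape from the table

print(f"input:  {input_bytes / 2**20:.1f} MiB")   # 13.5 MiB
print(f"output: {output_bytes / 2**20:.1f} MiB")  # 4.5 MiB
```

A single frame therefore moves roughly 18 MiB between host and device, which is small next to the 1.2 GB model file.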
| |
|
| | ## Requirements
|
| |
|
| | - **VRAM**: ~5.2 GB
|
| | - **ONNX Runtime**: 1.19.0 or higher
|
| | - **Python**: 3.8 or higher
|
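Before loading the model, it can help to confirm that the installed runtime matches the list above and that DirectML is actually available. The following is a minimal sketch under a few assumptions: the helper names (`version_tuple`, `pick_providers`, `providers_for_this_machine`) are hypothetical, and the version parser assumes a plain numeric release string such as `1.23.0`. The only ONNX Runtime call used is `ort.get_available_providers()`.

```python
import sys

MIN_ORT = (1, 19, 0)     # minimum ONNX Runtime version from the requirements
MIN_PYTHON = (3, 8)      # minimum Python version from the requirements
PREFERRED_PROVIDERS = ("DmlExecutionProvider", "CPUExecutionProvider")


def version_tuple(version):
    """Turn '1.23.0' (optionally with a '+local' suffix) into (1, 23, 0)."""
    core = version.split("+")[0]
    return tuple(int(part) for part in core.split(".")[:3])


def pick_providers(available, preferred=PREFERRED_PROVIDERS):
    """Keep the preferred providers that are available, in preference order."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError(f"None of {preferred} found in {list(available)}")
    return chosen


def providers_for_this_machine():
    """Check versions and return the providers to pass to InferenceSession."""
    import onnxruntime as ort  # imported lazily so the helpers above work without it

    if sys.version_info < MIN_PYTHON:
        raise RuntimeError("Python 3.8 or higher is required")
    if version_tuple(ort.__version__) < MIN_ORT:
        raise RuntimeError(f"ONNX Runtime {ort.__version__} is older than 1.19.0")
    return pick_providers(ort.get_available_providers())
```

If `DmlExecutionProvider` is missing from the result, inference falls back to the CPU and will be far slower than the GPU figures reported here.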
| |
|
| | ## Quick Start
|
| |
|
| | ```bash
|
| | pip install onnxruntime-directml numpy opencv-python
|
| | ```
|
| |
|
| | ```python
|
| | import cv2
|
| | import numpy as np
|
| | import onnxruntime as ort
|
| |
|
| | # Load model
|
| | session = ort.InferenceSession('depthpro_1536x1536_bs1_fp16_opset21_optimized.onnx', providers=['DmlExecutionProvider', 'CPUExecutionProvider'])
|
| | input_name, output_name = session.get_inputs()[0].name, session.get_outputs()[0].name
|
| |
|
| | # Load & preprocess
|
| | img = cv2.cvtColor(cv2.imread('examples/sample1/source.jpg'), cv2.COLOR_BGR2RGB)
|
| | img = cv2.resize(img, (1536, 1536))
|
| | img = np.transpose(((img.astype(np.float32)/127.5)-1.0).astype(np.float16), (2,0,1))[np.newaxis]
|
| |
|
| | # Inference
|
| | depth = session.run([output_name], {input_name: img})[0].squeeze().astype(np.float32)
|
| |
|
| | # Clip extreme values and normalize
|
| | depth = np.clip(np.nan_to_num(depth, nan=0.0), -1e3, 1e3)
|
| | depth_norm = (depth - depth.min()) / max(depth.max() - depth.min(), 1e-6)
|
| |
|
| | # Save 8-bit PNG for smaller size
|
| | cv2.imwrite('depth_frame_0001.png', (depth_norm * 255).round().astype(np.uint8))
|
| |
|
| | # Save 16-bit TIFF for higher precision
|
| | cv2.imwrite('depth_frame_0001.tif', (depth_norm * 65535).round().astype(np.uint16), [cv2.IMWRITE_TIFF_COMPRESSION, cv2.IMWRITE_TIFF_COMPRESSION_DEFLATE])
|
| |
|
| | print('Depth maps saved')
|
| | ```
|
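The fixed clip to `[-1e3, 1e3]` in the snippet above is one way to tame outliers before scaling. An alternative, shown here as a sketch and not as part of the original pipeline, is to clip to percentiles of the finite values. The function name `normalize_robust` and the 1st/99th percentile defaults are assumptions chosen for illustration.

```python
import numpy as np


def normalize_robust(depth, low_pct=1.0, high_pct=99.0):
    """Scale a depth map to [0, 1] using percentile bounds instead of min/max.

    Hypothetical helper: NaN and infinite values are mapped to the nearest
    bound, and everything outside the percentile range is clipped.
    """
    depth = np.asarray(depth, dtype=np.float32)
    finite = depth[np.isfinite(depth)]
    if finite.size == 0:
        return np.zeros_like(depth)

    # Plain Python floats keep the arithmetic below in float32.
    lo, hi = (float(v) for v in np.percentile(finite, [low_pct, high_pct]))

    cleaned = np.nan_to_num(depth, nan=lo, posinf=hi, neginf=lo)
    clipped = np.clip(cleaned, lo, hi)
    return (clipped - lo) / max(hi - lo, 1e-6)
```

The result can be passed to the same `cv2.imwrite` calls in place of `depth_norm`. The trade-off is that the nearest and farthest 1% of pixels saturate, so the percentiles may need tuning per scene.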
| |
|
| | ## Benchmark: Speed, Size & Depth Map Quality
|
| |
|
| | Benchmarked on an AMD Radeon RX 7900 XTX using ONNX Runtime v1.23.0 with DirectML.
|
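The exact measurement procedure behind these figures is not described here. As a rough sketch of how an images-per-minute number could be obtained, the helper below times repeated calls after a few warm-up runs. The names `measure_throughput` and `seconds_per_image`, the warm-up count, and the run count are all assumptions.

```python
import time


def measure_throughput(run_once, n_runs=20, warmup=3):
    """Return images per minute for a callable that processes one image."""
    for _ in range(warmup):
        run_once()  # warm-up runs are excluded from the timing

    start = time.perf_counter()
    for _ in range(n_runs):
        run_once()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero

    return 60.0 * n_runs / elapsed


def seconds_per_image(images_per_minute):
    """Convert a throughput figure into average latency per image."""
    return 60.0 / images_per_minute


# Latency implied by the 75.7 img/min reported for this model.
print(f"{seconds_per_image(75.7):.2f} s per image")
```

With the Quick Start session, `run_once` could be `lambda: session.run([output_name], {input_name: img})`, which times inference only and leaves out image loading and saving.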
| |
|
| | ### DepthPro-based models WITHOUT Post-Processing
|
| |
|
| | <table width="100%">
|
| | <tr>
|
| | <th>Model</th>
|
| | <th width="120">Throughput</th>
|
| | <th width="100">Model Size</th>
|
| | <th width="128"><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample1/source.jpg"><img src="examples/sample1/source.jpg" width="128" /></a></th>
|
| | <th width="128"><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample2/source.jpg"><img src="examples/sample2/source.jpg" width="128" /></a></th>
|
| | <th width="128"><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample3/source.jpg"><img src="examples/sample3/source.jpg" width="128" /></a></th>
|
| | <th width="128"><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample4/source.jpg"><img src="examples/sample4/source.jpg" width="128" /></a></th>
|
| | </tr>
|
| | <tr>
|
| | <td><a href="https://huggingface.co/apple/DepthPro-hf">apple/DepthPro-hf</a></td>
|
| | <td>1.5 img/min</td>
|
| | <td>1.8 GB</td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample1/DepthPro-hf_nopost_plasma.png"><img src="examples/sample1/DepthPro-hf_nopost_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample2/DepthPro-hf_nopost_plasma.png"><img src="examples/sample2/DepthPro-hf_nopost_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample3/DepthPro-hf_nopost_plasma.png"><img src="examples/sample3/DepthPro-hf_nopost_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample4/DepthPro-hf_nopost_plasma.png"><img src="examples/sample4/DepthPro-hf_nopost_plasma.png" width="128" /></a></td>
|
| | </tr>
|
| | <tr>
|
| | <td>Owl3D Precision V2</td>
|
| | <td>9.6 img/min</td>
|
| | <td>1.2 GB</td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample1/owl3d_plasma.png"><img src="examples/sample1/owl3d_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample2/owl3d_plasma.png"><img src="examples/sample2/owl3d_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample3/owl3d_plasma.png"><img src="examples/sample3/owl3d_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample4/owl3d_plasma.png"><img src="examples/sample4/owl3d_plasma.png" width="128" /></a></td>
|
| | </tr>
|
| | <tr>
|
| | <td><strong>This Model</strong></td>
|
| | <td><strong>75.7 img/min</strong></td>
|
| | <td>1.2 GB</td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample1/depthpro_1536x1536_bs1_fp16_opset21_optimized_plasma.png"><img src="examples/sample1/depthpro_1536x1536_bs1_fp16_opset21_optimized_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample2/depthpro_1536x1536_bs1_fp16_opset21_optimized_plasma.png"><img src="examples/sample2/depthpro_1536x1536_bs1_fp16_opset21_optimized_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample3/depthpro_1536x1536_bs1_fp16_opset21_optimized_plasma.png"><img src="examples/sample3/depthpro_1536x1536_bs1_fp16_opset21_optimized_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample4/depthpro_1536x1536_bs1_fp16_opset21_optimized_plasma.png"><img src="examples/sample4/depthpro_1536x1536_bs1_fp16_opset21_optimized_plasma.png" width="128" /></a></td>
|
| | </tr>
|
| | </table>
|
| |
|
| | ### DepthPro-based models WITH Post-Processing
|
| |
|
| | DepthPro's post-processing step calibrates depth values using field-of-view information and normalizes the output. This can cause severe artifacts:
|
| |
|
| | - **Crushed contrast**: Extreme outlier depth values (e.g., 10,000 m, where the maximum observed across various scenes is typically around 130 m) stretch the normalization range, so the useful depth information is compressed into a narrow band and most pixels are mapped to extreme near values
|
| | - **Inconsistent results**: These artifacts appear unpredictably, especially with quantized models, but also with full-precision versions
|
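The crushed-contrast effect can be reproduced with plain arithmetic. The sketch below uses made-up depth samples (100 values spread evenly over 1 to 130 m) and a single 10,000 m outlier, matching the magnitudes mentioned above. It is an illustration of min-max normalization in general, not of DepthPro's actual post-processing code.

```python
def min_max_normalize(values):
    """Scale values linearly so that min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    span = max(hi - lo, 1e-6)
    return [(v - lo) / span for v in values]


# Hypothetical metric depths in metres: 100 samples spread evenly over 1..130 m.
scene = [1.0 + 129.0 * i / 99 for i in range(100)]
# The same scene with a single outlier at 10,000 m appended.
scene_with_outlier = scene + [10_000.0]

clean = min_max_normalize(scene)
crushed = min_max_normalize(scene_with_outlier)[:-1]  # drop the outlier itself

span_clean = max(clean) - min(clean)        # the full 0..1 range is used
span_crushed = max(crushed) - min(crushed)  # about 0.013 of the range

print(f"8-bit levels without outlier: {round(span_clean * 255)}")
print(f"8-bit levels with outlier:    {round(span_crushed * 255)}")
```

With the outlier present, the real scene content is left with about 3 of 255 gray levels in an 8-bit depth map, which is why the affected outputs look almost uniformly near.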
| |
|
| | The models below use post-processing and may exhibit these issues depending on the scene:
|
| |
|
| | <table width="100%">
|
| | <tr>
|
| | <th>Model</th>
|
| | <th width="120">Throughput</th>
|
| | <th width="100">Model Size</th>
|
| | <th width="128"><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample1/source.jpg"><img src="examples/sample1/source.jpg" width="128" /></a></th>
|
| | <th width="128"><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample2/source.jpg"><img src="examples/sample2/source.jpg" width="128" /></a></th>
|
| | <th width="128"><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample3/source.jpg"><img src="examples/sample3/source.jpg" width="128" /></a></th>
|
| | <th width="128"><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample4/source.jpg"><img src="examples/sample4/source.jpg" width="128" /></a></th>
|
| | </tr>
|
| | <tr>
|
| | <td><a href="https://huggingface.co/apple/DepthPro-hf">apple/DepthPro-hf</a></td>
|
| | <td>1.5 img/min</td>
|
| | <td>1.8 GB</td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample1/DepthPro-hf_plasma.png"><img src="examples/sample1/DepthPro-hf_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample2/DepthPro-hf_plasma.png"><img src="examples/sample2/DepthPro-hf_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample3/DepthPro-hf_plasma.png"><img src="examples/sample3/DepthPro-hf_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample4/DepthPro-hf_plasma.png"><img src="examples/sample4/DepthPro-hf_plasma.png" width="128" /></a></td>
|
| | </tr>
|
| | <tr>
|
| | <td><a href="https://huggingface.co/onnx-community/DepthPro-ONNX">DepthPro-ONNX - model_fp16.onnx</a></td>
|
| | <td>69.4 img/min</td>
|
| | <td>1.8 GB</td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample1/model_fp16_plasma.png"><img src="examples/sample1/model_fp16_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample2/model_fp16_plasma.png"><img src="examples/sample2/model_fp16_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample3/model_fp16_plasma.png"><img src="examples/sample3/model_fp16_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample4/model_fp16_plasma.png"><img src="examples/sample4/model_fp16_plasma.png" width="128" /></a></td>
|
| | </tr>
|
| | <tr>
|
| | <td><a href="https://huggingface.co/onnx-community/DepthPro-ONNX">DepthPro-ONNX - model_q4f16.onnx</a></td>
|
| | <td>52.9 img/min</td>
|
| | <td><strong>0.6 GB</strong></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample1/model_q4f16_plasma.png"><img src="examples/sample1/model_q4f16_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample2/model_q4f16_plasma.png"><img src="examples/sample2/model_q4f16_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample3/model_q4f16_plasma.png"><img src="examples/sample3/model_q4f16_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample4/model_q4f16_plasma.png"><img src="examples/sample4/model_q4f16_plasma.png" width="128" /></a></td>
|
| | </tr>
|
| | <tr>
|
| | <td><a href="https://huggingface.co/onnx-community/DepthPro-ONNX">DepthPro-ONNX - model.onnx</a></td>
|
| | <td>44.0 img/min</td>
|
| | <td>3.5 GB</td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample1/model_plasma.png"><img src="examples/sample1/model_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample2/model_plasma.png"><img src="examples/sample2/model_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample3/model_plasma.png"><img src="examples/sample3/model_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample4/model_plasma.png"><img src="examples/sample4/model_plasma.png" width="128" /></a></td>
|
| | </tr>
|
| | <tr>
|
| | <td><a href="https://huggingface.co/onnx-community/DepthPro-ONNX">DepthPro-ONNX - model_q4.onnx</a></td>
|
| | <td>33.3 img/min</td>
|
| | <td>0.7 GB</td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample1/model_q4_plasma.png"><img src="examples/sample1/model_q4_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample2/model_q4_plasma.png"><img src="examples/sample2/model_q4_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample3/model_q4_plasma.png"><img src="examples/sample3/model_q4_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample4/model_q4_plasma.png"><img src="examples/sample4/model_q4_plasma.png" width="128" /></a></td>
|
| | </tr>
|
| | <tr>
|
| | <td><a href="https://huggingface.co/onnx-community/DepthPro-ONNX">DepthPro-ONNX - model_quantized.onnx</a></td>
|
| | <td>17.3 img/min</td>
|
| | <td>0.9 GB</td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample1/model_quantized_plasma.png"><img src="examples/sample1/model_quantized_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample2/model_quantized_plasma.png"><img src="examples/sample2/model_quantized_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample3/model_quantized_plasma.png"><img src="examples/sample3/model_quantized_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample4/model_quantized_plasma.png"><img src="examples/sample4/model_quantized_plasma.png" width="128" /></a></td>
|
| | </tr>
|
| | <tr>
|
| | <td><a href="https://huggingface.co/onnx-community/DepthPro-ONNX">DepthPro-ONNX - model_uint8.onnx</a></td>
|
| | <td>17.3 img/min</td>
|
| | <td>0.9 GB</td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample1/model_uint8_plasma.png"><img src="examples/sample1/model_uint8_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample2/model_uint8_plasma.png"><img src="examples/sample2/model_uint8_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample3/model_uint8_plasma.png"><img src="examples/sample3/model_uint8_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample4/model_uint8_plasma.png"><img src="examples/sample4/model_uint8_plasma.png" width="128" /></a></td>
|
| | </tr>
|
| | <tr>
|
| | <td><a href="https://huggingface.co/onnx-community/DepthPro-ONNX">DepthPro-ONNX - model_int8.onnx</a></td>
|
| | <td>15.9 img/min</td>
|
| | <td>0.9 GB</td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample1/model_int8_plasma.png"><img src="examples/sample1/model_int8_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample2/model_int8_plasma.png"><img src="examples/sample2/model_int8_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample3/model_int8_plasma.png"><img src="examples/sample3/model_int8_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample4/model_int8_plasma.png"><img src="examples/sample4/model_int8_plasma.png" width="128" /></a></td>
|
| | </tr>
|
| | <tr>
|
| | <td><a href="https://huggingface.co/onnx-community/DepthPro-ONNX">DepthPro-ONNX - model_bnb4.onnx</a></td>
|
| | <td>1.3 img/min</td>
|
| | <td>0.6 GB</td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample1/model_plasma.png"><img src="examples/sample1/model_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample2/model_plasma.png"><img src="examples/sample2/model_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample3/model_plasma.png"><img src="examples/sample3/model_plasma.png" width="128" /></a></td>
|
| | <td><a href="https://huggingface.co/Jens-Duttke/DepthPro-ONNX-HighPerf/resolve/main/examples/sample4/model_plasma.png"><img src="examples/sample4/model_plasma.png" width="128" /></a></td>
|
| | </tr>
|
| | </table>
|
| |
|
| | ---
|
| |
|
| | ## License / Usage
|
| |
|
| | This ONNX version of DepthPro is licensed under the Apple Machine Learning Research Model License (see [LICENSE](./LICENSE)).
|
| |
|
| | - Use is restricted to non-commercial scientific research and academic development.
|
| | - Redistribution is allowed only with this license included.
|
| | - Do not use Apple's trademarks, logos, or name to promote derivative models.
|
| | - Commercial use, product integration, or service deployment is **not allowed**.
|
| |
|