---
license: apple-ascl
license_url: ./LICENSE
library_name: onnxruntime
pipeline_tag: depth-estimation
tags:
- onnx
- depth-estimation
- apple
- fp16
- gpu
base_model:
- apple/DepthPro
---
# Performance-Optimized & Lightweight ONNX Version of DepthPro
This ONNX-based DepthPro model generates high-quality depth maps with minimal overhead. Depth values are encoded such that near points are bright and far points are dark, making the output directly usable for stereo and disparity-based applications without additional inversion or preprocessing. The model is optimized for efficient inference on standard hardware.
> [!TIP]
> **See it in action:** [Video Stereo Converter](https://github.com/jens-duttke/Video-Stereo-Converter) uses this model to convert 2D videos into immersive 3D stereoscopic content — with batch processing, resumable workflows, and smart disk management built in.
## Key Features
- **Depth-only ONNX export**: Significantly reduced model size while preserving full depth quality
- **Skips field-of-view calibration**: Outputs raw predicted depth values without the post-processing step, avoiding normalization artifacts and computational overhead
- **Disparity-ready output**: Compatible with stereo/disparity workflows out of the box - no conversion needed
- **FP16 weights**: Optimized for GPU acceleration via DirectML for faster inference
- **Batch size 1**: Benchmarks show single-image batches deliver optimal throughput; larger batches are slower
- **Opset 21**: Uses modern ONNX operators for broader runtime optimization support
- **Aggressive graph optimization**: Simplified model graph for reduced computation and faster loading
- **Fast inference**: Minimal memory footprint and rapid depth map generation
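Because the output is already disparity-like (higher = closer), turning it into per-pixel stereo shifts is a single scaling step. A minimal sketch; `max_shift_px` is a hypothetical tuning knob, not a model parameter:

```python
import numpy as np

def depth_to_disparity(depth: np.ndarray, max_shift_px: float = 30.0) -> np.ndarray:
    """Scale the model's relative depth (higher = closer) to a per-pixel
    horizontal shift for stereo synthesis. max_shift_px is an illustrative
    tuning parameter, not part of the model."""
    d = depth.astype(np.float32)
    d = (d - d.min()) / max(float(d.max() - d.min()), 1e-6)  # normalize to [0, 1]
    return d * max_shift_px  # nearest pixels get the largest shift

depth = np.array([[0.1, 0.5], [0.9, 0.3]], dtype=np.float16)
disp = depth_to_disparity(depth)
```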
## Technical Specifications
| Property | Value |
|----------|-------|
| Input shape | `(1, 3, 1536, 1536)` NCHW |
| Input dtype | `float16` |
| Input range | `[-1.0, 1.0]` (normalized RGB) |
| Output shape | `(1, 1536, 1536)` |
| Output dtype | `float16` |
| Output range | Relative depth (higher = closer) |
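The input contract above can be exercised without the model file. A small preprocessing sketch in plain NumPy, assuming the image has already been resized to 1536x1536:

```python
import numpy as np

def preprocess(rgb_uint8: np.ndarray) -> np.ndarray:
    """Convert an already-resized 1536x1536 RGB uint8 image into the
    model's expected (1, 3, 1536, 1536) float16 NCHW tensor in [-1, 1]."""
    x = rgb_uint8.astype(np.float32) / 127.5 - 1.0  # [0, 255] -> [-1, 1]
    x = np.transpose(x, (2, 0, 1))[np.newaxis]      # HWC -> NCHW, add batch dim
    return x.astype(np.float16)

tensor = preprocess(np.random.randint(0, 256, (1536, 1536, 3), dtype=np.uint8))
print(tensor.shape, tensor.dtype)  # (1, 3, 1536, 1536) float16
```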
## Requirements
- **VRAM**: ~5.2 GB
- **ONNX Runtime**: 1.19.0 or higher
- **Python**: 3.8 or higher
## Quick Start
```bash
pip install onnxruntime-directml numpy opencv-python
```
```python
import cv2
import numpy as np
import onnxruntime as ort
# Load model
session = ort.InferenceSession(
    'depthpro_1536x1536_bs1_fp16_opset21_optimized.onnx',
    providers=['DmlExecutionProvider', 'CPUExecutionProvider']
)
input_name, output_name = session.get_inputs()[0].name, session.get_outputs()[0].name
# Load & preprocess
img = cv2.cvtColor(cv2.imread('examples/sample1/source.jpg'), cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (1536, 1536))
img = img.astype(np.float32) / 127.5 - 1.0                          # [0, 255] -> [-1, 1]
img = np.transpose(img.astype(np.float16), (2, 0, 1))[np.newaxis]   # HWC -> NCHW, add batch dim
# Inference
depth = session.run([output_name], {input_name: img})[0].squeeze().astype(np.float32)
# Clip extreme values and normalize
depth = np.clip(np.nan_to_num(depth, nan=0.0), -1e3, 1e3)
depth_norm = (depth - depth.min()) / max(depth.max() - depth.min(), 1e-6)
# Save 8-bit PNG for smaller size
cv2.imwrite('depth_frame_0001.png', (depth_norm * 255).round().astype(np.uint8))
# Save 16-bit TIFF for higher precision
cv2.imwrite('depth_frame_0001.tif', (depth_norm * 65535).round().astype(np.uint16), [cv2.IMWRITE_TIFF_COMPRESSION, cv2.IMWRITE_TIFF_COMPRESSION_DEFLATE])
print('Depth maps saved')
```
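To illustrate the stereo use case, here is a toy view-synthesis step that shifts pixels horizontally by their disparity. This is a nearest-pixel gather only; real converters such as the linked project also handle occlusion and hole filling:

```python
import numpy as np

def shift_view(rgb: np.ndarray, disparity: np.ndarray) -> np.ndarray:
    """Synthesize one eye's view by sampling each output pixel from a
    horizontally shifted source position (toy gather, no hole filling)."""
    h, w = disparity.shape
    xs = np.arange(w)[None, :] - np.rint(disparity).astype(int)  # source column per pixel
    xs = np.clip(xs, 0, w - 1)                                   # clamp at image borders
    return rgb[np.arange(h)[:, None], xs]

rgb = np.arange(12, dtype=np.uint8).reshape(1, 4, 3)  # 1x4 image, 3 channels
right = shift_view(rgb, np.ones((1, 4)))              # shift everything by 1 px
```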
## Benchmark: Speed, Size & Depth Map Quality
Benchmarked on an AMD Radeon RX 7900 XTX using ONNX Runtime v1.23.0 with DirectML.
### DepthPro-based models WITHOUT Post-Processing
| Model | Throughput | Model Size |
|-------|-----------:|-----------:|
| apple/DepthPro-hf | 1.5 img/min | 1.8 GB |
| Owl3D Precision V2 | 9.6 img/min | 1.2 GB |
| This Model | 75.7 img/min | 1.2 GB |
### DepthPro-based models WITH Post-Processing
DepthPro's post-processing step calibrates depth values using field-of-view information and normalizes the output. This can cause severe artifacts:
- **Crushed contrast**: Extreme outlier depth values (e.g., 10,000 m instead of the typical ~130 m maximum observed across various scenes) cause normalization to compress useful depth information into a narrow range, mapping most pixels to extreme near values
- **Inconsistent results**: These artifacts appear unpredictably, especially with quantized models, but also with full-precision versions
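When normalizing raw depth yourself (as in the Quick Start), percentile clipping guards against this crushed-contrast failure mode. A sketch; the percentile cutoffs are illustrative, not tuned:

```python
import numpy as np

def robust_normalize(depth: np.ndarray, lo_pct: float = 1.0, hi_pct: float = 99.0) -> np.ndarray:
    """Normalize depth to [0, 1] using percentile clipping so a handful of
    outlier values cannot crush the contrast of the whole map."""
    d = depth.astype(np.float32)
    lo, hi = np.percentile(d, [lo_pct, hi_pct])  # ignore extreme tails
    d = np.clip(d, lo, hi)
    return (d - lo) / max(float(hi - lo), 1e-6)

# One 10,000 m outlier among ordinary values no longer flattens the map
depth = np.concatenate([np.linspace(0.0, 1.0, 999), [1e4]]).astype(np.float32)
out = robust_normalize(depth)
```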
Models that retain this post-processing step may exhibit these issues depending on the scene.
---
## License / Usage
This ONNX version of DepthPro is licensed under the Apple Machine Learning Research Model License.
- Use is restricted to non-commercial scientific research and academic development.
- Redistribution is allowed only with this license included.
- Do not use Apple's trademarks, logos, or name to promote derivative models.
- Commercial use, product integration, or service deployment is **not allowed**.