---
license: cc-by-nc-4.0
tags:
- depth-estimation
- coreml
- apple-silicon
- vision
- computer-vision
library_name: coreml
---
|
|
|
|
|
# Depth Anything V2 - CoreML |
|
|
|
|
|
Depth Anything V2 models (Base and Large) converted to CoreML format for optimized inference on Apple Silicon (M-series chips). |
|
|
|
|
|
## Models |
|
|
|
|
|
| Model | Size | Parameters | Performance (M4 Pro, est.) | License |
|-------|------|------------|----------------------------|---------|
| Small F16 | 48 MB | 24.8M | ~30ms (~33 fps) | Apache-2.0 |
| Base F16 | 172 MB | 97.5M | ~60-90ms (~14 fps) | CC-BY-NC-4.0 |
| Large F16 | 590 MB | 335.3M | ~200-300ms (~4 fps) | CC-BY-NC-4.0 |
|
|
|
|
|
All models use Float16 precision and run on Apple's Neural Engine + GPU + CPU. |
|
|
|
|
|
## License |
|
|
|
|
|
Both **Base** and **Large** models are **CC-BY-NC-4.0** (non-commercial only), following the [official Depth Anything V2 licensing](https://github.com/DepthAnything/Depth-Anything-V2#license). |
|
|
|
|
|
**For commercial use**, you must use the Small model (Apache-2.0), which is available directly from [Apple's CoreML model zoo](https://developer.apple.com/machine-learning/models/). |
|
|
|
|
|
## Download |
|
|
|
|
|
**Base model**: |
|
|
```bash
curl -L -o DepthAnythingV2BaseF16.mlpackage.tar.gz \
  "https://huggingface.co/mrgnw/depth-anything-v2-coreml/resolve/main/DepthAnythingV2BaseF16.mlpackage.tar.gz"
tar -xzf DepthAnythingV2BaseF16.mlpackage.tar.gz
```
|
|
|
|
|
**Large model**: |
|
|
```bash
curl -L -o DepthAnythingV2LargeF16.mlpackage.tar.gz \
  "https://huggingface.co/mrgnw/depth-anything-v2-coreml/resolve/main/DepthAnythingV2LargeF16.mlpackage.tar.gz"
tar -xzf DepthAnythingV2LargeF16.mlpackage.tar.gz
```
|
|
|
|
|
**Small model** (from Apple): |
|
|
```bash
curl -L -o DepthAnythingV2SmallF16.mlpackage.zip \
  "https://ml-assets.apple.com/coreml/models/Image/DepthEstimation/DepthAnything/DepthAnythingV2SmallF16.mlpackage.zip"
unzip DepthAnythingV2SmallF16.mlpackage.zip
```
|
|
|
|
|
## Usage |
|
|
|
|
|
### Swift |
|
|
|
|
|
```swift
import CoreML

// .mlpackage files must be compiled to .mlmodelc before loading.
let modelURL = URL(fileURLWithPath: "DepthAnythingV2BaseF16.mlpackage")
let compiledURL = try MLModel.compileModel(at: modelURL)

let config = MLModelConfiguration()
config.computeUnits = .all // Use Neural Engine + GPU + CPU

let model = try MLModel(contentsOf: compiledURL, configuration: config)
// Input: RGB image (1, 3, 518, 518)
// Output: depth map (1, 518, 518)
```
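For image models like this one, the Vision framework handles resizing the input to the expected 518x518 and wrapping the prediction. A minimal sketch (the exact input/output feature names and result types depend on the package; inspect `model.modelDescription` to confirm):

```swift
import CoreML
import Vision

// Sketch: run depth estimation on a CGImage through Vision.
// Assumes the model takes a single image input and produces an
// image-like depth output, which Vision surfaces as a pixel buffer.
func estimateDepth(cgImage: CGImage, model: MLModel) throws -> VNPixelBufferObservation? {
    let vnModel = try VNCoreMLModel(for: model)
    let request = VNCoreMLRequest(model: vnModel)
    request.imageCropAndScaleOption = .scaleFill // model expects 518x518

    let handler = VNImageRequestHandler(cgImage: cgImage)
    try handler.perform([request])
    return request.results?.first as? VNPixelBufferObservation
}
```

Note that the model outputs relative depth, so for visualization you will typically min-max normalize the returned buffer before mapping it to grayscale.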
|
|
|
|
|
## Performance |
|
|
|
|
|
**M4 Pro (estimated):**

- Small: ~25-30ms per frame
- Base: ~60-90ms per frame
- Large: ~200-300ms per frame
|
|
|
|
|
These estimates are roughly **10-20x faster** than ONNX CPU inference on the same hardware, because CoreML can dispatch most of the network to the Apple Neural Engine.
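Actual latency varies by machine and thermal state, so it is worth measuring locally. A simple wall-clock sketch (assumes you already have an `MLFeatureProvider` matching the model's input; the first prediction is excluded because it absorbs one-time setup cost):

```swift
import CoreML
import Foundation

// Sketch: average wall-clock latency over repeated predictions.
func averageLatency(model: MLModel, input: MLFeatureProvider, runs: Int = 20) throws -> Double {
    _ = try model.prediction(from: input) // warm-up: first call is slower
    let start = CFAbsoluteTimeGetCurrent()
    for _ in 0..<runs {
        _ = try model.prediction(from: input)
    }
    return (CFAbsoluteTimeGetCurrent() - start) / Double(runs) // seconds per frame
}
```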
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex
@article{yang2024depth,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv:2406.09414},
  year={2024}
}
```
|
|
|
|
|
## Related |
|
|
|
|
|
- [Original Depth Anything V2](https://github.com/DepthAnything/Depth-Anything-V2)
- [spatial-maker](https://github.com/mrgnw/spatial-maker) - Uses these models for spatial video/photo conversion
|
|
|