---
license: cc-by-nc-4.0
tags:
- depth-estimation
- coreml
- apple-silicon
- vision
- computer-vision
library_name: coreml
---

# Depth Anything V2 - CoreML

Depth Anything V2 models (Base and Large) converted to CoreML format for optimized inference on Apple Silicon (M-series chips).

## Models

| Model | Size | Parameters | Performance (M4 Pro, est.) | License |
|-------|------|------------|----------------------------|---------|
| Small F16 | 48 MB | 24.8M | ~30 ms (~33 fps) | Apache-2.0 |
| Base F16 | 172 MB | 97.5M | ~60-90 ms (~14 fps) | CC-BY-NC-4.0 |
| Large F16 | 590 MB | 335.3M | ~200-300 ms (~4 fps) | CC-BY-NC-4.0 |

All models use Float16 precision and run on Apple's Neural Engine, GPU, and CPU.

## License

Both the **Base** and **Large** models are **CC-BY-NC-4.0** (non-commercial use only), following the [official Depth Anything V2 licensing](https://github.com/DepthAnything/Depth-Anything-V2#license).

**For commercial use**, use the Small model (Apache-2.0), which is available directly from [Apple's CoreML model zoo](https://developer.apple.com/machine-learning/models/).
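The frame rates in the table follow directly from the per-frame latency (fps ≈ 1000 / latency in ms). A minimal sketch of that arithmetic, using the midpoints of the latency ranges above:

```python
def latency_to_fps(latency_ms: float) -> float:
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms

print(round(latency_to_fps(30), 1))   # Small: ~33 fps
print(round(latency_to_fps(75), 1))   # Base (60-90 ms midpoint): ~13 fps
print(round(latency_to_fps(250), 1))  # Large (200-300 ms midpoint): ~4 fps
```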
## Download

**Base model**:

```bash
curl -L -o DepthAnythingV2BaseF16.mlpackage.tar.gz \
  "https://huggingface.co/mrgnw/depth-anything-v2-coreml/resolve/main/DepthAnythingV2BaseF16.mlpackage.tar.gz"
tar -xzf DepthAnythingV2BaseF16.mlpackage.tar.gz
```

**Large model**:

```bash
curl -L -o DepthAnythingV2LargeF16.mlpackage.tar.gz \
  "https://huggingface.co/mrgnw/depth-anything-v2-coreml/resolve/main/DepthAnythingV2LargeF16.mlpackage.tar.gz"
tar -xzf DepthAnythingV2LargeF16.mlpackage.tar.gz
```

**Small model** (from Apple):

```bash
curl -L -o DepthAnythingV2SmallF16.mlpackage.zip \
  "https://ml-assets.apple.com/coreml/models/Image/DepthEstimation/DepthAnything/DepthAnythingV2SmallF16.mlpackage.zip"
unzip DepthAnythingV2SmallF16.mlpackage.zip
```

## Usage

### Swift

Note that `MLModel(contentsOf:)` expects a compiled `.mlmodelc` bundle; an `.mlpackage` loaded at runtime must first be compiled with `MLModel.compileModel(at:)` (Xcode does this automatically for models bundled at build time):

```swift
import CoreML

let packageURL = URL(fileURLWithPath: "DepthAnythingV2BaseF16.mlpackage")
// Compile the .mlpackage to a .mlmodelc before loading
let compiledURL = try MLModel.compileModel(at: packageURL)

let config = MLModelConfiguration()
config.computeUnits = .all // Use Neural Engine + GPU + CPU

let model = try MLModel(contentsOf: compiledURL, configuration: config)

// Input: RGB image (1, 3, 518, 518)
// Output: depth map (1, 518, 518)
```

## Performance

**M4 Pro (estimated):**

- Small: ~25-30 ms per frame
- Base: ~60-90 ms per frame
- Large: ~200-300 ms per frame

These estimates are roughly **10-20x faster** than ONNX CPU inference because the models run on the Apple Neural Engine.

## Citation

```bibtex
@article{yang2024depth,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv:2406.09414},
  year={2024}
}
```

## Related

- [Original Depth Anything V2](https://github.com/DepthAnything/Depth-Anything-V2)
- [spatial-maker](https://github.com/mrgnw/spatial-maker) - Uses these models for spatial video/photo conversion
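## Visualizing the output

Depth Anything V2 predicts relative (not metric) depth, so the raw `(1, 518, 518)` output is typically min-max normalized to an 8-bit image for display. A minimal NumPy sketch of that post-processing step (the helper name `normalize_depth` is illustrative, not part of the model package):

```python
import numpy as np

def normalize_depth(depth: np.ndarray) -> np.ndarray:
    """Min-max normalize a raw depth map to uint8 [0, 255] for visualization."""
    d = depth.astype(np.float32).squeeze()       # (1, 518, 518) -> (518, 518)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)  # scale to [0, 1]
    return (d * 255.0).astype(np.uint8)

# Example with a dummy array of the model's output shape
dummy = np.random.rand(1, 518, 518).astype(np.float32)
vis = normalize_depth(dummy)
print(vis.shape, vis.dtype)  # (518, 518) uint8
```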