---
license: cc-by-nc-4.0
tags:
- depth-estimation
- coreml
- apple-silicon
- vision
- computer-vision
library_name: coreml
---
|
|
|
|
|
# Depth Anything V2 - CoreML |
|
|
|
|
|
Depth Anything V2 models (Base and Large) converted to CoreML format for optimized inference on Apple Silicon (M-series chips). |
|
|
|
|
|
## Models |
|
|
|
|
|
| Model | Size | Parameters | Performance (M4 Pro, est.) | License |
|-------|------|------------|----------------------------|---------|
| Small F16 | 48 MB | 24.8M | ~30ms (~33 fps) | Apache-2.0 |
| Base F16 | 172 MB | 97.5M | ~60-90ms (~14 fps) | CC-BY-NC-4.0 |
| Large F16 | 590 MB | 335.3M | ~200-300ms (~4 fps) | CC-BY-NC-4.0 |
|
|
|
|
|
All models use Float16 precision and run on Apple's Neural Engine + GPU + CPU. |
|
|
|
|
|
## License |
|
|
|
|
|
Both **Base** and **Large** models are **CC-BY-NC-4.0** (non-commercial only), following the [official Depth Anything V2 licensing](https://github.com/DepthAnything/Depth-Anything-V2#license). |
|
|
|
|
|
**For commercial use**, you must use the Small model (Apache-2.0), which is available directly from [Apple's CoreML model zoo](https://developer.apple.com/machine-learning/models/). |
|
|
|
|
|
## Download |
|
|
|
|
|
**Base model**: |
|
|
```bash
curl -L -o DepthAnythingV2BaseF16.mlpackage.tar.gz \
  "https://huggingface.co/mrgnw/depth-anything-v2-coreml/resolve/main/DepthAnythingV2BaseF16.mlpackage.tar.gz"
tar -xzf DepthAnythingV2BaseF16.mlpackage.tar.gz
```
|
|
|
|
|
**Large model**: |
|
|
```bash
curl -L -o DepthAnythingV2LargeF16.mlpackage.tar.gz \
  "https://huggingface.co/mrgnw/depth-anything-v2-coreml/resolve/main/DepthAnythingV2LargeF16.mlpackage.tar.gz"
tar -xzf DepthAnythingV2LargeF16.mlpackage.tar.gz
```
|
|
|
|
|
**Small model** (from Apple): |
|
|
```bash
curl -L -o DepthAnythingV2SmallF16.mlpackage.zip \
  "https://ml-assets.apple.com/coreml/models/Image/DepthEstimation/DepthAnything/DepthAnythingV2SmallF16.mlpackage.zip"
unzip DepthAnythingV2SmallF16.mlpackage.zip
```
|
|
|
|
|
## Usage |
|
|
|
|
|
### Swift |
|
|
|
|
|
```swift
import CoreML

// .mlpackage files must be compiled to .mlmodelc before loading.
let modelURL = URL(fileURLWithPath: "DepthAnythingV2BaseF16.mlpackage")
let compiledURL = try MLModel.compileModel(at: modelURL)

let config = MLModelConfiguration()
config.computeUnits = .all // Use Neural Engine + GPU + CPU

let model = try MLModel(contentsOf: compiledURL, configuration: config)
// Input: RGB image (1, 3, 518, 518)
// Output: depth map (1, 518, 518)
```
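For image models like this one, the Vision framework handles resizing the input to the expected 518x518 and wrapping the prediction. A minimal sketch (the exact input/output feature names and result types depend on the package; inspect `model.modelDescription` to confirm):

```swift
import CoreML
import Vision

// Sketch: run depth estimation on a CGImage through Vision.
// Assumes the model takes a single image input and produces an
// image-like depth output, which Vision surfaces as a pixel buffer.
func estimateDepth(cgImage: CGImage, model: MLModel) throws -> VNPixelBufferObservation? {
    let vnModel = try VNCoreMLModel(for: model)
    let request = VNCoreMLRequest(model: vnModel)
    request.imageCropAndScaleOption = .scaleFill // model expects 518x518

    let handler = VNImageRequestHandler(cgImage: cgImage)
    try handler.perform([request])
    return request.results?.first as? VNPixelBufferObservation
}
```

Note that the model outputs relative depth, so for visualization you will typically min-max normalize the returned buffer before mapping it to grayscale.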
|
|
|
|
|
## Performance |
|
|
|
|
|
**M4 Pro (estimated):**

- Small: ~25-30ms per frame
- Base: ~60-90ms per frame
- Large: ~200-300ms per frame
|
|
|
|
|
These estimates are roughly **10-20x faster** than ONNX CPU inference on the same hardware, because CoreML can dispatch most of the network to the Apple Neural Engine.
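Actual latency varies by machine and thermal state, so it is worth measuring locally. A simple wall-clock sketch (assumes you already have an `MLFeatureProvider` matching the model's input; the first prediction is excluded because it absorbs one-time setup cost):

```swift
import CoreML
import Foundation

// Sketch: average wall-clock latency over repeated predictions.
func averageLatency(model: MLModel, input: MLFeatureProvider, runs: Int = 20) throws -> Double {
    _ = try model.prediction(from: input) // warm-up: first call is slower
    let start = CFAbsoluteTimeGetCurrent()
    for _ in 0..<runs {
        _ = try model.prediction(from: input)
    }
    return (CFAbsoluteTimeGetCurrent() - start) / Double(runs) // seconds per frame
}
```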
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex
@article{yang2024depth,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv:2406.09414},
  year={2024}
}
```
|
|
|
|
|
## Related |
|
|
|
|
|
- [Original Depth Anything V2](https://github.com/DepthAnything/Depth-Anything-V2)
- [spatial-maker](https://github.com/mrgnw/spatial-maker) - Uses these models for spatial video/photo conversion
|
|
|