metadata
license: apache-2.0
base_model: depth-anything/DA3MONO-LARGE
pipeline_tag: depth-estimation
library_name: coreml
tags:
  - coreml
  - depth-estimation
  - monocular-depth
  - depth-anything
  - apple-silicon
  - stereo

DepthAnythingV3Mono-CoreML

A CoreML conversion of depth-anything/DA3MONO-LARGE — the monocular-depth variant of Depth Anything 3 (DINOv2 ViT-L backbone + DPT head, ~0.35B params) — packaged for on-device inference on Apple Silicon (macOS 14+).

This is a derivative work of the original model, which is licensed under Apache-2.0; this conversion is released under the same license. All credit for the model itself goes to ByteDance / the Depth Anything 3 authors. See the original repo, depth-anything/DA3MONO-LARGE.

What's in here

  • DepthAnythingV3Mono.mlpackage — an ML Program, FP16 weights, minimum deployment target macOS 14.

Interface

  • Input image: an RGB image, 504×504 (a multiple of the DINOv2 patch size, 14). ImageNet normalization is baked into the graph; the CoreML ImageType only rescales 0–255 → 0–1, so you can hand it a CVPixelBuffer built straight from a CGImage with no manual preprocessing.
  • Output depth: a single-channel MLMultiArray of shape (1, 504, 504) holding relative depth in model-relative units, not metric distances. Consumers typically min-max normalize it to 0…1.
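
The min-max normalization mentioned for the depth output can be sketched as below. This is a minimal illustration, not part of the package: the helper name minMaxNormalize is hypothetical, and it assumes the depth values have already been copied out of the MLMultiArray into a flat [Float].

```swift
/// Min-max normalizes relative depth values to the 0…1 range.
/// `minMaxNormalize` is a hypothetical helper, not something shipped with the model.
/// Returns all zeros when the input is constant (max == min), to avoid dividing by zero.
func minMaxNormalize(_ values: [Float]) -> [Float] {
    guard let lo = values.min(), let hi = values.max() else { return [] }
    let range = hi - lo
    guard range > 0 else { return [Float](repeating: 0, count: values.count) }
    return values.map { ($0 - lo) / range }
}

// Made-up values standing in for a flattened depth map.
let normalized = minMaxNormalize([2, 4, 6])
print(normalized)  // [0.0, 0.5, 1.0]
```

Because the output is relative depth, the normalized values are only comparable within a single image; the constant-input guard is a design choice here, and a caller might prefer to treat a flat map as an error instead.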

Conversion notes

Converted with coremltools from a torch.jit.trace of the backbone → head → depth path. The full Depth Anything 3 forward() also runs camera-pose, sky, and Gaussian-splat post-processing. Those stages are either inert for the mono model or not traceable (the sky refinement relies on a data-dependent torch.quantile), so only the raw relative-depth path is converted. DINOv2's bicubic positional-embedding interpolation is replaced with bilinear interpolation, because coremltools cannot convert upsample_bicubic2d; the substitution is a sub-pixel approximation.

Fidelity: on a structured test image, the CoreML output matches the FP32 PyTorch reference with a Pearson correlation of 0.99996 (normalized MAE 0.15%).
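
For readers who want to run a similar comparison on their own images, the two fidelity metrics can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, both depth maps are assumed to be flattened into equally sized [Double] arrays, and since this card does not define "normalized MAE", the version here (mean absolute error divided by the reference's value range) is only one common convention and may not match how the figure above was computed.

```swift
/// Pearson correlation between two equally sized, flattened depth maps.
/// Returns nil for mismatched or empty input, or when either map is constant.
func pearson(_ a: [Double], _ b: [Double]) -> Double? {
    guard a.count == b.count, !a.isEmpty else { return nil }
    let n = Double(a.count)
    let meanA = a.reduce(0, +) / n
    let meanB = b.reduce(0, +) / n
    var cov = 0.0, varA = 0.0, varB = 0.0
    for (x, y) in zip(a, b) {
        cov += (x - meanA) * (y - meanB)
        varA += (x - meanA) * (x - meanA)
        varB += (y - meanB) * (y - meanB)
    }
    let denom = (varA * varB).squareRoot()
    return denom > 0 ? cov / denom : nil
}

/// Mean absolute error divided by the reference's value range.
/// ASSUMPTION: the normalization convention is a guess, not taken from the model card.
func normalizedMAE(reference: [Double], candidate: [Double]) -> Double? {
    guard reference.count == candidate.count,
          let lo = reference.min(), let hi = reference.max(), hi > lo else { return nil }
    let mae = zip(reference, candidate).map { abs($0 - $1) }.reduce(0, +)
        / Double(reference.count)
    return mae / (hi - lo)
}

// Made-up values: `reference` stands in for the PyTorch output, `candidate` for CoreML.
let reference = [0.0, 1.0, 2.0, 3.0]
let candidate = [0.0, 1.0, 2.0, 3.5]
if let r = pearson(reference, candidate),
   let nmae = normalizedMAE(reference: reference, candidate: candidate) {
    print("Pearson r = \(r), normalized MAE = \(nmae * 100)%")
}
```

Pearson correlation is invariant to scale and offset, which suits relative depth; the MAE figure depends on how both maps are scaled before comparison, so the two numbers answer slightly different questions.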

Usage (Swift / CoreML)

import CoreML
import CoreImage

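// `compiledURL` must point at the compiled model (.mlmodelc); compile the .mlpackage first, e.g. with MLModel.compileModel(at:).
let model = try MLModel(contentsOf: compiledURL)

// `pixelBuffer` (supplied by you) is a 504×504 CVPixelBuffer in 32BGRA; the `depth` output is an MLMultiArray (1×504×504).
let input = try MLDictionaryFeatureProvider(dictionary: ["image": MLFeatureValue(pixelBuffer: pixelBuffer)])
let depth = try model.prediction(from: input).featureValue(for: "depth")?.multiArrayValue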
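
One way to produce the 504×504 32BGRA CVPixelBuffer that the model expects is to draw a CGImage into a bitmap context backed by the buffer. The sketch below is an assumption-laden example rather than the viewer's actual code: makePixelBuffer is a hypothetical helper, and it simply stretches the image to a square, so aspect ratio is not preserved.

```swift
import CoreGraphics
import CoreVideo

/// Draws `image` into a new square 32BGRA pixel buffer (504×504 by default).
/// `makePixelBuffer` is a hypothetical helper; the image is stretched, not cropped or padded.
func makePixelBuffer(from image: CGImage, side: Int = 504) -> CVPixelBuffer? {
    let attrs: [CFString: Any] = [
        kCVPixelBufferCGImageCompatibilityKey: true,
        kCVPixelBufferCGBitmapContextCompatibilityKey: true,
    ]
    var buffer: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault, side, side,
                                     kCVPixelFormatType_32BGRA,
                                     attrs as CFDictionary, &buffer)
    guard status == kCVReturnSuccess, let pixelBuffer = buffer else { return nil }

    // Keep the buffer's memory locked while Core Graphics writes into it.
    CVPixelBufferLockBaseAddress(pixelBuffer, [])
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }

    // Little-endian 32-bit with alpha first corresponds to BGRA byte order in memory.
    let bitmapInfo = CGImageAlphaInfo.premultipliedFirst.rawValue
        | CGBitmapInfo.byteOrder32Little.rawValue
    guard let context = CGContext(
        data: CVPixelBufferGetBaseAddress(pixelBuffer),
        width: side, height: side,
        bitsPerComponent: 8,
        bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: bitmapInfo
    ) else { return nil }

    context.interpolationQuality = .high
    context.draw(image, in: CGRect(x: 0, y: 0, width: side, height: side))
    return pixelBuffer
}
```

Since ImageNet normalization is baked into the graph, no per-channel scaling is needed after this step. Whether to stretch, center-crop, or letterbox non-square images is left to the caller; each choice affects the depth map's geometry differently.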

This model is the default depth model in the SBS (side-by-side) 3D image viewer, where it replaces Depth Anything V2-Large. It was chosen specifically because DA3MONO-LARGE is licensed under Apache-2.0 and is therefore safe for commercial distribution.

License & attribution

Apache-2.0, inherited from the upstream model. If you use this conversion, please cite the original Depth Anything 3 work.