Depth Anything 3 - Core AI
The coreai-model-zoo's first depth model.
Monocular (single-image) relative depth estimation running fully on-device on Apple's Core AI
runtime, as a single static .aimodel. A conversion of ByteDance's Depth Anything 3
(depth-anything/DA3-SMALL / DA3-BASE, Apache-2.0): a DINOv2 ViT backbone + DPT-style head.
Drop in an RGB image, get a depth map (and a confidence map). No NMS, no sampling:
host post-processing is just a colormap.
Bundles
| dir | variant | params | dtype | size | M4 Max GPU |
|---|---|---|---|---|---|
| small/da3-small_float16.aimodel | ViT-S | 34.3M | fp16 | 54 MB | 65.7 FPS |
| small/da3-small_float32.aimodel | ViT-S | 34.3M | fp32 | 105 MB | 56.5 FPS |
| base/da3-base_float16.aimodel | ViT-B | 135.4M | fp16 | 202 MB | 26.5 FPS |
| base/da3-base_float32.aimodel | ViT-B | 135.4M | fp32 | 402 MB | 23.0 FPS |
small · fp16 is the on-device hero: 54 MB, 65 FPS at 504² on an M4 Max, comfortably real-time on
iPhone-class GPUs. Each .aimodel is a directory bundle (main.mlirb + metadata.json).
I/O contract
input : image [1, 3, 504, 504] RGB, raw [0, 1] (ImageNet normalization is folded into the graph)
output: depth [1, 504, 504] relative depth (exp-activated; larger = nearer)
depth_conf [1, 504, 504] confidence
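The contract above can be checked on the host before and after inference. The following is a minimal sketch, assuming the image has already been resized to 504 × 504 and is held as a uint8 HWC NumPy array; the helper names `pack_input` and `check_outputs` are hypothetical and not part of the zoo or the runtime.

```python
import numpy as np

SIDE = 504  # fixed spatial size from the I/O contract


def pack_input(rgb_u8: np.ndarray, dtype=np.float16) -> np.ndarray:
    """Turn an already-resized HWC uint8 RGB image into the NCHW [0, 1] tensor."""
    if rgb_u8.shape != (SIDE, SIDE, 3) or rgb_u8.dtype != np.uint8:
        raise ValueError(f"expected uint8 array of shape ({SIDE}, {SIDE}, 3)")
    # Scale to raw [0, 1]; no mean/std step, since the graph normalizes internally.
    x = rgb_u8.astype(np.float32) / 255.0
    # HWC -> CHW, then add the batch axis to get [1, 3, 504, 504].
    return np.ascontiguousarray(x.transpose(2, 0, 1)[None]).astype(dtype)


def check_outputs(depth: np.ndarray, depth_conf: np.ndarray) -> None:
    """Raise if either output does not have the documented [1, 504, 504] shape."""
    for name, arr in (("depth", depth), ("depth_conf", depth_conf)):
        if arr.shape != (1, SIDE, SIDE):
            raise ValueError(f"{name}: unexpected shape {arr.shape}")
```

Pass `dtype=np.float32` when feeding one of the fp32 bundles; whether the runtime also accepts a mismatched input dtype is not stated here, so matching the bundle's dtype is the conservative choice.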
Host: resize the RGB image to 504 × 504 (e.g. cv2 INTER_AREA), feed raw [0, 1], run, then resize
the depth map back to the original H × W. For display, the DA3 convention is inverse-depth →
percentile 2–98 normalize → Spectral colormap.
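The display convention described above (inverse depth, then percentile 2–98 normalization) can be sketched in NumPy as follows. The function name `depth_to_display`, the `eps` guard, and the clipping of out-of-range values are assumptions made for this illustration, not details taken from the DA3 viewer.

```python
import numpy as np


def depth_to_display(depth: np.ndarray, lo_pct: float = 2.0,
                     hi_pct: float = 98.0, eps: float = 1e-6) -> np.ndarray:
    """Map a relative depth map to [0, 1], ready for a colormap."""
    d = np.asarray(depth, dtype=np.float32)
    # Inverse depth; eps guards against division by zero.
    inv = 1.0 / np.maximum(d, eps)
    # Robust range: ignore the extreme 2% at each end.
    lo, hi = np.percentile(inv, [lo_pct, hi_pct])
    if hi <= lo:  # flat map, nothing to stretch
        return np.zeros_like(inv)
    # Stretch to [0, 1] and clip the values outside the percentile range.
    return np.clip((inv - lo) / (hi - lo), 0.0, 1.0)
```

The result can then be passed to a Spectral colormap, for example `matplotlib.colormaps["Spectral"](vis)` if matplotlib is available, which returns an RGBA array.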
Fidelity
- Numerically faithful conversion: the Core AI engine matches the PyTorch reference at cosine similarity 1.000000 (per-pixel error ≤ ~1e-5 / ~1e-2 for fp32 / fp16) on both CPU and GPU, at any fixed input shape.
- vs the official DA3 viewer: mean Pearson r ≈ 0.98 across diverse aspect ratios (square inputs r = 1.000), within DA3's own resolution sensitivity (its 504-vs-518 outputs differ by r ≈ 0.975–0.984).
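For readers who want to run a similar comparison, the three quantities named above (cosine similarity, per-pixel error, Pearson r) have standard definitions that can be sketched in NumPy. This is an illustration of those definitions only; the exact evaluation script behind the reported numbers is not shown here, and the function names are hypothetical.

```python
import numpy as np


def _flat64(a) -> np.ndarray:
    """Flatten to a float64 vector so fp16 inputs do not lose precision."""
    return np.asarray(a, dtype=np.float64).ravel()


def cosine_similarity(a, b) -> float:
    a, b = _flat64(a), _flat64(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def pearson_r(a, b) -> float:
    # Pearson r is the cosine similarity of the mean-centered vectors,
    # so it is invariant to a positive scale and any shift of either map.
    a, b = _flat64(a), _flat64(b)
    return cosine_similarity(a - a.mean(), b - b.mean())


def max_abs_error(a, b) -> float:
    """Largest per-pixel absolute difference between two maps."""
    return float(np.max(np.abs(_flat64(a) - _flat64(b))))
```

Pearson r suits the viewer comparison because relative depth is only defined up to scale and shift, whereas cosine similarity and per-pixel error are the stricter checks used for engine-versus-reference parity.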
Usage (CoreAIKit / coreai.runtime)
import asyncio
import coreai.runtime as rt, numpy as np
from PIL import Image

async def main():
    # Load the bundle, preferring the GPU compute unit.
    m = await rt.AIModel.load("small/da3-small_float16.aimodel",
        rt.SpecializationOptions.from_preferred_compute_unit_kind(rt.ComputeUnitKind.gpu()))
    fn = m.load_function("main")
    img = np.asarray(Image.open("photo.jpg").convert("RGB").resize((504, 504)))
    x = (img.astype(np.float16) / 255.0).transpose(2, 0, 1)[None]  # raw [0,1], NCHW
    out = await fn({"image": rt.NDArray(x)})
    return out["depth"].numpy().reshape(504, 504)

depth = asyncio.run(main())  # top-level await only works in a notebook or async REPL
Links
- Conversion script + model card: coreai-model-zoo (zoo/depth-anything-3.md)
- Source: Depth Anything 3 · Apache-2.0
On-device ML / Core ML / Core AI model porting: to get in touch, open an issue on the zoo.