Depth Anything 3 — Core AI

The coreai-model-zoo's first depth model. Monocular (single-image) relative depth estimation running fully on-device on Apple's Core AI runtime, as a single static .aimodel. A conversion of ByteDance's Depth Anything 3 (depth-anything/DA3-SMALL / DA3-BASE, Apache-2.0): a DINOv2 ViT backbone + DPT-style head. Drop in an RGB image, get a depth map (and a confidence map). No NMS, no sampling — host post-processing is just a colormap.

Bundles

bundle	variant	params	dtype	size	M4 Max GPU (504 × 504)
`small/da3-small_float16.aimodel`	ViT-S	34.3M	fp16	54 MB	65.7 FPS
`small/da3-small_float32.aimodel`	ViT-S	34.3M	fp32	105 MB	56.5 FPS
`base/da3-base_float16.aimodel`	ViT-B	135.4M	fp16	202 MB	26.5 FPS
`base/da3-base_float32.aimodel`	ViT-B	135.4M	fp32	402 MB	23.0 FPS

small · fp16 is the on-device hero: 54 MB and 65.7 FPS at 504 × 504 on an M4 Max, comfortably real-time on iPhone-class GPUs. Each .aimodel is a directory bundle (main.mlirb + metadata.json).

I/O contract

input : image [1, 3, 504, 504]  RGB, raw [0, 1]   (ImageNet normalization is folded into the graph)
output: depth      [1, 504, 504]  relative depth (exp-activated; larger = nearer)
        depth_conf [1, 504, 504]  confidence
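
A minimal sketch of packing an already-resized image into the input layout above, using NumPy only. The helper name `to_model_input` and the float16 default are assumptions for illustration and are not part of the bundle or the runtime API.

```python
import numpy as np


def to_model_input(rgb_u8: np.ndarray, dtype=np.float16) -> np.ndarray:
    """Pack an H x W x 3 uint8 RGB image (already 504 x 504) into [1, 3, 504, 504]."""
    if rgb_u8.shape != (504, 504, 3):
        raise ValueError(f"expected shape (504, 504, 3), got {rgb_u8.shape}")
    # Scale to raw [0, 1] only; ImageNet mean/std is applied inside the graph.
    x = rgb_u8.astype(np.float32) / 255.0
    # HWC -> CHW, add the batch axis, and return a contiguous buffer.
    return np.ascontiguousarray(x.transpose(2, 0, 1)[None], dtype=dtype)
```

Use `dtype=np.float32` when feeding one of the fp32 bundles, assuming the input dtype follows the bundle's dtype.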

Host: resize the RGB image to 504 × 504 (e.g. cv2 INTER_AREA), feed raw [0, 1], run, then resize the depth map back to the original H × W. For display, the DA3 convention is inverse-depth → percentile 2–98 normalize → Spectral colormap.
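
The display convention described above can be sketched as follows. This is an illustrative NumPy version, not the DA3 viewer's code: the function name `depth_to_display`, the `eps` guard, and the clipping of values outside the percentile window are assumptions.

```python
import numpy as np


def depth_to_display(depth: np.ndarray, lo_pct: float = 2.0, hi_pct: float = 98.0,
                     eps: float = 1e-6) -> np.ndarray:
    """Return inverse depth normalized to [0, 1] between two percentiles."""
    # Inverse depth; eps guards against division by zero (assumption).
    inv = 1.0 / np.maximum(depth.astype(np.float32), eps)
    lo, hi = np.percentile(inv, [lo_pct, hi_pct])
    # Values outside the percentile window are clipped to 0 or 1 (assumption).
    return np.clip((inv - lo) / max(hi - lo, eps), 0.0, 1.0)
```

The result is a [0, 1] array with the same shape as the input, ready to be passed to a colormap such as matplotlib's Spectral.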

Fidelity

Near-exact conversion: the Core AI engine matches the PyTorch reference at cosine similarity 1.000000, with per-pixel error ≤ ~1e-5 for fp32 and ≤ ~1e-2 for fp16, on both CPU and GPU, at any fixed input shape.
Against the official DA3 viewer: mean Pearson r ≈ 0.98 across diverse aspect ratios (r = 1.000 for square inputs). This is within DA3's own resolution sensitivity: its 504- and 518-pixel outputs correlate at only r ≈ 0.975–0.984.
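
For readers who want to reproduce this kind of comparison, the three metrics quoted above (cosine similarity, per-pixel error, Pearson r) can be computed as in the sketch below. The function names are hypothetical, and reading "per-pixel error" as the maximum absolute difference is an assumption. Both maps must already be at the same resolution.

```python
import numpy as np


def _flat(a) -> np.ndarray:
    # Compare in float64 so fp16 outputs do not lose precision in the metric itself.
    return np.asarray(a, dtype=np.float64).ravel()


def cosine_similarity(ref, out) -> float:
    a, b = _flat(ref), _flat(out)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def max_abs_error(ref, out) -> float:
    return float(np.max(np.abs(_flat(ref) - _flat(out))))


def pearson_r(ref, out) -> float:
    # Pearson r is invariant to scale and shift, which suits relative depth.
    return float(np.corrcoef(_flat(ref), _flat(out))[0, 1])
```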

Usage (CoreAIKit / coreai.runtime)

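import asyncio
import numpy as np
from PIL import Image
import coreai.runtime as rt

async def main():
    # Load the bundle, preferring the GPU compute unit.
    m = await rt.AIModel.load("small/da3-small_float16.aimodel",
            rt.SpecializationOptions.from_preferred_compute_unit_kind(rt.ComputeUnitKind.gpu()))
    fn = m.load_function("main")

    # Resize to the fixed 504 x 504 input, then scale to raw [0, 1] in NCHW layout.
    img = np.asarray(Image.open("photo.jpg").convert("RGB").resize((504, 504)))
    x = (img.astype(np.float16) / 255.0).transpose(2, 0, 1)[None]
    out = await fn({"image": rt.NDArray(x)})
    return out["depth"].numpy().reshape(504, 504)

depth = asyncio.run(main())  # top-level await only works in a notebook or async REPL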

Model tree: mlboydaisuke/Depth-Anything-3-CoreAI is finetuned from the base model depth-anything/DA3-BASE.