Mirror of mlboydaisuke/Depth-Anything-3-CoreAI, the canonical repo (CoreAI Model Zoo). Updates land there first.

Depth Anything 3 for Core AI

The coreai-model-zoo's first depth model: monocular (single-image) relative depth estimation that runs fully on-device on Apple's Core AI runtime as a single static .aimodel. It is a conversion of ByteDance's Depth Anything 3 (depth-anything/DA3-SMALL / DA3-BASE, Apache-2.0), a DINOv2 ViT backbone plus a DPT-style head. Feed in an RGB image and get back a depth map and a confidence map. There is no NMS and no sampling; host post-processing is just a colormap.

Use it

▶️ Run it (source) with the DepthCamera runner, one live-camera depth app for every depth model in the catalog:

git clone https://github.com/john-rocky/coreai-kit
open coreai-kit/Examples/DepthCamera/DepthCamera.xcodeproj
# → Run, then pick "Depth Anything 3 Small" in the model picker

# agents / headless (macOS):
cd coreai-kit/Examples/DepthCamera
swift run depth-cli --model depth-anything-3-small --image sample.jpg --output depth.png

💻 Build with it: the snippet below is complete, with all glue provided by the kit API, so it runs as pasted:

import Foundation
import CoreAIKitVision

let imageURL = URL(fileURLWithPath: "sample.jpg")  // path to your input image
let estimator = try await DepthEstimator(catalog: "depth-anything-3-small")
let image = try ImageFile.load(imageURL)  // any image file → CGImage + EXIF orientation
let depth = try await estimator.estimateDepth(for: image.cgImage)
// depth: DepthMap; .cgImage() renders it, .values are the raw floats

The take-home is Examples/DepthCamera/Sources/QuickStart.swift: this exact code as one typed function, with no UI. The CLI is an argument-parsing shell over it, and the GUI runs the same estimator on every camera frame (CameraFeed, ~10 lines). For a live camera, CameraFeed (kit API) streams frames; feed each one to estimateDepth(for:). The camera permission prompt is left to your app's own UI.

Integration checklist

  • SPM: https://github.com/john-rocky/coreai-kit → product CoreAIKitVision
  • Info.plist: NSCameraUsageDescription, only for the live camera (the snippet needs none)
  • Entitlements: none needed
  • First run downloads the model (0.1 GB on Mac / 0.1 GB on iPhone); after that it loads from the local cache in Application Support. Progress is reported through the downloadProgress callback.
  • Measure in Release: Debug is ~3× slower on host-side work

Bundles

bundle                           variant  params  dtype  size    M4 Max GPU (504²)
small/da3-small_float16.aimodel  ViT-S    34.3M   fp16   54 MB   65.7 FPS
small/da3-small_float32.aimodel  ViT-S    34.3M   fp32   105 MB  56.5 FPS
base/da3-base_float16.aimodel    ViT-B    135.4M  fp16   202 MB  26.5 FPS
base/da3-base_float32.aimodel    ViT-B    135.4M  fp32   402 MB  23.0 FPS

small · fp16 is the on-device pick: 54 MB and 65.7 FPS at 504² on an M4 Max, comfortably real-time on iPhone-class GPUs. Each .aimodel is a directory bundle (main.mlirb + metadata.json).
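For frame budgeting, the throughput column converts to per-frame latency as 1000 / FPS. The sketch below copies the M4 Max figures from the table and checks each bundle against a 30 FPS camera frame interval (about 33.3 ms). The 30 FPS target and the helper names latency_ms and fits_budget are assumptions for illustration, and the result only covers whatever the table's FPS measurement covered.

```python
# M4 Max GPU throughput at 504x504, copied from the Bundles table.
FPS = {
    "small/da3-small_float16.aimodel": 65.7,
    "small/da3-small_float32.aimodel": 56.5,
    "base/da3-base_float16.aimodel": 26.5,
    "base/da3-base_float32.aimodel": 23.0,
}


def latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds for a given throughput."""
    return 1000.0 / fps


def fits_budget(fps: float, camera_fps: float = 30.0) -> bool:
    """True if one inference finishes within one camera frame interval."""
    return latency_ms(fps) <= latency_ms(camera_fps)


for bundle, fps in FPS.items():
    print(f"{bundle:34s} {latency_ms(fps):5.1f} ms  fits 30 FPS: {fits_budget(fps)}")
```

With these numbers both small bundles fit a 30 FPS interval (about 15.2 ms and 17.7 ms per frame), while both base bundles (about 37.7 ms and 43.5 ms) do not.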

I/O contract

input : image [1, 3, 504, 504]  RGB, raw [0, 1]   (ImageNet normalization is folded into the graph)
output: depth      [1, 504, 504]  relative depth (exp-activated; larger = nearer)
        depth_conf [1, 504, 504]  confidence
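A minimal NumPy sketch of the input side of this contract, assuming the image has already been resized to 504 × 504 on the host. The helper names to_model_input and check_outputs are hypothetical, and feeding float32 to the fp32 bundles is an assumption (the usage example in this card only shows float16 input to the fp16 bundle).

```python
import numpy as np

SIDE = 504  # fixed spatial size from the I/O contract


def to_model_input(rgb: np.ndarray, dtype=np.float16) -> np.ndarray:
    """HWC uint8 RGB, already resized to 504x504 -> [1, 3, 504, 504] in raw [0, 1].

    No mean/std step here: ImageNet normalization is folded into the graph.
    Using float32 for the fp32 bundles is an assumption, not documented behavior.
    """
    if rgb.shape != (SIDE, SIDE, 3) or rgb.dtype != np.uint8:
        raise ValueError(f"expected ({SIDE}, {SIDE}, 3) uint8, got {rgb.shape} {rgb.dtype}")
    x = rgb.astype(np.float32) / 255.0   # scale to [0, 1]
    x = x.transpose(2, 0, 1)[None]       # HWC -> NCHW with a batch of 1
    return np.ascontiguousarray(x, dtype=dtype)


def check_outputs(outputs: dict) -> None:
    """Validate both output tensors against the contract shapes."""
    for name in ("depth", "depth_conf"):
        shape = np.shape(outputs[name])
        if shape != (1, SIDE, SIDE):
            raise ValueError(f"{name}: expected (1, {SIDE}, {SIDE}), got {shape}")
```

Keeping the shape check strict is deliberate: the bundles are static at 504 × 504, so any other size is a host bug rather than something the graph can absorb.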

Host: resize the RGB image to 504 × 504 (e.g. cv2 INTER_AREA), feed raw [0, 1] values, run, then resize the depth map back to the original H × W. For display, the DA3 convention is inverse depth → 2nd–98th percentile normalization → Spectral colormap.
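The post-processing half of those host steps, sketched in plain NumPy under a few assumptions: the [1, 504, 504] depth output has already been reshaped to 504 × 504, bilinear interpolation with half-pixel centers stands in for whatever resize your pipeline uses, and the epsilon guarding the division is an illustrative choice. The names resize_bilinear and depth_to_display are hypothetical. The result is a [0, 1] map ready for a colormap.

```python
import numpy as np


def resize_bilinear(a: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinear resize of a 2-D map using half-pixel centers (align_corners=False style)."""
    in_h, in_w = a.shape
    # Map each output pixel center back to clamped input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical blend weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]  # horizontal blend weights, shape (1, out_w)
    a = a.astype(np.float32)
    top = a[y0][:, x0] * (1 - wx) + a[y0][:, x1] * wx
    bottom = a[y1][:, x0] * (1 - wx) + a[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def depth_to_display(depth: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Inverse depth -> 2nd-98th percentile normalization -> values clipped to [0, 1]."""
    inv = 1.0 / np.maximum(depth, eps)
    lo, hi = np.percentile(inv, [2, 98])
    return np.clip((inv - lo) / max(hi - lo, eps), 0.0, 1.0)
```

Typical use is disp = depth_to_display(resize_bilinear(depth, H, W)). If matplotlib is available, matplotlib.colormaps["Spectral"](disp) returns per-pixel RGBA floats for the final colormap step.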

Fidelity

  • Faithful conversion: the Core AI engine matches the PyTorch reference at cosine similarity 1.000000 (per-pixel error ≤ ~1e-5 for fp32, ≤ ~1e-2 for fp16) on both CPU and GPU, at any fixed input shape.
  • vs. the official DA3 viewer: mean Pearson r ≈ 0.98 across diverse aspect ratios (r = 1.000 for square inputs). That gap is within DA3's own resolution sensitivity: its 504 and 518 outputs agree only at r ≈ 0.975–0.984.
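The two agreement scores above can be computed for your own comparisons in a few lines of NumPy. This is a sketch: the card does not say whether the maps were masked, cropped, or compared at a particular resolution, so cosine, pearson and max_abs_err (hypothetical helper names) below are the plain textbook definitions over all pixels.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two maps, treated as flat vectors."""
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson r: cosine similarity after subtracting each map's mean."""
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    return cosine(a - a.mean(), b - b.mean())


def max_abs_err(a: np.ndarray, b: np.ndarray) -> float:
    """Largest per-pixel absolute difference."""
    return float(np.max(np.abs(a.astype(np.float64) - b.astype(np.float64))))
```

Pearson r ignores any affine rescaling (r(d, 2d + 3) = 1), which suits comparing relative depth across pipelines. Cosine similarity is sensitive to the offset, so together with the per-pixel error it is the stricter check for a conversion that should reproduce the reference values themselves.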

Usage (CoreAIKit / coreai.runtime)

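import asyncio
import numpy as np
from PIL import Image
import coreai.runtime as rt

async def main() -> np.ndarray:
    # Load the small fp16 bundle, preferring the GPU compute unit.
    m = await rt.AIModel.load("small/da3-small_float16.aimodel",
            rt.SpecializationOptions.from_preferred_compute_unit_kind(rt.ComputeUnitKind.gpu()))
    fn = m.load_function("main")

    src = Image.open("photo.jpg").convert("RGB")
    img = np.asarray(src.resize((504, 504), Image.Resampling.BOX))  # BOX ≈ INTER_AREA when downscaling
    x = (img.astype(np.float16) / 255.0).transpose(2, 0, 1)[None]   # raw [0, 1], NCHW
    depth = (await fn({"image": rt.NDArray(x)}))["depth"].numpy().reshape(504, 504)
    # Resize back to the original size (PIL sizes are (width, height)).
    depth = Image.fromarray(depth.astype(np.float32)).resize(src.size, Image.Resampling.BILINEAR)
    return np.asarray(depth)

depth = asyncio.run(main())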

Links


On-device ML / Core ML / Core AI model porting: get in touch by opening an issue on the zoo.

Model tree for coreai-community/Depth-Anything-3-CoreAI
