---
license: apache-2.0
tags:
- depth-estimation
- monocular-depth
- core-ai
- coreai
- apple
- on-device
- depth-anything
pipeline_tag: depth-estimation
base_model:
- depth-anything/DA3-SMALL
- depth-anything/DA3-BASE
library_name: coreai
---

> **Mirror** of [`mlboydaisuke/Depth-Anything-3-CoreAI`](https://huggingface.co/mlboydaisuke/Depth-Anything-3-CoreAI), the canonical repo ([CoreAI Model Zoo](https://github.com/john-rocky/coreai-model-zoo)). Updates land there first.
|
|
|
|
# Depth Anything 3: Core AI
|
|
| **The [coreai-model-zoo](https://github.com/john-rocky/coreai-model-zoo)'s first depth model.** |
| Monocular (single-image) **relative depth** estimation running fully on-device on Apple's Core AI |
| runtime, as a single static `.aimodel`. A conversion of ByteDance's |
| [Depth Anything 3](https://github.com/ByteDance-Seed/depth-anything-3) |
| ([`depth-anything/DA3-SMALL`](https://huggingface.co/depth-anything/DA3-SMALL) / |
| [`DA3-BASE`](https://huggingface.co/depth-anything/DA3-BASE), Apache-2.0): a DINOv2 ViT backbone + |
DPT-style head. Drop in an RGB image, get a depth map (and a confidence map). No NMS, no sampling:
host post-processing is just a colormap.
|
|
<!-- gen-cards:use-it begin id=depth-anything-3-small (managed by scripts/gen-cards; edit cards.json / QuickStart.swift, not this block) -->
| ## Use it |
|
|
▶️ **Run it (source)**: the [DepthCamera runner](https://github.com/john-rocky/coreai-kit/tree/main/Examples/DepthCamera)
(live camera depth, one app for every depth model in the catalog):
|
|
| ```bash |
| git clone https://github.com/john-rocky/coreai-kit |
| open coreai-kit/Examples/DepthCamera/DepthCamera.xcodeproj |
# Run, then pick "Depth Anything 3 Small" in the model picker

| # agents / headless (macOS): |
| cd coreai-kit/Examples/DepthCamera |
| swift run depth-cli --model depth-anything-3-small --image sample.jpg --output depth.png |
| ``` |
|
|
💻 **Build with it**: a complete snippet; the glue is kit API, so it runs as pasted:
|
|
| ```swift |
import CoreAIKitVision

let estimator = try await DepthEstimator(catalog: "depth-anything-3-small")
let image = try ImageFile.load(imageURL)   // any image file → CGImage + EXIF orientation
let depth = try await estimator.estimateDepth(for: image.cgImage)
// depth is a DepthMap: .cgImage() renders it, .values are the raw floats
| ``` |
|
|
The take-home is [`Examples/DepthCamera/Sources/QuickStart.swift`](https://github.com/john-rocky/coreai-kit/blob/main/Examples/DepthCamera/Sources/QuickStart.swift):
this exact code as one typed function, no UI. The CLI is an argument shell over it, and
the GUI runs the same estimator on every camera frame (`CameraFeed`, ~10 lines).
Live camera? `CameraFeed` (kit API) streams frames; feed each one to
`estimateDepth(for:)`. The camera permission prompt is your app's own chrome.
|
|
| **Integration checklist** |
|
|
- SPM: `https://github.com/john-rocky/coreai-kit` → product **CoreAIKitVision**
- Info.plist: `NSCameraUsageDescription`, needed only for the live camera; the snippet needs none
- Entitlements: none needed
- First run downloads the model (0.1 GB on Mac, 0.1 GB on iPhone); after that it loads from the
local cache (Application Support; progress via the `downloadProgress` callback)
- Measure in Release: Debug is ~3× slower on per-frame host work
| <!-- gen-cards:use-it end --> |
|
|
| ## Bundles |
|
|
| | dir | variant | params | dtype | size | M4 Max GPU | |
| |---|---|---|---|---|---| |
| | `small/da3-small_float16.aimodel` | ViT-S | 34.3M | fp16 | **54 MB** | **65.7 FPS** | |
| | `small/da3-small_float32.aimodel` | ViT-S | 34.3M | fp32 | 105 MB | 56.5 FPS | |
| | `base/da3-base_float16.aimodel` | ViT-B | 135.4M | fp16 | 202 MB | 26.5 FPS | |
| | `base/da3-base_float32.aimodel` | ViT-B | 135.4M | fp32 | 402 MB | 23.0 FPS | |
|
|
`small · fp16` is the on-device hero: 54 MB and 65.7 FPS at 504 × 504 on an M4 Max GPU, comfortably real-time on
iPhone-class GPUs. Each `.aimodel` is a directory bundle (`main.mlirb` + `metadata.json`).
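
Because a bundle is a directory rather than a single file, a partial download or copy can leave it incomplete. A minimal sketch of a sanity check, relying only on the two file names listed above; the helper name `missing_bundle_files` is hypothetical and not part of any Core AI tooling:

```python
from pathlib import Path

# File names taken from the bundle description in this card.
REQUIRED_FILES = ("main.mlirb", "metadata.json")


def missing_bundle_files(bundle_dir):
    """Return the required files that are absent from an .aimodel directory."""
    root = Path(bundle_dir)
    if not root.is_dir():
        # Not a directory at all: everything is missing.
        return list(REQUIRED_FILES)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # An empty list means both expected files are present.
    print(missing_bundle_files("small/da3-small_float16.aimodel"))
```

This checks presence only; it does not validate the contents of either file.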
|
|
| ## I/O contract |
|
|
| ``` |
| input : image [1, 3, 504, 504] RGB, raw [0, 1] (ImageNet normalization is folded into the graph) |
| output: depth [1, 504, 504] relative depth (exp-activated; larger = nearer) |
| depth_conf [1, 504, 504] confidence |
| ``` |
|
|
Host: resize the RGB image to 504 × 504 (e.g. cv2 `INTER_AREA`), feed raw [0, 1], run, then resize
the depth map back to the original H × W. For display, the DA3 convention is inverse depth →
percentile 2–98 normalization → `Spectral` colormap.
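
The display recipe above can be sketched in NumPy as follows. This is a minimal sketch, not the DA3 viewer's code: the function name `depth_to_display`, the `eps` guard, and the `invert` switch are assumptions made for illustration. Whether inversion is needed depends on the orientation of the map you pass in, so it is left as a parameter.

```python
import numpy as np


def depth_to_display(depth, lo_pct=2.0, hi_pct=98.0, invert=True, eps=1e-6):
    """Normalize a depth map to [0, 1] using percentile clipping."""
    values = np.asarray(depth, dtype=np.float32)
    if invert:
        # Inverse depth; eps guards against division by zero (assumption).
        values = 1.0 / np.maximum(values, eps)
    lo, hi = np.percentile(values, [lo_pct, hi_pct])
    if hi <= lo:
        # Flat map: nothing to stretch, return zeros of the same shape.
        return np.zeros_like(values)
    # Values outside the percentile window are clipped to 0 or 1.
    return np.clip((values - lo) / (hi - lo), 0.0, 1.0)
```

The result can then be passed to a colormap, for example `matplotlib.colormaps["Spectral"](norm)` if matplotlib is available, which returns an RGBA array.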
|
|
| ## Fidelity |
|
|
- **Near-exact conversion:** the Core AI engine matches the PyTorch reference at **cosine similarity
1.000000** (per-pixel difference ≤ ~1e-5 for fp32 and ≤ ~1e-2 for fp16) on both CPU and GPU, at any fixed input shape.
- **vs the official DA3 viewer:** **mean Pearson r ≈ 0.98** across diverse aspect ratios (square
inputs r = 1.000), which is within DA3's own resolution sensitivity (its 504-vs-518 outputs correlate at
r ≈ 0.975–0.984).
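
To reproduce figures of this kind on your own images, the metrics can be computed from their textbook definitions. The card does not say how its numbers were produced, so the helper names and the float64 accumulation below are assumptions:

```python
import numpy as np


def _flat(a):
    # Flatten to float64 so fp16 outputs do not lose precision in the sums.
    return np.asarray(a, dtype=np.float64).ravel()


def cosine_similarity(a, b):
    """Cosine of the angle between two maps treated as flat vectors."""
    a, b = _flat(a), _flat(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def pearson_r(a, b):
    """Pearson correlation: cosine similarity after removing each mean."""
    a, b = _flat(a), _flat(b)
    return cosine_similarity(a - a.mean(), b - b.mean())


def max_abs_error(a, b):
    """Largest per-pixel absolute difference between two maps."""
    return float(np.max(np.abs(_flat(a) - _flat(b))))
```

Pearson r is invariant to scale and offset, which suits relative depth; cosine similarity and the per-pixel difference are not, so they are better suited to comparing the converted model against its own PyTorch reference.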
|
|
| ## Usage (CoreAIKit / coreai.runtime) |
|
|
| ```python |
import asyncio
import coreai.runtime as rt, numpy as np
from PIL import Image

async def main():
    # Load the fp16 ViT-S bundle, preferring the GPU
    m = await rt.AIModel.load("small/da3-small_float16.aimodel",
        rt.SpecializationOptions.from_preferred_compute_unit_kind(rt.ComputeUnitKind.gpu()))
    fn = m.load_function("main")

    # Resize to the fixed 504 x 504 input; normalization is folded into the graph
    img = np.asarray(Image.open("photo.jpg").convert("RGB").resize((504, 504)))
    x = (img.astype(np.float16) / 255.0).transpose(2, 0, 1)[None]  # raw [0,1], NCHW
    out = await fn({"image": rt.NDArray(x)})
    return out["depth"].numpy().reshape(504, 504)

depth = asyncio.run(main())
| ``` |
|
|
| ## Links |
|
|
| - Conversion script + model card: [coreai-model-zoo `zoo/depth-anything-3.md`](https://github.com/john-rocky/coreai-model-zoo/blob/main/zoo/depth-anything-3.md) |
- Source: [Depth Anything 3](https://github.com/ByteDance-Seed/depth-anything-3) · Apache-2.0
|
|
| --- |
|
|
*On-device ML / Core ML / Core AI model porting: to get in touch, open an issue on the
[zoo](https://github.com/john-rocky/coreai-model-zoo).*
|
|