---
license: apache-2.0
tags:
- depth-estimation
- monocular-depth
- core-ai
- coreai
- apple
- on-device
- depth-anything
pipeline_tag: depth-estimation
base_model:
- depth-anything/DA3-SMALL
- depth-anything/DA3-BASE
library_name: coreai
---
> **Mirror** of [`mlboydaisuke/Depth-Anything-3-CoreAI`](https://huggingface.co/mlboydaisuke/Depth-Anything-3-CoreAI), the canonical repo ([CoreAI Model Zoo](https://github.com/john-rocky/coreai-model-zoo)). Updates land there first.
# Depth Anything 3: Core AI
**The [coreai-model-zoo](https://github.com/john-rocky/coreai-model-zoo)'s first depth model.**
Monocular (single-image) **relative depth** estimation running fully on-device on Apple's Core AI
runtime, as a single static `.aimodel`. A conversion of ByteDance's
[Depth Anything 3](https://github.com/ByteDance-Seed/depth-anything-3)
([`depth-anything/DA3-SMALL`](https://huggingface.co/depth-anything/DA3-SMALL) /
[`DA3-BASE`](https://huggingface.co/depth-anything/DA3-BASE), Apache-2.0): a DINOv2 ViT backbone +
DPT-style head. Drop in an RGB image, get a depth map (and a confidence map). No NMS, no sampling:
host post-processing is just a colormap.
<!-- gen-cards:use-it begin id=depth-anything-3-small (managed by scripts/gen-cards; edit cards.json / QuickStart.swift, not this block) -->
## Use it
▶️ **Run it (source)**: the [DepthCamera runner](https://github.com/john-rocky/coreai-kit/tree/main/Examples/DepthCamera)
(live camera depth, one app for every depth model in the catalog):
```bash
git clone https://github.com/john-rocky/coreai-kit
open coreai-kit/Examples/DepthCamera/DepthCamera.xcodeproj
# → Run, then pick "Depth Anything 3 Small" in the model picker
# agents / headless (macOS):
cd coreai-kit/Examples/DepthCamera
swift run depth-cli --model depth-anything-3-small --image sample.jpg --output depth.png
```
💻 **Build with it**: a complete snippet. The glue is all kit API, so it runs as pasted:
```swift
import CoreAIKitVision
let estimator = try await DepthEstimator(catalog: "depth-anything-3-small")
let image = try ImageFile.load(imageURL) // any image file → CGImage + EXIF orientation
let depth = try await estimator.estimateDepth(for: image.cgImage)
// depth: DepthMap; .cgImage() renders it, .values are the raw floats
```
The take-home is [`Examples/DepthCamera/Sources/QuickStart.swift`](https://github.com/john-rocky/coreai-kit/blob/main/Examples/DepthCamera/Sources/QuickStart.swift):
this exact code as one typed function, no UI. The CLI is an argument shell over it, and
the GUI runs the same estimator on every camera frame (`CameraFeed`, ~10 lines).
Live camera? `CameraFeed` (kit API) streams frames; feed each one to
`estimateDepth(for:)`. The camera permission prompt is your app's own chrome.
**Integration checklist**
- SPM: `https://github.com/john-rocky/coreai-kit` → product **CoreAIKitVision**
- Info.plist: `NSCameraUsageDescription`, only for the live camera; the snippet needs none
- Entitlements: none needed
- First run downloads the model, 0.1 GB (Mac) / 0.1 GB (iPhone), then it loads from the
local cache (Application Support; progress via the `downloadProgress` callback)
- Measure in Release: Debug is ~3× slower on host-side work
<!-- gen-cards:use-it end -->
## Bundles
| dir | variant | params | dtype | size | M4 Max GPU |
|---|---|---|---|---|---|
| `small/da3-small_float16.aimodel` | ViT-S | 34.3M | fp16 | **54 MB** | **65.7 FPS** |
| `small/da3-small_float32.aimodel` | ViT-S | 34.3M | fp32 | 105 MB | 56.5 FPS |
| `base/da3-base_float16.aimodel` | ViT-B | 135.4M | fp16 | 202 MB | 26.5 FPS |
| `base/da3-base_float32.aimodel` | ViT-B | 135.4M | fp32 | 402 MB | 23.0 FPS |
`small · fp16` is the on-device hero: 54 MB and 65.7 FPS at 504² on an M4 Max, comfortably real-time on
iPhone-class GPUs. Each `.aimodel` is a directory bundle (`main.mlirb` + `metadata.json`).
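To relate the throughput column to a per-frame time budget, the sketch below converts the table's FPS figures to average latency and checks them against a target frame rate. The 30 FPS camera target and the helper names are illustrative assumptions, not something the card specifies, and the conversion assumes frames are processed one at a time.

```python
# FPS figures copied from the Bundles table (M4 Max GPU, 504 x 504 input).
FPS = {
    "small fp16": 65.7,
    "small fp32": 56.5,
    "base fp16": 26.5,
    "base fp32": 23.0,
}

def frame_latency_ms(fps):
    """Average time per frame in milliseconds for a given throughput."""
    return 1000.0 / fps

def meets_budget(fps, target_fps=30.0):
    """True if the measured throughput reaches the (assumed) target frame rate."""
    return fps >= target_fps

for name, fps in FPS.items():
    print(f"{name}: {frame_latency_ms(fps):.1f} ms/frame, "
          f"keeps up with 30 FPS: {meets_budget(fps)}")
```

On these M4 Max numbers both small variants fit a 30 FPS budget, while the base variants fall a little short of it.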
## I/O contract
```
input : image      [1, 3, 504, 504]  RGB, raw [0, 1] (ImageNet normalization is folded into the graph)
output: depth      [1, 504, 504]     relative depth (exp-activated; larger = nearer)
        depth_conf [1, 504, 504]     confidence
```
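A minimal sketch of the input side of this contract, assuming the image has already been resized to 504 × 504 and is held as an HWC `uint8` NumPy array. The helper name `to_model_input` and the choice of `float16` as the default dtype (matching the fp16 bundle) are assumptions for illustration.

```python
import numpy as np

SIDE = 504  # fixed input resolution from the I/O contract

def to_model_input(rgb_u8, dtype=np.float16):
    """Convert a resized HWC uint8 RGB image to an NCHW tensor with values in [0, 1]."""
    rgb_u8 = np.asarray(rgb_u8)
    if rgb_u8.shape != (SIDE, SIDE, 3) or rgb_u8.dtype != np.uint8:
        raise ValueError(
            f"expected uint8 array of shape ({SIDE}, {SIDE}, 3), "
            f"got {rgb_u8.dtype} {rgb_u8.shape}"
        )
    # Scale to raw [0, 1] only. Do not apply ImageNet mean/std here:
    # the card states that normalization is folded into the graph.
    x = rgb_u8.astype(np.float32) / 255.0
    return x.transpose(2, 0, 1)[None].astype(dtype)  # HWC -> CHW -> NCHW
```

Validating shape and dtype up front makes the most likely mistake, normalizing twice or feeding 0-255 values, easier to spot before the model runs.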
Host: resize the RGB image to 504 × 504 (e.g. cv2 `INTER_AREA`), feed raw [0, 1], run, then resize
the depth map back to the original H × W. For display, the DA3 convention is inverse-depth →
percentile 2–98 normalize → `Spectral` colormap.
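The display convention described above can be sketched with NumPy alone. This is an illustration under assumptions, not the DA3 viewer's code: the function name `depth_to_display`, the `eps` guard against division by zero, and the clipping to [0, 1] are choices made here, and the input is assumed to be an array of positive depth values.

```python
import numpy as np

def depth_to_display(depth, lo_pct=2.0, hi_pct=98.0, eps=1e-6):
    """Inverse depth, then percentile-normalize to [0, 1], ready for a colormap."""
    # Invert, guarding against zero or negative values (assumed not to occur normally).
    inv = 1.0 / np.maximum(np.asarray(depth, dtype=np.float32), eps)
    # Robust range: ignore the darkest and brightest 2% of pixels.
    lo, hi = np.percentile(inv, [lo_pct, hi_pct])
    return np.clip((inv - lo) / max(hi - lo, eps), 0.0, 1.0)
```

The result can be passed to any colormap implementation; with Matplotlib, for example, `matplotlib.colormaps["Spectral"](norm)` returns an RGBA array.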
## Fidelity
- **Numerically faithful conversion:** the Core AI engine matches the PyTorch reference at **cosine
similarity 1.000000** (per-pixel error ≤ ~1e-5 for fp32 and ≤ ~1e-2 for fp16) on both CPU and GPU, at any fixed input shape.
- **vs the official DA3 viewer:** **mean Pearson r ≈ 0.98** across diverse aspect ratios (r = 1.000
on square inputs), which is within DA3's own resolution sensitivity (its 504-vs-518 outputs
correlate at only r ≈ 0.975–0.984).
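For readers who want to run a similar comparison on their own outputs, here is a minimal sketch of the three quantities mentioned above. The card does not say exactly how its numbers were computed, so the details here are assumptions: both maps are flattened, the per-pixel error is taken as the maximum absolute difference, and `fidelity_metrics` is a hypothetical helper name.

```python
import numpy as np

def fidelity_metrics(ref, test):
    """Compare two same-sized depth maps: cosine similarity, max abs error, Pearson r."""
    a = np.asarray(ref, dtype=np.float64).ravel()
    b = np.asarray(test, dtype=np.float64).ravel()
    if a.shape != b.shape:
        raise ValueError("depth maps must have the same number of elements")
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    max_abs_err = float(np.max(np.abs(a - b)))
    pearson_r = float(np.corrcoef(a, b)[0, 1])
    return {"cos": cos, "max_abs_err": max_abs_err, "pearson_r": pearson_r}
```

Pearson r is unchanged by a positive scale and shift of either map, which suits relative depth; cosine similarity and absolute error are not, so they are better suited to checking a conversion against its reference at the same input shape.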
## Usage (CoreAIKit / coreai.runtime)
```python
import asyncio
import coreai.runtime as rt, numpy as np
from PIL import Image

async def main():  # top-level `await` only works in an async REPL or notebook
    m = await rt.AIModel.load("small/da3-small_float16.aimodel",
        rt.SpecializationOptions.from_preferred_compute_unit_kind(rt.ComputeUnitKind.gpu()))
    fn = m.load_function("main")
    img = np.asarray(Image.open("photo.jpg").convert("RGB").resize((504, 504)))
    x = (img.astype(np.float16) / 255.0).transpose(2, 0, 1)[None]  # raw [0,1], NCHW
    return (await fn({"image": rt.NDArray(x)}))["depth"].numpy().reshape(504, 504)

depth = asyncio.run(main())
```
## Links
- Conversion script + model card: [coreai-model-zoo `zoo/depth-anything-3.md`](https://github.com/john-rocky/coreai-model-zoo/blob/main/zoo/depth-anything-3.md)
- Source: [Depth Anything 3](https://github.com/ByteDance-Seed/depth-anything-3) · Apache-2.0
---
*On-device ML / Core ML / Core AI model porting. Get in touch: open an issue on the
[zoo](https://github.com/john-rocky/coreai-model-zoo).*