Depth Anything 3 Base, 4-View: Depth + Camera Pose Recovery (ONNX)
Multi-view ONNX export of depth-anything/DA3-BASE: the same Apache-2.0 any-view checkpoint as Heliosoph/da3-base-onnx, traced with a 4-frame window so that the cross-view attention, and therefore the pose head, actually functions. Feed 4 views of a scene in one call; get per-view depth, confidence, camera intrinsics, and relative camera poses (first view is the reference).
Why a fixed view count. ONNX tracing bakes the view count into the cross-view attention reshapes. This is not cosmetic: a graph traced at V=2 runs at V=4 but produces silently wrong numbers (~5e-1 relative error, measured). The views axis is therefore pinned at 4, and onnxruntime rejects any other count at the input; for a different window size, re-run the conversion script with -Views N. Pose is only defined within a window; stitch longer sequences with overlapping windows.
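As a minimal sketch of the overlapping-window idea above, the helper below splits a frame sequence into 4-frame index windows that share at least one frame with their neighbor. The function name `overlapping_windows` and the `overlap` parameter are illustrative assumptions, not part of this repo or its conversion script.

```python
def overlapping_windows(num_frames, views=4, overlap=1):
    """Return lists of frame indices, each `views` long, sharing `overlap` frames.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    if num_frames < views:
        raise ValueError("need at least `views` frames")
    if not 0 < overlap < views:
        raise ValueError("overlap must be between 1 and views - 1")
    stride = views - overlap
    starts = list(range(0, num_frames - views + 1, stride))
    # Add a final window so the tail of the sequence is always covered;
    # that last window may overlap its predecessor by more than `overlap`.
    if starts[-1] != num_frames - views:
        starts.append(num_frames - views)
    return [list(range(s, s + views)) for s in starts]
```

Each window comes back with its own reference view and its own unknown scale, so the frames shared between consecutive windows are what you would use to align them: the depth ratio on a shared frame gives the relative scale, and that frame's pose in both windows gives the rigid transform between them.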
Scale. Depth and pose translations share one unknown global scale (the standard any-view ambiguity). Shapes and relative geometry are right; absolute size isn't. To land in real meters, anchor against a metric estimator on the same frames, e.g. Heliosoph/da3metric-large-onnx: s = median(d_metric / d_base) over confidence-gated pixels, then scale the translations by s (rotations and K unchanged).
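The scale recipe above can be sketched in NumPy as follows. The confidence gate is an assumption: this card does not say which threshold to use, so the sketch keeps pixels above a confidence quantile (`conf_quantile`, a hypothetical parameter). The function names are also illustrative, and `d_metric` is assumed to be already resampled to the same 504×504 grid as `d_base`.

```python
import numpy as np

def metric_scale(d_base, d_metric, conf, conf_quantile=0.5, eps=1e-6):
    """s = median(d_metric / d_base) over confidence-gated pixels."""
    d_base = np.asarray(d_base, dtype=np.float64)
    d_metric = np.asarray(d_metric, dtype=np.float64)
    conf = np.asarray(conf, dtype=np.float64)
    # Assumed gate: keep the most confident pixels with valid, positive depth.
    mask = (conf >= np.quantile(conf, conf_quantile))
    mask &= (d_base > eps) & (d_metric > eps)
    if not mask.any():
        raise ValueError("no pixels survive the confidence gate")
    return float(np.median(d_metric[mask] / d_base[mask]))

def apply_scale(depth, extrinsics, s):
    """Scale depth and the translation column; rotations stay untouched."""
    ext = np.array(extrinsics, dtype=np.float64, copy=True)
    ext[..., :3, 3] *= s
    return np.asarray(depth, dtype=np.float64) * s, ext
```

Because one scale is shared by every view in a window, it is reasonable to estimate `s` from all four views at once rather than per view.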
Provenance, toolchain, and exporter workarounds are identical to the single-view repo (see its card): official depth-anything-3 0.1.1 package, torch 2.4.x, opset 17, fp32 trace, cartesian_prod and scripted-affine_inverse shims. Conversion script: scripts/export-da3metric.ps1 (this export: -Views 4). Validation: fp32 ONNX matches PyTorch to 6.0e-06 max relative error across all four heads at V=4; batch=2 verified item-wise (windows in a batch don't leak into each other's poses); fp16 matches fp32 to ≤1.1e-03.
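To spot-check the fp16 variant against fp32 on your own frames, a comparison along these lines may help. The exact error definition behind the figures above is not stated in this card, so the formula below (largest absolute difference divided by the largest reference magnitude) is an assumption, and `max_relative_error` is a hypothetical helper name.

```python
import numpy as np

def max_relative_error(test, ref, eps=1e-12):
    """Largest absolute difference, normalized by the largest |ref| value.

    Assumed definition; the card does not specify how its numbers were computed.
    """
    test = np.asarray(test, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    return float(np.max(np.abs(test - ref)) / (np.max(np.abs(ref)) + eps))
```

Run model.onnx and model_fp16.onnx on the same `images` tensor and apply the function to each of the four outputs (depth, depth_conf, extrinsics, intrinsics) in turn.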
Credit: Haotong Lin, Sili Chen, Jun Hao Liew, Donny Y. Chen, Zhenyu Li, Guang Shi, Jiashi Feng, Bingyi Kang (ByteDance Seed). Paper: "Depth Anything 3: Recovering the Visual Space from Any Views", 2025.
What this repo contains
| File | Variant | Size | Use |
|---|---|---|---|
| model.onnx | fp32 | ~394 MB | Default; matches the PyTorch upstream to ~1e-5. |
| model_fp16.onnx | fp16 | ~198 MB | Half precision, I/O stays fp32 (keep_io_types); drop-in swap. |
| config.json | n/a | <1 KB | Upstream DA3 model config (provenance / re-instantiation). |
Input / output
| Spec | Value |
|---|---|
| Input name | images |
| Input shape | [batch, 4, 3, 504, 504]: exactly 4 views, each preprocessed like a single image |
| Input dtype | float32 (both variants) |
| Preprocessing | per view: RGB, scale to [0,1], ImageNet mean/std |
| Output depth | [batch, 4, 504, 504]: up-to-scale depth per view, bigger = farther |
| Output depth_conf | [batch, 4, 504, 504]: per-pixel confidence per view |
| Output extrinsics | [batch, 4, 3, 4]: per-view [R \| t] world→camera, poses relative within the window (view 0 is the reference) |
| Output intrinsics | [batch, 4, 3, 3]: per-view K at the 504×504 grid (rescale via diag(W/504, H/504, 1) · K) |
| Dynamic axes | batch only (views and resolution are pinned in the trace) |
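The intrinsics rescale noted in the table can be written directly as a matrix product. This is a minimal sketch; `rescale_intrinsics` is a hypothetical helper name, and it assumes the original image was resized straight to 504×504 with no cropping or padding.

```python
import numpy as np

def rescale_intrinsics(K, width, height, model_size=504):
    """Map K from the model_size x model_size grid to a width x height image.

    Computes diag(W/504, H/504, 1) @ K; works on [3, 3] or batched [..., 3, 3].
    """
    S = np.diag([width / model_size, height / model_size, 1.0])
    return S @ np.asarray(K, dtype=np.float64)
```

If you use the rescaled K for unprojection, the depth map needs to be resampled to the same width × height grid first.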
How to use
import numpy as np
import onnxruntime as ort
from PIL import Image
MEAN, STD = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
def prep(path):
    im = Image.open(path).convert("RGB").resize((504, 504), Image.BILINEAR)
    x = np.asarray(im, dtype=np.float32) / 255.0
    return ((x - MEAN) / STD).transpose(2, 0, 1)
frames = np.stack([prep(p) for p in ["f0.jpg", "f1.jpg", "f2.jpg", "f3.jpg"]])[None]
frames = frames.astype(np.float32) # [1, 4, 3, 504, 504]
sess = ort.InferenceSession("model.onnx")
depth, conf, ext, K = sess.run(
    ["depth", "depth_conf", "extrinsics", "intrinsics"], {"images": frames})
# ext[0, v] is the [R|t] of view v relative to the window; unproject each
# view's depth with K[0, v] and transform by the inverse pose to fuse a
# single up-to-scale point cloud. Anchor scale with a metric depth model
# (see the scale note above) to land in meters.
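The fusion step described in the comment above can be sketched as follows. Three assumptions are baked in and are not confirmed by this card: depth is measured along the optical axis (z-depth, not ray length), pixel coordinates are taken at integer positions with no half-pixel offset, and the extrinsics follow the world→camera convention from the output table. The function names and the `min_conf` parameter are hypothetical.

```python
import numpy as np

def unproject_view(depth, K, ext):
    """depth [H, W], K [3, 3], ext [3, 4] (world->camera) -> points [H*W, 3]."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T            # K^-1 [u, v, 1] per pixel
    cam = rays * depth.reshape(-1, 1)          # camera-frame points (z = depth)
    R, t = ext[:, :3], ext[:, 3]
    # Invert X_cam = R X_world + t, i.e. X_world = R^T (X_cam - t).
    return (cam - t) @ R

def fuse_window(depth, conf, K, ext, min_conf=None):
    """Fuse all views of one window: depth/conf [V, H, W], K [V, 3, 3], ext [V, 3, 4]."""
    points = []
    for v in range(depth.shape[0]):
        pts = unproject_view(depth[v], K[v], ext[v])
        if min_conf is not None:
            # Optional gate: drop low-confidence pixels (threshold is up to you).
            pts = pts[conf[v].reshape(-1) >= min_conf]
        points.append(pts)
    return np.concatenate(points, axis=0)
```

With the outputs from the snippet above, `cloud = fuse_window(depth[0], conf[0], K[0], ext[0])` would give a single up-to-scale point cloud in the reference frame of the window.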
License
Apache-2.0, same as upstream depth-anything/DA3-BASE. (The DA3 any-view Large/Giant checkpoints are CC-BY-NC 4.0 and are not part of this export.) The ONNX-export step doesn't change licensing: same model, different serialization.