---
title: Awesome Depth Anything 3
emoji: 🌊
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Metric 3D reconstruction from images/video
---
<div align="center">
# Awesome Depth Anything 3
**Optimized fork of Depth Anything 3 with production-ready features**
[PyPI](https://pypi.org/project/awesome-depth-anything-3/) · [Python](https://www.python.org/) · [License](LICENSE) · [CI](https://github.com/Aedelon/awesome-depth-anything-3/actions) · [Colab Tutorial](https://colab.research.google.com/github/Aedelon/awesome-depth-anything-3/blob/main/notebooks/da3_tutorial.ipynb) · [🤗 Space](https://huggingface.co/spaces/Aedelon/awesome-depth-anything-3)
[Demo](https://huggingface.co/spaces/Aedelon/awesome-depth-anything-3) · [Tutorial](notebooks/da3_tutorial.ipynb) · [Benchmarks](BENCHMARKS.md) · [Original Paper](https://arxiv.org/abs/2511.10647)
</div>
---
> **This is an optimized fork** of [Depth Anything 3](https://github.com/ByteDance-Seed/Depth-Anything-3) by ByteDance.
> All credit for the model architecture, training, and research goes to the original authors (see [Credits](#-credits) below).
> This fork focuses on **production optimization, developer experience, and ease of deployment**.
## 🚀 What's New in This Fork
| Feature | Description |
|---------|-------------|
| **Model Caching** | ~200x faster model loading after first use |
| **Adaptive Batching** | Automatic batch size optimization based on GPU memory |
| **PyPI Package** | `pip install awesome-depth-anything-3` |
| **CLI Improvements** | Batch processing options, better error handling |
| **Apple Silicon Optimized** | Smart CPU/GPU preprocessing for best MPS performance |
| **Comprehensive Benchmarks** | Detailed performance analysis across devices |
### Performance Improvements
| Metric | Upstream | This Fork | Improvement |
|--------|----------|-----------|-------------|
| Cached model load | ~1s | ~5ms | **200x faster** |
| Batch 4 inference (MPS) | 3.32 img/s | 3.78 img/s | **1.14x faster** |
| Cold model load | 1.28s | 0.77s | **1.7x faster** |
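Model caching is transparent: repeating `from_pretrained` for the same model within one process should hit the cache. A minimal way to observe it (a sketch assuming the cache keys on the model identifier; absolute timings vary by machine):

```python
import time

from depth_anything_3.api import DepthAnything3

MODEL_ID = "depth-anything/DA3-LARGE"

# First load pays the full cost of reading and initializing the weights.
t0 = time.perf_counter()
model = DepthAnything3.from_pretrained(MODEL_ID)
print(f"cold load:   {time.perf_counter() - t0:.2f} s")

# A second load of the same model should return from the cache.
t0 = time.perf_counter()
model = DepthAnything3.from_pretrained(MODEL_ID)
print(f"cached load: {(time.perf_counter() - t0) * 1000:.1f} ms")
```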
---
<div align="center">
## Original Depth Anything 3
<h3>Recovering the Visual Space from Any Views</h3>
[**Haotong Lin**](https://haotongl.github.io/)<sup>*</sup> · [**Sili Chen**](https://github.com/SiliChen321)<sup>*</sup> · [**Jun Hao Liew**](https://liewjunhao.github.io/)<sup>*</sup> · [**Donny Y. Chen**](https://donydchen.github.io)<sup>*</sup> · [**Zhenyu Li**](https://zhyever.github.io/) · [**Guang Shi**](https://scholar.google.com/citations?user=MjXxWbUAAAAJ&hl=en) · [**Jiashi Feng**](https://scholar.google.com.sg/citations?user=Q8iay0gAAAAJ&hl=en)
<br>
[**Bingyi Kang**](https://bingykang.github.io/)<sup>*†</sup>
<sup>†</sup>Project lead · <sup>*</sup>Equal contribution
<a href="https://arxiv.org/abs/2511.10647"><img src='https://img.shields.io/badge/arXiv-Depth Anything 3-red' alt='Paper PDF'></a>
<a href='https://depth-anything-3.github.io'><img src='https://img.shields.io/badge/Project_Page-Depth Anything 3-green' alt='Project Page'></a>
<a href='https://huggingface.co/spaces/depth-anything/Depth-Anything-3'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Official Demo-blue'></a>
</div>
This work presents **Depth Anything 3 (DA3)**, a model that predicts spatially consistent geometry from
arbitrary visual inputs, with or without known camera poses.
In pursuit of minimal modeling, DA3 yields two key insights:
- 💎 A **single plain transformer** (e.g., vanilla DINO encoder) is sufficient as a backbone without architectural specialization,
- ✨ A singular **depth-ray representation** obviates the need for complex multi-task learning.
🏆 DA3 significantly outperforms
[DA2](https://github.com/DepthAnything/Depth-Anything-V2) for monocular depth estimation,
and [VGGT](https://github.com/facebookresearch/vggt) for multi-view depth estimation and pose estimation.
All models are trained exclusively on **public academic datasets**.
<p align="center">
<img src="assets/images/demo320-2.gif" alt="Depth Anything 3 - Left" width="70%">
</p>
<p align="center">
<img src="assets/images/da3_radar.png" alt="Depth Anything 3" width="100%">
</p>
## 📰 News
- **30-11-2025:** Add [`use_ray_pose`](#use-ray-pose) and [`ref_view_strategy`](docs/funcs/ref_view_strategy.md) (reference view selection for multi-view inputs).
- **25-11-2025:** Add [Awesome DA3 Projects](#-awesome-da3-projects), a community-driven section featuring DA3-based applications.
- **14-11-2025:** Paper, project page, code and models are all released.
## ✨ Highlights
### 🏆 Model Zoo
We release three series of models, each tailored for specific use cases in visual geometry.
- 🌟 **DA3 Main Series** (`DA3-Giant`, `DA3-Large`, `DA3-Base`, `DA3-Small`): Our flagship foundation models, trained with a unified depth-ray representation. By varying the input configuration, a single model can perform a wide range of tasks:
+ 🌊 **Monocular Depth Estimation**: Predicts a depth map from a single RGB image.
+ 🌊 **Multi-View Depth Estimation**: Generates consistent depth maps from multiple images for high-quality fusion.
+ 🎯 **Pose-Conditioned Depth Estimation**: Achieves superior depth consistency when camera poses are provided as input.
+ 📷 **Camera Pose Estimation**: Estimates camera extrinsics and intrinsics from one or more images.
+ 🟡 **3D Gaussian Estimation**: Directly predicts 3D Gaussians, enabling high-fidelity novel view synthesis.
- 📐 **DA3 Metric Series** (`DA3Metric-Large`): A specialized model fine-tuned for metric depth estimation in monocular settings, ideal for applications requiring real-world scale.
- 🔍 **DA3 Monocular Series** (`DA3Mono-Large`): A dedicated model for high-quality relative monocular depth estimation. Unlike disparity-based models (e.g., [Depth Anything 2](https://github.com/DepthAnything/Depth-Anything-V2)), it directly predicts depth, resulting in superior geometric accuracy.

🔗 Leveraging these models, we developed a **nested series** (`DA3Nested-Giant-Large`), which combines an any-view giant model with a metric model to reconstruct visual geometry at real-world metric scale.
### 🛠️ Codebase Features
Our repository is designed to be a powerful and user-friendly toolkit for both practical application and future research.
- 🎨 **Interactive Web UI & Gallery**: Visualize model outputs and compare results with an easy-to-use Gradio-based web interface.
- ⚡ **Flexible Command-Line Interface (CLI)**: Powerful and scriptable CLI for batch processing and integration into custom workflows.
- 💾 **Multiple Export Formats**: Save results in various formats, including `glb`, `npz`, depth images, `ply`, and 3DGS videos, to connect seamlessly with other tools.
- 🔧 **Extensible and Modular Design**: The codebase is structured to facilitate future research and the integration of new models or functionalities.
## 🚀 Quick Start
### 📦 Installation
```bash
# From PyPI (recommended)
pip install awesome-depth-anything-3
# With Gradio web UI
pip install awesome-depth-anything-3[app]
# With CUDA optimizations (xformers + gsplat)
pip install awesome-depth-anything-3[cuda]
# Everything
pip install awesome-depth-anything-3[all]
```
<details>
<summary><b>Development installation</b></summary>
```bash
git clone https://github.com/Aedelon/awesome-depth-anything-3.git
cd awesome-depth-anything-3
pip install -e ".[dev]"
# Optional: 3D Gaussian Splatting head
pip install --no-build-isolation git+https://github.com/nerfstudio-project/gsplat.git@0b4dddf
```
</details>
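After either install, a quick smoke test confirms the package resolves and reports which accelerator PyTorch sees (the import path matches the usage examples below):

```python
import torch

from depth_anything_3.api import DepthAnything3  # main entry point

print("import OK:", DepthAnything3.__name__)
if torch.cuda.is_available():
    print("CUDA GPU available")
elif torch.backends.mps.is_available():
    print("Apple Silicon (MPS) available")
else:
    print("no accelerator found, falling back to CPU")
```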
For detailed model information, please refer to the [Model Cards](#-model-cards) section below.
### 💻 Basic Usage
```python
import glob, os, torch
from depth_anything_3.api import DepthAnything3
device = torch.device("cuda")
model = DepthAnything3.from_pretrained("depth-anything/DA3NESTED-GIANT-LARGE")
model = model.to(device=device)
example_path = "assets/examples/SOH"
images = sorted(glob.glob(os.path.join(example_path, "*.png")))
prediction = model.inference(images)
# prediction.processed_images : [N, H, W, 3] uint8 array
print(prediction.processed_images.shape)
# prediction.depth : [N, H, W] float32 array
print(prediction.depth.shape)
# prediction.conf : [N, H, W] float32 array
print(prediction.conf.shape)
# prediction.extrinsics : [N, 3, 4] float32 array # opencv w2c or colmap format
print(prediction.extrinsics.shape)
# prediction.intrinsics : [N, 3, 3] float32 array
print(prediction.intrinsics.shape)
```
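The returned arrays are plain NumPy, so downstream processing needs no extra glue. Since `prediction.extrinsics` is OpenCV-convention world-to-camera, camera centers follow from C = -Rᵀt. A short sketch building on the snippet above (output file name is illustrative; standard NumPy/Pillow only):

```python
import numpy as np
from PIL import Image

# Camera centers from [N, 3, 4] world-to-camera extrinsics: C = -R^T t
R = prediction.extrinsics[:, :, :3]       # [N, 3, 3] rotation blocks
t = prediction.extrinsics[:, :, 3]        # [N, 3]    translation vectors
centers = -np.einsum("nij,ni->nj", R, t)  # [N, 3]    positions in world frame
print("camera centers:\n", centers)

# Save the first depth map as an 8-bit preview PNG (per-image normalization;
# keep the raw float32 values for any quantitative use).
d = prediction.depth[0]
d_norm = (d - d.min()) / (d.max() - d.min() + 1e-8)
Image.fromarray((d_norm * 255).astype(np.uint8)).save("depth_0.png")
```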
```bash
export MODEL_DIR=depth-anything/DA3NESTED-GIANT-LARGE
# This can be a Hugging Face repository or a local directory
# If you encounter network issues, consider using the following mirror: export HF_ENDPOINT=https://hf-mirror.com
# Alternatively, you can download the model directly from Hugging Face
export GALLERY_DIR=workspace/gallery
mkdir -p $GALLERY_DIR
# CLI auto mode with backend reuse
da3 backend --model-dir ${MODEL_DIR} --gallery-dir ${GALLERY_DIR}  # Cache the model on the GPU
da3 auto assets/examples/SOH \
--export-format glb \
--export-dir ${GALLERY_DIR}/TEST_BACKEND/SOH \
--use-backend
# CLI video processing with feature visualization
da3 video assets/examples/robot_unitree.mp4 \
--fps 15 \
--use-backend \
--export-dir ${GALLERY_DIR}/TEST_BACKEND/robo \
--export-format glb-feat_vis \
--feat-vis-fps 15 \
--process-res-method lower_bound_resize \
--export-feat "11,21,31"
# CLI auto mode without backend reuse
da3 auto assets/examples/SOH \
--export-format glb \
--export-dir ${GALLERY_DIR}/TEST_CLI/SOH \
--model-dir ${MODEL_DIR}
```
The model architecture is defined in [`DepthAnything3Net`](src/depth_anything_3/model/da3.py) and specified with a YAML config file located in [`src/depth_anything_3/configs`](src/depth_anything_3/configs). Input and output processing are handled by [`DepthAnything3`](src/depth_anything_3/api.py). To customize the model architecture, create a new config file (*e.g.*, `path/to/new/config`) such as:
```yaml
__object__:
  path: depth_anything_3.model.da3
  name: DepthAnything3Net
  args: as_params
net:
  __object__:
    path: depth_anything_3.model.dinov2.dinov2
    name: DinoV2
    args: as_params
  name: vitb
  out_layers: [5, 7, 9, 11]
  alt_start: 4
  qknorm_start: 4
  rope_start: 4
  cat_token: True
head:
  __object__:
    path: depth_anything_3.model.dualdpt
    name: DualDPT
    args: as_params
  dim_in: &head_dim_in 1536
  output_dim: 2
  features: &head_features 128
  out_channels: &head_out_channels [96, 192, 384, 768]
```
Then, the model can be created with the following code snippet.
```python
from depth_anything_3.cfg import create_object, load_config
Model = create_object(load_config("path/to/new/config"))
```
## 📚 Useful Documentation
- 🖥️ [Command Line Interface](docs/CLI.md)
- 📑 [Python API](docs/API.md)
## 🗂️ Model Cards
In general, DA3-LARGE achieves results comparable to VGGT.
The Nested series uses an any-view model to estimate pose and depth and a monocular metric depth estimator for scaling (a rough sketch of this scaling step follows the table).
| 🗃️ Model Name | 📏 Params | 📊 Rel. Depth | 📷 Pose Est. | 🧭 Pose Cond. | 🎨 GS | 📐 Met. Depth | ☁️ Sky Seg | 📄 License |
|-------------------------------|-----------|---------------|--------------|---------------|-------|---------------|-----------|----------------|
| **Nested** | | | | | | | | |
| [DA3NESTED-GIANT-LARGE](https://huggingface.co/depth-anything/DA3NESTED-GIANT-LARGE) | 1.40B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | CC BY-NC 4.0 |
| **Any-view Model** | | | | | | | | |
| [DA3-GIANT](https://huggingface.co/depth-anything/DA3-GIANT) | 1.15B | ✅ | ✅ | ✅ | ✅ | | | CC BY-NC 4.0 |
| [DA3-LARGE](https://huggingface.co/depth-anything/DA3-LARGE) | 0.35B | ✅ | ✅ | ✅ | | | | CC BY-NC 4.0 |
| [DA3-BASE](https://huggingface.co/depth-anything/DA3-BASE) | 0.12B | ✅ | ✅ | ✅ | | | | Apache 2.0 |
| [DA3-SMALL](https://huggingface.co/depth-anything/DA3-SMALL) | 0.08B | ✅ | ✅ | ✅ | | | | Apache 2.0 |
| | | | | | | | | |
| **Monocular Metric Depth** | | | | | | | | |
| [DA3METRIC-LARGE](https://huggingface.co/depth-anything/DA3METRIC-LARGE) | 0.35B | ✅ | | | | ✅ | ✅ | Apache 2.0 |
| | | | | | | | | |
| **Monocular Depth** | | | | | | | | |
| [DA3MONO-LARGE](https://huggingface.co/depth-anything/DA3MONO-LARGE) | 0.35B | ✅ | | | | | ✅ | Apache 2.0 |
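As a rough illustration of the scaling step mentioned above, one common recipe is a robust per-scene scale between the any-view (relative) depth and the metric prediction. This sketch shows the idea only; it is not necessarily the exact fusion the nested model implements:

```python
import numpy as np

def align_scale(relative: np.ndarray, metric: np.ndarray,
                conf: np.ndarray, conf_thresh: float = 0.5) -> float:
    """Median ratio between metric and relative depth on confident pixels."""
    mask = (conf > conf_thresh) & (relative > 0)
    return float(np.median(metric[mask] / relative[mask]))

# scaled = relative * align_scale(relative, metric, conf)  # depth in meters
```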
## ⚡ Performance Benchmarks
Inference throughput measured on Apple Silicon (MPS) with PyTorch 2.9.0. For detailed benchmarks, see [BENCHMARKS.md](BENCHMARKS.md).
### Apple Silicon (MPS) - Batch Size 1
| Model | Latency | Throughput |
|-------|---------|------------|
| DA3-Small | 46 ms | **22 img/s** |
| DA3-Base | 93 ms | **11 img/s** |
| DA3-Large | 265 ms | **3.8 img/s** |
| DA3-Giant | 618 ms | **1.6 img/s** |
### Cross-Device Comparison (DA3-Large)
| Device | Throughput | vs CPU |
|--------|------------|--------|
| CPU | 0.3 img/s | 1.0x |
| Apple Silicon (MPS) | 3.8 img/s | **13x** |
| NVIDIA L4 (CUDA) | 10.3 img/s | **34x** |
### Batch Processing
```python
from depth_anything_3.api import DepthAnything3
model = DepthAnything3.from_pretrained("depth-anything/DA3-LARGE")
# Adaptive batching (recommended for large image sets)
results = model.batch_inference(
images=image_paths,
batch_size="auto", # Automatically selects optimal batch size
target_memory_utilization=0.85,
)
# Fixed batch size
results = model.batch_inference(
images=image_paths,
batch_size=4,
)
```
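For context, `batch_size="auto"`-style selection typically works by fitting the batch into a fraction of free accelerator memory. A simplified sketch of such a heuristic (illustrative only; the fork's internal logic may differ, and `bytes_per_image` must be measured or estimated per model and resolution):

```python
import torch

def pick_batch_size(bytes_per_image: int, target_util: float = 0.85,
                    max_batch: int = 32) -> int:
    """Largest batch that fits within target_util of free CUDA memory."""
    if not torch.cuda.is_available():
        return 1  # no reliable free-memory query off CUDA; stay conservative
    free_bytes, _total = torch.cuda.mem_get_info()
    budget = int(free_bytes * target_util)
    return max(1, min(max_batch, budget // bytes_per_image))
```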
> See [BENCHMARKS.md](BENCHMARKS.md) for comprehensive benchmarks including preprocessing, attention mechanisms, and adaptive batching strategies.
## ❓ FAQ
- **Monocular Metric Depth**: To obtain metric depth in meters from `DA3METRIC-LARGE`, use `metric_depth = focal * net_output / 300.`, where `focal` is the focal length in pixels (typically the average of fx and fy from the camera intrinsic matrix K); see the sketch after this list. Note that the output of `DA3NESTED-GIANT-LARGE` is already in meters.
- <a id="use-ray-pose"></a>**Ray Head (`use_ray_pose`)**: Our API and CLI support a `use_ray_pose` argument; when enabled, the model derives the camera pose from the ray head, which is generally slightly slower but more accurate. The default is `False` for faster inference.
<details>
<summary>AUC3 Results for DA3NESTED-GIANT-LARGE</summary>
| Model | HiRoom | ETH3D | DTU | 7Scenes | ScanNet++ |
|-------|------|-------|-----|---------|-----------|
| `ray_head` | 84.4 | 52.6 | 93.9 | 29.5 | 89.4 |
| `cam_head` | 80.3 | 48.4 | 94.1 | 28.5 | 85.0 |
</details>
- **Older GPUs without XFormers support**: See [Issue #11](https://github.com/ByteDance-Seed/Depth-Anything-3/issues/11). Thanks to [@S-Mahoney](https://github.com/S-Mahoney) for the solution!
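To make the metric-depth conversion from the first FAQ item concrete, a short sketch (variable names are illustrative; `K` is the 3×3 intrinsic matrix and `net_output` the raw `DA3METRIC-LARGE` prediction):

```python
# Focal length in pixels: average of fx and fy from the intrinsics K.
focal = (K[0, 0] + K[1, 1]) / 2.0

# Per the FAQ: metric depth in meters for DA3METRIC-LARGE.
metric_depth = focal * net_output / 300.0
```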
## 🏢 Awesome DA3 Projects
A community-curated list of Depth Anything 3 integrations spanning 3D tools, creative pipelines, robotics, and web/VR viewers. You are welcome to submit your DA3-based project via PR; we will review and feature it if applicable.
- [DA3-blender](https://github.com/xy-gao/DA3-blender): Blender addon for DA3-based 3D reconstruction from a set of images.
- [ComfyUI-DepthAnythingV3](https://github.com/PozzettiAndrea/ComfyUI-DepthAnythingV3): ComfyUI nodes for Depth Anything 3, supporting single/multi-view and video-consistent depth with optional point‑cloud export.
- [DA3-ROS2-Wrapper](https://github.com/GerdsenAI/GerdsenAI-Depth-Anything-3-ROS2-Wrapper): Real-time DA3 depth in ROS2 with multi-camera support.
- [VideoDepthViewer3D](https://github.com/amariichi/VideoDepthViewer3D): Streaming videos with DA3 metric depth to a Three.js/WebXR 3D viewer for VR/stereo playback.
## 📝 Credits
### Original Authors
This package is built on top of **Depth Anything 3**, created by the ByteDance Seed team:
- [Haotong Lin](https://haotongl.github.io/), [Sili Chen](https://github.com/SiliChen321), [Jun Hao Liew](https://liewjunhao.github.io/), [Donny Y. Chen](https://donydchen.github.io), [Zhenyu Li](https://zhyever.github.io/), [Guang Shi](https://scholar.google.com/citations?user=MjXxWbUAAAAJ), [Jiashi Feng](https://scholar.google.com.sg/citations?user=Q8iay0gAAAAJ), [Bingyi Kang](https://bingykang.github.io/)
All model weights, architecture, and core algorithms are their work. This fork only adds production optimizations and deployment tooling.
### Fork Maintainer
This optimized fork is maintained by [Delanoe Pirard (Aedelon)](https://github.com/Aedelon).
Contributions:
- Model caching system
- Adaptive batching
- Apple Silicon (MPS) optimizations
- PyPI packaging and CI/CD
- Comprehensive benchmarking
### Citation
If you use Depth Anything 3 in your research, please cite the original paper:
```bibtex
@article{depthanything3,
title={Depth Anything 3: Recovering the visual space from any views},
author={Haotong Lin and Sili Chen and Jun Hao Liew and Donny Y. Chen and Zhenyu Li and Guang Shi and Jiashi Feng and Bingyi Kang},
journal={arXiv preprint arXiv:2511.10647},
year={2025}
}
```
If you specifically use features from this fork (caching, batching, MPS optimizations), you may additionally reference:
```
awesome-depth-anything-3: https://github.com/Aedelon/awesome-depth-anything-3
```