haotongl committed
Commit 8615eef · verified · 1 Parent(s): 4097943

Upload folder using huggingface_hub

Files changed (3):
  1. README.md +144 -3
  2. config.json +137 -0
  3. model.safetensors +3 -0
README.md CHANGED
@@ -1,3 +1,144 @@
- ---
- license: cc-by-nc-4.0
- ---
---
license: cc-by-nc-4.0
tags:
- depth-estimation
- computer-vision
- monocular-depth
- multi-view-geometry
- pose-estimation
library_name: depth-anything-3
pipeline_tag: depth-estimation
---

# Depth Anything 3: DA3NESTED-GIANT-LARGE

<div align="center">

[![Project Page](https://img.shields.io/badge/Project_Page-Depth_Anything_3-green)](https://depth-anything-3.github.io)
[![Paper](https://img.shields.io/badge/arXiv-Depth_Anything_3-red)](https://arxiv.org/abs/)
[![Demo](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue)](https://huggingface.co/spaces/depth-anything/Depth-Anything-3)
[![Benchmark](https://img.shields.io/badge/Benchmark-VisGeo-yellow)](https://huggingface.co/datasets/depth-anything/VGB)

</div>

## Model Description

The DA3 Nested model combines the any-view Giant model with the metric Large model for metric-scale visual geometry reconstruction. It is our recommended model, as it brings all capabilities together in a single checkpoint.

| Property | Value |
|----------|-------|
| **Model Series** | Nested |
| **Parameters** | 1.40B |
| **License** | CC BY-NC 4.0 |

⚠️ **Non-commercial use only** due to the CC BY-NC 4.0 license.

## Capabilities

- ✅ Relative Depth
- ✅ Pose Estimation
- ✅ Pose Conditioning
- ✅ 3D Gaussians
- ✅ Metric Depth
- ✅ Sky Segmentation

## Quick Start

### Installation

```bash
pip install depth-anything-3
```

### Basic Example

```python
import torch
from depth_anything_3.api import DepthAnything3

# Load model from Hugging Face Hub
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = DepthAnything3.from_pretrained("depth-anything/da3nested-giant-large")
model = model.to(device=device)

# Run inference on images
images = ["image1.jpg", "image2.jpg"]  # List of image paths, PIL Images, or numpy arrays
prediction = model.inference(
    images,
    export_dir="output",
    export_format="glb",  # Options: glb, npz, ply, mini_npz, gs_ply, gs_video
)

# Access results
print(prediction.depth.shape)       # Depth maps: [N, H, W] float32
print(prediction.conf.shape)        # Confidence maps: [N, H, W] float32
print(prediction.extrinsics.shape)  # Camera poses (w2c): [N, 3, 4] float32
print(prediction.intrinsics.shape)  # Camera intrinsics: [N, 3, 3] float32
```
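The returned depth maps, intrinsics, and world-to-camera extrinsics are enough to lift each view into a shared world frame. Below is a minimal numpy sketch of that unprojection, assuming a standard pinhole model; `unproject_to_world` is our illustrative helper, not part of the depth-anything-3 API.

```python
import numpy as np

def unproject_to_world(depth, intrinsics, extrinsics):
    """Unproject one depth map to world-space points (illustrative helper).

    depth:      [H, W] depth values
    intrinsics: [3, 3] pinhole K matrix
    extrinsics: [3, 4] world-to-camera [R | t]
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    # Camera-space points: K^-1 @ [u, v, 1] scaled by depth
    cam = (np.linalg.inv(intrinsics) @ pix.T) * depth.reshape(-1)
    # Invert w2c: X_world = R^T @ (X_cam - t)
    R, t = extrinsics[:, :3], extrinsics[:, 3:]
    world = R.T @ (cam - t)
    return world.T.reshape(H, W, 3)
```

For each view `i`, `unproject_to_world(prediction.depth[i], prediction.intrinsics[i], prediction.extrinsics[i])` would yield a per-pixel point map; low-confidence pixels can then be masked with `prediction.conf`.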
### Command Line Interface

```bash
# Process images with auto mode
da3 auto path/to/images \
    --export-format glb \
    --export-dir output \
    --model-dir depth-anything/da3nested-giant-large

# Use backend for faster repeated inference
da3 backend --model-dir depth-anything/da3nested-giant-large
da3 auto path/to/images --export-format glb --use-backend
```

## Model Details

- **Developed by:** ByteDance Seed Team
- **Model Type:** Vision Transformer for Visual Geometry
- **Architecture:** Plain transformer with unified depth-ray representation
- **Training Data:** Public academic datasets only

### Key Insights

💎 A **single plain transformer** (e.g., a vanilla DINO encoder) is sufficient as a backbone; no architectural specialization is required.

✨ A singular **depth-ray representation** obviates the need for complex multi-task learning.

## Performance

🏆 Depth Anything 3 significantly outperforms:
- **Depth Anything 2** for monocular depth estimation
- **VGGT** for multi-view depth estimation and pose estimation

For detailed benchmarks, please refer to our [paper](https://depth-anything-3.github.io) and the [Visual Geometry Benchmark](https://huggingface.co/datasets/depth-anything/VGB).

## Limitations

- The model is trained on academic datasets and may underperform on certain domain-specific images
- Performance may vary with image quality, lighting conditions, and scene complexity
- ⚠️ **Non-commercial use only** due to the CC BY-NC 4.0 license

## Citation

If you find Depth Anything 3 useful in your research or projects, please cite:

```bibtex
@article{depthanything3,
  title={Depth Anything 3: Recovering the visual space from any views},
  author={Haotong Lin and Sili Chen and Jun Hao Liew and Donny Y. Chen and Zhenyu Li and Guang Shi and Jiashi Feng and Bingyi Kang},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2025}
}
```

## Links

- 🏠 [Project Page](https://depth-anything-3.github.io)
- 📄 [Paper](https://arxiv.org/abs/)
- 💻 [GitHub Repository](https://github.com/ByteDance-Seed/Depth-Anything-3)
- 🤗 [Hugging Face Demo](https://huggingface.co/spaces/depth-anything/Depth-Anything-3)
- 📊 [Visual Geometry Benchmark](https://huggingface.co/datasets/depth-anything/VGB)
- 📚 [Documentation](https://github.com/ByteDance-Seed/Depth-Anything-3#-useful-documentation)

## Authors

[Haotong Lin](https://haotongl.github.io/) · [Sili Chen](https://github.com/SiliChen321) · [Junhao Liew](https://liewjunhao.github.io/) · [Donny Y. Chen](https://donydchen.github.io) · [Zhenyu Li](https://zhyever.github.io/) · [Guang Shi](https://scholar.google.com/citations?user=MjXxWbUAAAAJ&hl=en) · [Jiashi Feng](https://scholar.google.com.sg/citations?user=Q8iay0gAAAAJ&hl=en) · [Bingyi Kang](https://bingykang.github.io/)
config.json ADDED
@@ -0,0 +1,137 @@
```json
{
  "model_name": "da3nested-giant-large",
  "config": {
    "__object__": {
      "path": "depth_anything_3.model.da3",
      "name": "NestedDepthAnything3Net",
      "args": "as_params"
    },
    "anyview": {
      "__object__": {
        "path": "depth_anything_3.model.da3",
        "name": "DepthAnything3Net",
        "args": "as_params"
      },
      "net": {
        "__object__": {
          "path": "depth_anything_3.model.dinov2.dinov2",
          "name": "DinoV2",
          "args": "as_params"
        },
        "name": "vitg",
        "out_layers": [19, 27, 33, 39],
        "alt_start": 13,
        "qknorm_start": 13,
        "rope_start": 13,
        "cat_token": true
      },
      "head": {
        "__object__": {
          "path": "depth_anything_3.model.dualdpt",
          "name": "DualDPT",
          "args": "as_params"
        },
        "dim_in": 3072,
        "output_dim": 2,
        "features": 256,
        "out_channels": [256, 512, 1024, 1024]
      },
      "cam_enc": {
        "__object__": {
          "path": "depth_anything_3.model.cam_enc",
          "name": "CameraEnc",
          "args": "as_params"
        },
        "dim_out": 1536
      },
      "cam_dec": {
        "__object__": {
          "path": "depth_anything_3.model.cam_dec",
          "name": "CameraDec",
          "args": "as_params"
        },
        "dim_in": 3072
      },
      "gs_head": {
        "__object__": {
          "path": "depth_anything_3.model.gsdpt",
          "name": "GSDPT",
          "args": "as_params"
        },
        "dim_in": 3072,
        "output_dim": 38,
        "features": 256,
        "out_channels": [256, 512, 1024, 1024]
      },
      "gs_adapter": {
        "__object__": {
          "path": "depth_anything_3.model.gs_adapter",
          "name": "GaussianAdapter",
          "args": "as_params"
        },
        "sh_degree": 2,
        "pred_color": false,
        "pred_offset_depth": true,
        "pred_offset_xy": true,
        "gaussian_scale_min": 1e-05,
        "gaussian_scale_max": 30.0
      }
    },
    "metric": {
      "__object__": {
        "path": "depth_anything_3.model.da3",
        "name": "DepthAnything3Net",
        "args": "as_params"
      },
      "net": {
        "__object__": {
          "path": "depth_anything_3.model.dinov2.dinov2",
          "name": "DinoV2",
          "args": "as_params"
        },
        "name": "vitl",
        "out_layers": [4, 11, 17, 23],
        "alt_start": -1,
        "qknorm_start": -1,
        "rope_start": -1,
        "cat_token": false
      },
      "head": {
        "__object__": {
          "path": "depth_anything_3.model.dpt",
          "name": "DPT",
          "args": "as_params"
        },
        "dim_in": 1024,
        "output_dim": 1,
        "features": 256,
        "out_channels": [256, 512, 1024, 1024]
      }
    }
  }
}
```
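Each `__object__` node names a Python class by module path and class name, and `"args": "as_params"` suggests the sibling keys are passed as constructor parameters. Below is a minimal sketch of how such a spec could be resolved, assuming that convention; the actual depth-anything-3 loader may differ, and `resolve_object` plus the `fractions.Fraction` stand-in are ours.

```python
import importlib

def resolve_object(spec: dict):
    """Instantiate a class from an `__object__`-style config node.

    Hypothetical loader: imports the module named by `path`, looks up
    `name`, and passes the remaining keys as keyword arguments.
    """
    meta = spec["__object__"]
    cls = getattr(importlib.import_module(meta["path"]), meta["name"])
    params = {k: v for k, v in spec.items() if k != "__object__"}
    return cls(**params)

# Example with a standard-library class standing in for a model module:
node = {
    "__object__": {"path": "fractions", "name": "Fraction", "args": "as_params"},
    "numerator": 3,
    "denominator": 4,
}
obj = resolve_object(node)
```

Applied recursively to the config above, this would build the nested net, with the any-view branch (ViT-Giant backbone, DualDPT head, Gaussian heads) and the metric branch (ViT-Large backbone, single-channel DPT head) constructed from their respective sub-nodes.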
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8899faf998dedbc230261ab736fa57015280727399429122d44d4f9e7aac2ddd
size 6759558100
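The entry above is a Git LFS pointer, not the weights themselves: the roughly 6.76 GB safetensors file is fetched by its SHA-256 object ID at download time. A small sketch of parsing the standard LFS pointer format (`parse_lfs_pointer` is our illustrative helper):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    # Each line is "<key> <value>"; split on the first space only.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "sha256": fields["oid"].removeprefix("sha256:"),
        "size_bytes": int(fields["size"]),
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:8899faf998dedbc230261ab736fa57015280727399429122d44d4f9e7aac2ddd
size 6759558100
"""
info = parse_lfs_pointer(pointer)
```

The `size_bytes` and `sha256` fields can be used to verify the integrity of the downloaded weights file.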