haotongl committed
Commit 8615eef · verified · 1 Parent(s): 4097943

Upload folder using huggingface_hub

Files changed (3):
  1. README.md +144 -3
  2. config.json +137 -0
  3. model.safetensors +3 -0
README.md CHANGED
@@ -1,3 +1,144 @@
- ---
- license: cc-by-nc-4.0
- ---
---
license: cc-by-nc-4.0
tags:
- depth-estimation
- computer-vision
- monocular-depth
- multi-view-geometry
- pose-estimation
library_name: depth-anything-3
pipeline_tag: depth-estimation
---

# Depth Anything 3: DA3NESTED-GIANT-LARGE

<div align="center">

[![Project Page](https://img.shields.io/badge/Project_Page-Depth_Anything_3-green)](https://depth-anything-3.github.io)
[![Paper](https://img.shields.io/badge/arXiv-Depth_Anything_3-red)](https://arxiv.org/abs/)
[![Demo](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue)](https://huggingface.co/spaces/depth-anything/Depth-Anything-3)
[![Benchmark](https://img.shields.io/badge/Benchmark-VisGeo-yellow)](https://huggingface.co/datasets/depth-anything/VGB)

</div>

## Model Description

The DA3 Nested model combines the any-view Giant model with the metric Large model for metric-scale visual geometry reconstruction. It is our recommended model, as it brings all capabilities together in a single checkpoint.

| Property | Value |
|----------|-------|
| **Model Series** | Nested |
| **Parameters** | 1.40B |
| **License** | CC BY-NC 4.0 |

⚠️ **Non-commercial use only** due to the CC BY-NC 4.0 license.

## Capabilities

- ✅ Relative Depth
- ✅ Pose Estimation
- ✅ Pose Conditioning
- ✅ 3D Gaussians
- ✅ Metric Depth
- ✅ Sky Segmentation

## Quick Start

### Installation

```bash
pip install depth-anything-3
```

### Basic Example

```python
import torch
from depth_anything_3.api import DepthAnything3

# Load model from Hugging Face Hub
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = DepthAnything3.from_pretrained("depth-anything/da3nested-giant-large")
model = model.to(device=device)

# Run inference on images
images = ["image1.jpg", "image2.jpg"]  # List of image paths, PIL Images, or numpy arrays
prediction = model.inference(
    images,
    export_dir="output",
    export_format="glb",  # Options: glb, npz, ply, mini_npz, gs_ply, gs_video
)

# Access results
print(prediction.depth.shape)       # Depth maps: [N, H, W] float32
print(prediction.conf.shape)        # Confidence maps: [N, H, W] float32
print(prediction.extrinsics.shape)  # Camera poses (w2c): [N, 3, 4] float32
print(prediction.intrinsics.shape)  # Camera intrinsics: [N, 3, 3] float32
```
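The returned depth maps, intrinsics, and world-to-camera extrinsics are enough to lift each view into a shared world frame. Below is a minimal numpy sketch of that unprojection, assuming a standard pinhole model; `unproject_to_world` is our illustrative helper, not part of the depth-anything-3 API.

```python
import numpy as np

def unproject_to_world(depth, intrinsics, extrinsics):
    """Unproject one depth map to world-space points (illustrative helper).

    depth:      [H, W] depth values
    intrinsics: [3, 3] pinhole K matrix
    extrinsics: [3, 4] world-to-camera [R | t]
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    # Camera-space points: K^-1 @ [u, v, 1] scaled by depth
    cam = (np.linalg.inv(intrinsics) @ pix.T) * depth.reshape(-1)
    # Invert w2c: X_world = R^T @ (X_cam - t)
    R, t = extrinsics[:, :3], extrinsics[:, 3:]
    world = R.T @ (cam - t)
    return world.T.reshape(H, W, 3)
```

For each view `i`, `unproject_to_world(prediction.depth[i], prediction.intrinsics[i], prediction.extrinsics[i])` would yield a per-pixel point map; low-confidence pixels can then be masked with `prediction.conf`.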
### Command Line Interface

```bash
# Process images with auto mode
da3 auto path/to/images \
    --export-format glb \
    --export-dir output \
    --model-dir depth-anything/da3nested-giant-large

# Use backend for faster repeated inference
da3 backend --model-dir depth-anything/da3nested-giant-large
da3 auto path/to/images --export-format glb --use-backend
```

## Model Details

- **Developed by:** ByteDance Seed Team
- **Model Type:** Vision Transformer for Visual Geometry
- **Architecture:** Plain transformer with unified depth-ray representation
- **Training Data:** Public academic datasets only

### Key Insights

💎 A **single plain transformer** (e.g., a vanilla DINO encoder) is sufficient as a backbone; no architectural specialization is required.

✨ A singular **depth-ray representation** obviates the need for complex multi-task learning.

## Performance

🏆 Depth Anything 3 significantly outperforms:
- **Depth Anything 2** for monocular depth estimation
- **VGGT** for multi-view depth estimation and pose estimation

For detailed benchmarks, please refer to our [paper](https://depth-anything-3.github.io) and the [Visual Geometry Benchmark](https://huggingface.co/datasets/depth-anything/VGB).

## Limitations

- The model is trained on academic datasets and may underperform on certain domain-specific images
- Performance may vary with image quality, lighting conditions, and scene complexity
- ⚠️ **Non-commercial use only** due to the CC BY-NC 4.0 license

## Citation

If you find Depth Anything 3 useful in your research or projects, please cite:

```bibtex
@article{depthanything3,
  title={Depth Anything 3: Recovering the visual space from any views},
  author={Haotong Lin and Sili Chen and Jun Hao Liew and Donny Y. Chen and Zhenyu Li and Guang Shi and Jiashi Feng and Bingyi Kang},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2025}
}
```

## Links

- 🏠 [Project Page](https://depth-anything-3.github.io)
- 📄 [Paper](https://arxiv.org/abs/)
- 💻 [GitHub Repository](https://github.com/ByteDance-Seed/Depth-Anything-3)
- 🤗 [Hugging Face Demo](https://huggingface.co/spaces/depth-anything/Depth-Anything-3)
- 📊 [Visual Geometry Benchmark](https://huggingface.co/datasets/depth-anything/VGB)
- 📚 [Documentation](https://github.com/ByteDance-Seed/Depth-Anything-3#-useful-documentation)

## Authors

[Haotong Lin](https://haotongl.github.io/) · [Sili Chen](https://github.com/SiliChen321) · [Junhao Liew](https://liewjunhao.github.io/) · [Donny Y. Chen](https://donydchen.github.io) · [Zhenyu Li](https://zhyever.github.io/) · [Guang Shi](https://scholar.google.com/citations?user=MjXxWbUAAAAJ&hl=en) · [Jiashi Feng](https://scholar.google.com.sg/citations?user=Q8iay0gAAAAJ&hl=en) · [Bingyi Kang](https://bingykang.github.io/)
config.json ADDED
@@ -0,0 +1,137 @@
```json
{
  "model_name": "da3nested-giant-large",
  "config": {
    "__object__": {
      "path": "depth_anything_3.model.da3",
      "name": "NestedDepthAnything3Net",
      "args": "as_params"
    },
    "anyview": {
      "__object__": {
        "path": "depth_anything_3.model.da3",
        "name": "DepthAnything3Net",
        "args": "as_params"
      },
      "net": {
        "__object__": {
          "path": "depth_anything_3.model.dinov2.dinov2",
          "name": "DinoV2",
          "args": "as_params"
        },
        "name": "vitg",
        "out_layers": [19, 27, 33, 39],
        "alt_start": 13,
        "qknorm_start": 13,
        "rope_start": 13,
        "cat_token": true
      },
      "head": {
        "__object__": {
          "path": "depth_anything_3.model.dualdpt",
          "name": "DualDPT",
          "args": "as_params"
        },
        "dim_in": 3072,
        "output_dim": 2,
        "features": 256,
        "out_channels": [256, 512, 1024, 1024]
      },
      "cam_enc": {
        "__object__": {
          "path": "depth_anything_3.model.cam_enc",
          "name": "CameraEnc",
          "args": "as_params"
        },
        "dim_out": 1536
      },
      "cam_dec": {
        "__object__": {
          "path": "depth_anything_3.model.cam_dec",
          "name": "CameraDec",
          "args": "as_params"
        },
        "dim_in": 3072
      },
      "gs_head": {
        "__object__": {
          "path": "depth_anything_3.model.gsdpt",
          "name": "GSDPT",
          "args": "as_params"
        },
        "dim_in": 3072,
        "output_dim": 38,
        "features": 256,
        "out_channels": [256, 512, 1024, 1024]
      },
      "gs_adapter": {
        "__object__": {
          "path": "depth_anything_3.model.gs_adapter",
          "name": "GaussianAdapter",
          "args": "as_params"
        },
        "sh_degree": 2,
        "pred_color": false,
        "pred_offset_depth": true,
        "pred_offset_xy": true,
        "gaussian_scale_min": 1e-05,
        "gaussian_scale_max": 30.0
      }
    },
    "metric": {
      "__object__": {
        "path": "depth_anything_3.model.da3",
        "name": "DepthAnything3Net",
        "args": "as_params"
      },
      "net": {
        "__object__": {
          "path": "depth_anything_3.model.dinov2.dinov2",
          "name": "DinoV2",
          "args": "as_params"
        },
        "name": "vitl",
        "out_layers": [4, 11, 17, 23],
        "alt_start": -1,
        "qknorm_start": -1,
        "rope_start": -1,
        "cat_token": false
      },
      "head": {
        "__object__": {
          "path": "depth_anything_3.model.dpt",
          "name": "DPT",
          "args": "as_params"
        },
        "dim_in": 1024,
        "output_dim": 1,
        "features": 256,
        "out_channels": [256, 512, 1024, 1024]
      }
    }
  }
}
```
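Each `__object__` node names a Python class by module path and class name, and `"args": "as_params"` suggests the sibling keys are passed as constructor parameters. Below is a minimal sketch of how such a spec could be resolved, assuming that convention; the actual depth-anything-3 loader may differ, and `resolve_object` plus the `fractions.Fraction` stand-in are ours.

```python
import importlib

def resolve_object(spec: dict):
    """Instantiate a class from an `__object__`-style config node.

    Hypothetical loader: imports the module named by `path`, looks up
    `name`, and passes the remaining keys as keyword arguments.
    """
    meta = spec["__object__"]
    cls = getattr(importlib.import_module(meta["path"]), meta["name"])
    params = {k: v for k, v in spec.items() if k != "__object__"}
    return cls(**params)

# Example with a standard-library class standing in for a model module:
node = {
    "__object__": {"path": "fractions", "name": "Fraction", "args": "as_params"},
    "numerator": 3,
    "denominator": 4,
}
obj = resolve_object(node)
```

Applied recursively to the config above, this would build the nested net, with the any-view branch (ViT-Giant backbone, DualDPT head, Gaussian heads) and the metric branch (ViT-Large backbone, single-channel DPT head) constructed from their respective sub-nodes.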
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8899faf998dedbc230261ab736fa57015280727399429122d44d4f9e7aac2ddd
size 6759558100
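The entry above is a Git LFS pointer, not the weights themselves: the roughly 6.76 GB safetensors file is fetched by its SHA-256 object ID at download time. A small sketch of parsing the standard LFS pointer format (`parse_lfs_pointer` is our illustrative helper):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    # Each line is "<key> <value>"; split on the first space only.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "sha256": fields["oid"].removeprefix("sha256:"),
        "size_bytes": int(fields["size"]),
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:8899faf998dedbc230261ab736fa57015280727399429122d44d4f9e7aac2ddd
size 6759558100
"""
info = parse_lfs_pointer(pointer)
```

The `size_bytes` and `sha256` fields can be used to verify the integrity of the downloaded weights file.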