zach committed on
Commit · 321c20c
1 Parent(s): aa177c6
update metadata
README.md
CHANGED
|
@@ -1,17 +1,23 @@
| 1 |
# 🚀 Metric3D Project 🚀
|
| 2 |
|
| 3 |
-
**Official
|
| 4 |
|
| 5 |
[1] [Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image](https://arxiv.org/abs/2307.10984)
|
| 6 |
|
| 7 |
[2] Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation
|
| 8 |
|
| 9 |
-
<a href='https://jugghm.github.io/Metric3Dv2'><img src='https://img.shields.io/badge/project%20page-@Metric3D-yellow.svg'></a>
|
| 10 |
-
<a href='https://arxiv.org/abs/2307.10984'><img src='https://img.shields.io/badge/arxiv-@Metric3Dv1-green'></a>
|
| 11 |
-
<a href='https:'><img src='https://img.shields.io/badge/arxiv (on hold)-@Metric3Dv2-red'></a>
|
| 12 |
-
<a href='https://huggingface.co/spaces/JUGGHM/Metric3D'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
|
| 13 |
-
|
| 14 |
-
[//]: # (### [Project Page](https://arxiv.org/abs/2307.08695) | [v2 Paper](https://arxiv.org/abs/2307.10984) | [v1 Arxiv](https://arxiv.org/abs/2307.10984) | [Video](https://www.youtube.com/playlist?list=PLEuyXJsWqUNd04nwfm9gFBw5FVbcaQPl3) | [Hugging Face 🤗](https://huggingface.co/spaces/JUGGHM/Metric3D) )
|
| 15 |
|
| 16 |
## News and TO DO LIST
|
| 17 |
|
|
@@ -20,12 +26,12 @@
|
|
| 20 |
- [ ] Focal length free mode
|
| 21 |
- [ ] Floating noise removing mode
|
| 22 |
- [ ] Improving HuggingFace Demo and Visualization
|
| 23 |
-
|
| 24 |
-
|
| 25 |
- `[2024/3/18]` HuggingFace GPU version updated!
|
| 26 |
- `[2024/3/18]` [Project page](https://jugghm.github.io/Metric3Dv2/) released!
|
| 27 |
- `[2024/3/18]` Metric3D V2 models released, supporting metric depth and surface normal now!
|
| 28 |
-
- `[2023/8/10]` Inference codes,
|
| 29 |
- `[2023/7]` Metric3D accepted by ICCV 2023!
|
| 30 |
- `[2023/4]` The Champion of [2nd Monocular Depth Estimation Challenge](https://jspenmar.github.io/MDEC) in CVPR 2023
|
| 31 |
|
|
@@ -40,29 +46,6 @@ Metric3D is a versatile geometric foundation model for high-quality and zero-sho
|
|
| 40 |
|
| 41 |
### Metric Depth
|
| 42 |
|
| 43 |
-
[//]: # (#### Zero-shot Testing)
|
| 44 |
-
|
| 45 |
-
[//]: # (Our models work well on both indoor and outdoor scenarios, compared with other zero-shot metric depth estimation methods.)
|
| 46 |
-
|
| 47 |
-
[//]: # ()
|
| 48 |
-
[//]: # (| | Backbone | KITTI $\delta 1$ ↑ | KITTI $\delta 2$ ↑ | KITTI $\delta 3$ ↑ | KITTI AbsRel ↓ | KITTI RMSE ↓ | KITTI RMS_log ↓ | NYU $\delta 1$ ↑ | NYU $\delta 2$ ↑ | NYU $\delta 3$ ↑ | NYU AbsRel ↓ | NYU RMSE ↓ | NYU log10 ↓ |)
|
| 49 |
-
|
| 50 |
-
[//]: # (|-----------------|------------|--------------------|---------------------|--------------------|-----------------|---------------|------------------|------------------|------------------|------------------|---------------|-------------|--------------|)
|
| 51 |
-
|
| 52 |
-
[//]: # (| ZeroDepth | ResNet-18 | 0.910 | 0.980 | 0.996 | 0.057 | 4.044 | 0.083 | 0.901 | 0.961 | - | 0.100 | 0.380 | - |)
|
| 53 |
-
|
| 54 |
-
[//]: # (| PolyMax | ConvNeXt-L | - | - | - | - | - | - | 0.969 | 0.996 | 0.999 | 0.067 | 0.250 | 0.033 |)
|
| 55 |
-
|
| 56 |
-
[//]: # (| Ours | ViT-L | 0.985 | 0.995 | 0.999 | 0.052 | 2.511 | 0.074 | 0.975 | 0.994 | 0.998 | 0.063 | 0.251 | 0.028 |)
|
| 57 |
-
|
| 58 |
-
[//]: # (| Ours | ViT-g2 | 0.989 | 0.996 | 0.999 | 0.051 | 2.403 | 0.080 | 0.980 | 0.997 | 0.999 | 0.067 | 0.260 | 0.030 |)
|
| 59 |
-
|
| 60 |
-
[//]: # ()
|
| 61 |
-
[//]: # ([//]: # (| Adabins | Efficient-B5 | 0.964 | 0.995 | 0.999 | 0.058 | 2.360 | 0.088 | 0.903 | 0.984 | 0.997 | 0.103 | 0.0444 | 0.364 |))
|
| 62 |
-
[//]: # ([//]: # (| NewCRFs | SwinT-L | 0.974 | 0.997 | 0.999 | 0.052 | 2.129 | 0.079 | 0.922 | 0.983 | 0.994 | 0.095 | 0.041 | 0.334 |))
|
| 63 |
-
[//]: # ([//]: # (| Ours (CSTM_label) | ConvNeXt-L | 0.964 | 0.993 | 0.998 | 0.058 | 2.770 | 0.092 | 0.944 | 0.986 | 0.995 | 0.083 | 0.035 | 0.310 |))
|
| 64 |
-
|
| 65 |
-
[//]: # (#### Finetuned)
|
| 66 |
Our models rank 1st on the routine KITTI and NYU benchmarks.
|
| 67 |
|
| 68 |
| | Backbone | KITTI δ1 ↑ | KITTI δ2 ↑ | KITTI AbsRel ↓ | KITTI RMSE ↓ | KITTI RMS_log ↓ | NYU δ1 ↑ | NYU δ2 ↑ | NYU AbsRel ↓ | NYU RMSE ↓ | NYU log10 ↓ |
|
|
@@ -111,103 +94,6 @@ Our models also show powerful performance on normal benchmarks.
|
|
| 111 |
### Improving monocular SLAM
|
| 112 |
<img src="media/gifs/demo_22.gif" width="600" height="337">
|
| 113 |
|
| 114 |
-
[//]: # (https://github.com/YvanYin/Metric3D/assets/35299633/f95815ef-2506-4193-a6d9-1163ea821268)
|
| 115 |
-
|
| 116 |
-
[//]: # (https://github.com/YvanYin/Metric3D/assets/35299633/ed00706c-41cc-49ea-accb-ad0532633cc2)
|
| 117 |
-
|
| 118 |
-
[//]: # (### Zero-shot metric 3D recovery)
|
| 119 |
-
|
| 120 |
-
[//]: # (https://github.com/YvanYin/Metric3D/assets/35299633/26cd7ae1-dd5a-4446-b275-54c5ca7ef945)
|
| 121 |
-
|
| 122 |
-
[//]: # (https://github.com/YvanYin/Metric3D/assets/35299633/21e5484b-c304-4fe3-b1d3-8eebc4e26e42)
|
| 123 |
-
[//]: # (### Monocular reconstruction for a Sequence)
|
| 124 |
-
|
| 125 |
-
[//]: # ()
|
| 126 |
-
[//]: # (### In-the-wild 3D reconstruction)
|
| 127 |
-
|
| 128 |
-
[//]: # ()
|
| 129 |
-
[//]: # (| | Image | Reconstruction | Pointcloud File |)
|
| 130 |
-
|
| 131 |
-
[//]: # (|:---------:|:------------------:|:------------------:|:--------:|)
|
| 132 |
-
|
| 133 |
-
[//]: # (| room | <img src="data/wild_demo/jonathan-borba-CnthDZXCdoY-unsplash.jpg" width="300" height="335"> | <img src="media/gifs/room.gif" width="300" height="335"> | [Download](https://drive.google.com/file/d/1P1izSegH2c4LUrXGiUksw037PVb0hjZr/view?usp=drive_link) |)
|
| 134 |
-
|
| 135 |
-
[//]: # (| Colosseum | <img src="data/wild_demo/david-kohler-VFRTXGw1VjU-unsplash.jpg" width="300" height="169"> | <img src="media/gifs/colo.gif" width="300" height="169"> | [Download](https://drive.google.com/file/d/1jJCXe5IpxBhHDr0TZtNZhjxKTRUz56Hg/view?usp=drive_link) |)
|
| 136 |
-
|
| 137 |
-
[//]: # (| chess | <img src="data/wild_demo/randy-fath-G1yhU1Ej-9A-unsplash.jpg" width="300" height="169" align=center> | <img src="media/gifs/chess.gif" width="300" height="169"> | [Download](https://drive.google.com/file/d/1oV_Foq25_p-tTDRTcyO2AzXEdFJQz-Wm/view?usp=drive_link) |)
|
| 138 |
-
|
| 139 |
-
[//]: # ()
|
| 140 |
-
[//]: # (All three images are downloaded from [Unsplash](https://unsplash.com/) and put in the data/wild_demo directory.)
|
| 141 |
-
|
| 142 |
-
[//]: # ()
|
| 143 |
-
[//]: # (### 3D metric reconstruction, Metric3D × DroidSLAM)
|
| 144 |
-
|
| 145 |
-
[//]: # (Metric3D can also provide scale information for DroidSLAM, help to solve the scale drift problem for better trajectories. )
|
| 146 |
-
|
| 147 |
-
[//]: # ()
|
| 148 |
-
[//]: # (#### Bird Eyes' View (Left: Droid-SLAM (mono). Right: Droid-SLAM with Metric-3D))
|
| 149 |
-
|
| 150 |
-
[//]: # ()
|
| 151 |
-
[//]: # (<div align=center>)
|
| 152 |
-
|
| 153 |
-
[//]: # (<img src="media/gifs/0028.gif"> )
|
| 154 |
-
|
| 155 |
-
[//]: # (</div>)
|
| 156 |
-
|
| 157 |
-
[//]: # ()
|
| 158 |
-
[//]: # (### Front View)
|
| 159 |
-
|
| 160 |
-
[//]: # ()
|
| 161 |
-
[//]: # (<div align=center>)
|
| 162 |
-
|
| 163 |
-
[//]: # (<img src="media/gifs/0028_fv.gif"> )
|
| 164 |
-
|
| 165 |
-
[//]: # (</div>)
|
| 166 |
-
|
| 167 |
-
[//]: # ()
|
| 168 |
-
[//]: # (#### KITTI odometry evaluation (Translational RMS drift (t_rel, ↓) / Rotational RMS drift (r_rel, ↓)))
|
| 169 |
-
|
| 170 |
-
[//]: # (| | Modality | seq 00 | seq 02 | seq 05 | seq 06 | seq 08 | seq 09 | seq 10 |)
|
| 171 |
-
|
| 172 |
-
[//]: # (|:----------:|:--------:|:----------:|:----------:|:---------:|:----------:|:----------:|:---------:|:---------:|)
|
| 173 |
-
|
| 174 |
-
[//]: # (| ORB-SLAM2 | Mono | 11.43/0.58 | 10.34/0.26 | 9.04/0.26 | 14.56/0.26 | 11.46/0.28 | 9.3/0.26 | 2.57/0.32 |)
|
| 175 |
-
|
| 176 |
-
[//]: # (| Droid-SLAM | Mono | 33.9/0.29 | 34.88/0.27 | 23.4/0.27 | 17.2/0.26 | 39.6/0.31 | 21.7/0.23 | 7/0.25 |)
|
| 177 |
-
|
| 178 |
-
[//]: # (| Droid+Ours | Mono | 1.44/0.37 | 2.64/0.29 | 1.44/0.25 | 0.6/0.2 | 2.2/0.3 | 1.63/0.22 | 2.73/0.23 |)
|
| 179 |
-
|
| 180 |
-
[//]: # (| ORB-SLAM2 | Stereo | 0.88/0.31 | 0.77/0.28 | 0.62/0.26 | 0.89/0.27 | 1.03/0.31 | 0.86/0.25 | 0.62/0.29 |)
|
| 181 |
-
|
| 182 |
-
[//]: # ()
|
| 183 |
-
[//]: # (Metric3D makes the mono-SLAM scale-aware, like stereo systems.)
|
| 184 |
-
|
| 185 |
-
[//]: # ()
|
| 186 |
-
[//]: # (#### KITTI sequence videos - Youtube)
|
| 187 |
-
|
| 188 |
-
[//]: # ([2011_09_30_drive_0028](https://youtu.be/gcTB4MgVCLQ) /)
|
| 189 |
-
|
| 190 |
-
[//]: # ([2011_09_30_drive_0033](https://youtu.be/He581fmoPP4) /)
|
| 191 |
-
|
| 192 |
-
[//]: # ([2011_09_30_drive_0034](https://youtu.be/I3PkukQ3_F8))
|
| 193 |
-
|
| 194 |
-
[//]: # ()
|
| 195 |
-
[//]: # (#### Estimated pose)
|
| 196 |
-
|
| 197 |
-
[//]: # ([2011_09_30_drive_0033](https://drive.google.com/file/d/1SMXWzLYrEdmBe6uYMR9ShtDXeFDewChv/view?usp=drive_link) / )
|
| 198 |
-
|
| 199 |
-
[//]: # ([2011_09_30_drive_0034](https://drive.google.com/file/d/1ONU4GxpvTlgW0TjReF1R2i-WFxbbjQPG/view?usp=drive_link) /)
|
| 200 |
-
|
| 201 |
-
[//]: # ([2011_10_03_drive_0042](https://drive.google.com/file/d/19fweg6p1Q6TjJD2KlD7EMA_aV4FIeQUD/view?usp=drive_link))
|
| 202 |
-
|
| 203 |
-
[//]: # ()
|
| 204 |
-
[//]: # (#### Pointcloud files)
|
| 205 |
-
|
| 206 |
-
[//]: # ([2011_09_30_drive_0033](https://drive.google.com/file/d/1K0o8DpUmLf-f_rue0OX1VaHlldpHBAfw/view?usp=drive_link) /)
|
| 207 |
-
|
| 208 |
-
[//]: # ([2011_09_30_drive_0034](https://drive.google.com/file/d/1bvZ6JwMRyvi07H7Z2VD_0NX1Im8qraZo/view?usp=drive_link) /)
|
| 209 |
-
|
| 210 |
-
[//]: # ([2011_10_03_drive_0042](https://drive.google.com/file/d/1Vw59F8nN5ApWdLeGKXvYgyS9SNKHKy4x/view?usp=drive_link))
|
| 211 |
|
| 212 |
## 🔨 Installation
|
| 213 |
### One-line Installation
|
|
@@ -263,14 +149,17 @@ Inference settings are defined as
|
|
| 263 |
```
|
| 264 |
where the images are first resized to ```crop_size``` and then fed into the model.
|
| 265 |
|
| 266 |
## ✈️ Inference
|
| 267 |
### Download Checkpoint
|
| 268 |
| | Encoder | Decoder | Link |
|
| 269 |
|:----:|:-------------------:|:-----------------:|:-------------------------------------------------------------------------------------------------:|
|
| 270 |
| v1-T | ConvNeXt-Tiny | Hourglass-Decoder | Coming soon |
|
| 271 |
-
| v1-L | ConvNeXt-Large | Hourglass-Decoder | [Download](
|
| 272 |
-
| v2-S | DINO2reg-ViT-Small | RAFT-4iter | [Download](
|
| 273 |
-
| v2-L | DINO2reg-ViT-Large | RAFT-8iter | [Download](
|
| 274 |
| v2-g | DINO2reg-ViT-giant2 | RAFT-8iter | Coming soon |
|
| 275 |
|
| 276 |
### Dataset Mode
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: bsd-2-clause
|
| 3 |
+
pipeline_tag: depth-estimation
|
| 4 |
+
tags:
|
| 5 |
+
- Metric Depth
|
| 6 |
+
- Surface Normal
|
| 7 |
+
---
|
| 8 |
# 🚀 Metric3D Project 🚀
|
| 9 |
|
| 10 |
+
**Official Model card of Metric3Dv1 and Metric3Dv2:**
|
| 11 |
|
| 12 |
[1] [Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image](https://arxiv.org/abs/2307.10984)
|
| 13 |
|
| 14 |
[2] Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation
|
| 15 |
|
| 16 |
+
<a href='https://jugghm.github.io/Metric3Dv2' style='display: inline-block;'><img src='https://img.shields.io/badge/project%20page-@Metric3D-yellow.svg'></a>
|
| 17 |
+
<a href='https://arxiv.org/abs/2307.10984' style='display: inline-block;'><img src='https://img.shields.io/badge/arxiv-@Metric3Dv1-green'></a>
|
| 18 |
+
<a href='https:' style='display: inline-block;'><img src='https://img.shields.io/badge/arxiv (on hold)-@Metric3Dv2-red'></a>
|
| 19 |
+
<a href='https://huggingface.co/spaces/JUGGHM/Metric3D' style='display: inline-block;'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
|
| 20 |
+
<a href='https://huggingface.co/zachL1/Metric3D' style='display: inline-block;'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model%20card-E0FFFF'></a>
|
|
|
|
| 21 |
|
| 22 |
## News and TO DO LIST
|
| 23 |
|
|
|
|
| 26 |
- [ ] Focal length free mode
|
| 27 |
- [ ] Floating noise removing mode
|
| 28 |
- [ ] Improving HuggingFace Demo and Visualization
|
| 29 |
+
|
| 30 |
+
- `[2024/4/11]` Training codes are released!
|
| 31 |
- `[2024/3/18]` HuggingFace GPU version updated!
|
| 32 |
- `[2024/3/18]` [Project page](https://jugghm.github.io/Metric3Dv2/) released!
|
| 33 |
- `[2024/3/18]` Metric3D V2 models released, supporting metric depth and surface normal now!
|
| 34 |
+
- `[2023/8/10]` Inference codes, pre-trained weights, and demo released.
|
| 35 |
- `[2023/7]` Metric3D accepted by ICCV 2023!
|
| 36 |
- `[2023/4]` The Champion of [2nd Monocular Depth Estimation Challenge](https://jspenmar.github.io/MDEC) in CVPR 2023
|
| 37 |
|
|
|
|
| 46 |
|
| 47 |
### Metric Depth
|
| 48 |
|
| 49 |
Our models rank 1st on the routine KITTI and NYU benchmarks.
|
| 50 |
|
| 51 |
| | Backbone | KITTI δ1 ↑ | KITTI δ2 ↑ | KITTI AbsRel ↓ | KITTI RMSE ↓ | KITTI RMS_log ↓ | NYU δ1 ↑ | NYU δ2 ↑ | NYU AbsRel ↓ | NYU RMSE ↓ | NYU log10 ↓ |
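For readers unfamiliar with the column names, the sketch below computes the threshold accuracies (δ1, δ2) and the error metrics (AbsRel, RMSE, RMS_log, log10) from their standard definitions in the monocular depth literature. It is an illustration only: it assumes those standard definitions match the evaluation behind this table, and the helper name `depth_metrics` and its plain-list inputs are hypothetical, not part of the Metric3D code.

```python
import math


def depth_metrics(pred, gt):
    """Standard depth metrics over paired predicted / ground-truth depths.

    Hypothetical helper for illustration. Inputs are flat sequences of depths
    in the same unit; pairs with non-positive values are treated as invalid.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid depth pairs")
    n = len(pairs)
    # Threshold accuracy uses the symmetric ratio max(pred/gt, gt/pred).
    ratios = [max(p / g, g / p) for p, g in pairs]
    return {
        "delta1": sum(r < 1.25 for r in ratios) / n,        # higher is better
        "delta2": sum(r < 1.25 ** 2 for r in ratios) / n,   # higher is better
        "abs_rel": sum(abs(p - g) / g for p, g in pairs) / n,
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        "rms_log": math.sqrt(
            sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
        ),
        "log10": sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n,
    }


if __name__ == "__main__":
    print(depth_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 2.0]))
```

The arrows in the header follow from these definitions: δ values are fractions of pixels within a tolerance, so higher is better, while the remaining columns are errors, so lower is better.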
|
|
|
|
| 94 |
### Improving monocular SLAM
|
| 95 |
<img src="media/gifs/demo_22.gif" width="600" height="337">
|
| 96 |
|
| 97 |
|
| 98 |
## 🔨 Installation
|
| 99 |
### One-line Installation
|
|
|
|
| 149 |
```
|
| 150 |
where the images are first resized to ```crop_size``` and then fed into the model.
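As a rough illustration of the resize step, the sketch below fits an image into a target `crop_size` while keeping its aspect ratio, and reports the leftover padding. This is an assumption for illustration: the text only states that images are resized to `crop_size`, so the aspect-preserving fit, the padding, the helper name `fit_to_crop`, and the example sizes are all hypothetical. The focal-length rescaling reflects the pinhole-camera fact that a focal length in pixels scales with the image.

```python
def fit_to_crop(height, width, crop_size, focal=None):
    """Fit an image of (height, width) inside crop_size = (crop_h, crop_w).

    Hypothetical helper: returns the scale factor, the resized shape, and the
    padding needed to reach crop_size exactly. If a focal length in pixels is
    given, it is rescaled by the same factor.
    """
    crop_h, crop_w = crop_size
    # Use the smaller ratio so the resized image never exceeds the crop.
    scale = min(crop_h / height, crop_w / width)
    new_h = min(crop_h, round(height * scale))
    new_w = min(crop_w, round(width * scale))
    result = {
        "scale": scale,
        "size": (new_h, new_w),
        "pad": (crop_h - new_h, crop_w - new_w),
    }
    if focal is not None:
        result["focal"] = focal * scale
    return result


if __name__ == "__main__":
    # Example values only; use the crop_size from your own inference config.
    print(fit_to_crop(480, 640, crop_size=(616, 1064), focal=500.0))
```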
|
| 151 |
|
| 152 |
+
## ✈️ Training
|
| 153 |
+
Please refer to [training/README.md](training/README.md).
|
| 154 |
+
|
| 155 |
## ✈️ Inference
|
| 156 |
### Download Checkpoint
|
| 157 |
| | Encoder | Decoder | Link |
|
| 158 |
|:----:|:-------------------:|:-----------------:|:-------------------------------------------------------------------------------------------------:|
|
| 159 |
| v1-T | ConvNeXt-Tiny | Hourglass-Decoder | Coming soon |
|
| 160 |
+
| v1-L | ConvNeXt-Large | Hourglass-Decoder | [Download](weight/convlarge_hourglass_0.3_150_step750k_v1.1.pth) |
|
| 161 |
+
| v2-S | DINO2reg-ViT-Small | RAFT-4iter | [Download](weight/metric_depth_vit_small_800k.pth) |
|
| 162 |
+
| v2-L | DINO2reg-ViT-Large | RAFT-8iter | [Download](weight/metric_depth_vit_large_800k.pth) |
|
| 163 |
| v2-g | DINO2reg-ViT-giant2 | RAFT-8iter | Coming soon |
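The table above can be mirrored in a small lookup so that scripts fail early when a variant has no released weights. The relative paths are copied from the table; the helper name `checkpoint_path` and the `repo_root` parameter are hypothetical and not part of the Metric3D code.

```python
from pathlib import Path

# Relative paths copied from the "Download Checkpoint" table.
# None marks variants listed as "Coming soon".
CHECKPOINTS = {
    "v1-T": None,
    "v1-L": "weight/convlarge_hourglass_0.3_150_step750k_v1.1.pth",
    "v2-S": "weight/metric_depth_vit_small_800k.pth",
    "v2-L": "weight/metric_depth_vit_large_800k.pth",
    "v2-g": None,
}


def checkpoint_path(variant, repo_root="."):
    """Return the checkpoint path for a model variant (hypothetical helper)."""
    if variant not in CHECKPOINTS:
        raise KeyError(f"unknown variant: {variant!r}")
    relative = CHECKPOINTS[variant]
    if relative is None:
        raise LookupError(f"checkpoint for {variant!r} is not released yet")
    return Path(repo_root) / relative


if __name__ == "__main__":
    print(checkpoint_path("v2-S"))
```

The resolved path could then be handed to a loader such as `torch.load`, assuming the files are standard PyTorch checkpoints.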
|
| 164 |
|
| 165 |
### Dataset Mode
|