diff --git a/.gitattributes b/.gitattributes
index 522ec0142a4e290c2ecfa48cb3033fa78a5a9e78..f2ccc89f9c69153ae54907ca61f612d10a6c94be 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -36,3 +36,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 DAP-main-2/assets/depth_teaser2_00.png filter=lfs diff=lfs merge=lfs -text
 DAP-main-2/assets/depth_teaser2.pdf filter=lfs diff=lfs merge=lfs -text
 DAP-main-2/assets/teaser.jpg filter=lfs diff=lfs merge=lfs -text
+assets/depth_teaser2_00.png filter=lfs diff=lfs merge=lfs -text
+assets/depth_teaser2.pdf filter=lfs diff=lfs merge=lfs -text
+assets/teaser.jpg filter=lfs diff=lfs merge=lfs -text
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000000000000000000000000000000000000..330c80c4dc8a58fb0ddb3a07b60d7c2ec43d7c9f
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 Insta360 Research Team
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
\ No newline at end of file
diff --git a/README.md b/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..967c085833f033e81b926a962e50db8e5fd7e17d
--- /dev/null
+++ b/README.md
@@ -0,0 +1,103 @@
+
+ Xin Lin ·
+ Meixi Song ·
+ Dizhe Zhang ·
+ Wenxuan Lu ·
+ Haodong Li
+
+ Bo Du ·
+ Ming-Hsuan Yang ·
+ Truong Nguyen ·
+ Lu Qi
+
| | Global Tasks | | | | Dense Tasks | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Model | IN-ReaL | IN-R | Obj.Net | Ox.-H | ADE20k | NYU↓ | DAVIS | NAVI | SPair |
| DINOv3 ViT-S/16 | 87.0 | 60.4 | 50.9 | 49.5 | 47.0 | 0.403 | 72.7 | 56.3 | 50.4 |
| DINOv3 ViT-S+/16 | 88.0 | 68.8 | 54.6 | 50.0 | 48.8 | 0.399 | 75.5 | 57.1 | 55.2 |
| DINOv3 ViT-B/16 | 89.3 | 76.7 | 64.1 | 58.5 | 51.8 | 0.373 | 77.2 | 58.8 | 57.2 |
| DINOv3 ViT-L/16 | 90.2 | 88.1 | 74.8 | 63.1 | 54.9 | 0.352 | 79.9 | 62.3 | 61.3 |
| DINOv3 ViT-H+/16 | 90.3 | 90.0 | 78.6 | 64.5 | 54.8 | 0.352 | 79.3 | 63.3 | 56.3 |
| DINOv3 ViT-7B/16 | 90.4 | 91.1 | 91.1 | 72.8 | 55.9 | 0.309 | 79.7 | 64.4 | 58.7 |

| | Global Tasks | | | | | | Dense Tasks | |
|---|---|---|---|---|---|---|---|---|
| Model | IN-ReaL | | IN-R | | Obj.Net | | ADE20k | NYU↓ |
| | @256px | @512px | @256px | @512px | @256px | @512px | | |
| DINOv3 ConvNeXt Tiny | 86.6 | 87.7 | 73.7 | 74.1 | 52.6 | 58.7 | 42.7 | 0.448 |
| DINOv3 ConvNeXt Small | 87.9 | 88.7 | 73.7 | 74.1 | 52.6 | 58.7 | 44.8 | 0.432 |
| DINOv3 ConvNeXt Base | 88.5 | 89.2 | 77.2 | 78.2 | 56.2 | 61.3 | 46.3 | 0.420 |
| DINOv3 ConvNeXt Large | 88.9 | 89.4 | 81.3 | 82.4 | 59.3 | 65.2 | 47.8 | 0.403 |

| | (GEO-Bench) Classification | | | | | | |
|---|---|---|---|---|---|---|---|
| Model | m-BEnet | m-brick-kiln | m-eurosat | m-forestnet | m-pv4ger | m-so2sat | mean |
| DINOv3 ViT-L/16 | 73.0 | 96.5 | 94.1 | 60.6 | 96.0 | 57.4 | 79.6 |
| DINOv3 ViT-7B/16 | 74.0 | 97.2 | 94.8 | 62.3 | 96.1 | 62.1 | 81.1 |
| | (GEO-Bench) Segmentation | | | | | | |
| Model | m-cashew | m-chesapeake | m-NeonTree | m-nz-cattle | m-pv4ger-seg | m-SA-crop | mean |
| DINOv3 ViT-L/16 | 94.2 | 75.6 | 61.8 | 83.7 | 95.2 | 36.8 | 74.5 |
| DINOv3 ViT-7B/16 | 94.1 | 76.6 | 62.6 | 83.4 | 95.5 | 37.6 | 75.0 |
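
The `mean` column in the GEO-Bench tables appears to be the unweighted average of the six per-task scores in each row. The sketch below checks that reading against the reported values. The helper names `unweighted_mean` and `check_rows` are hypothetical, and the half-unit tolerance in the last decimal place is an assumption that allows for rounding of the published figures.

```python
# Scores copied from the GEO-Bench tables: six task scores per row,
# followed by the reported "mean" value.
ROWS = {
    ("classification", "DINOv3 ViT-L/16"): ([73.0, 96.5, 94.1, 60.6, 96.0, 57.4], 79.6),
    ("classification", "DINOv3 ViT-7B/16"): ([74.0, 97.2, 94.8, 62.3, 96.1, 62.1], 81.1),
    ("segmentation", "DINOv3 ViT-L/16"): ([94.2, 75.6, 61.8, 83.7, 95.2, 36.8], 74.5),
    ("segmentation", "DINOv3 ViT-7B/16"): ([94.1, 76.6, 62.6, 83.4, 95.5, 37.6], 75.0),
}


def unweighted_mean(scores):
    """Plain arithmetic mean; every task counts equally (an assumption)."""
    return sum(scores) / len(scores)


def check_rows(rows, tol=0.05 + 1e-9):
    """Map each row key to (computed mean, reported mean, within tolerance)."""
    results = {}
    for key, (scores, reported) in rows.items():
        computed = unweighted_mean(scores)
        results[key] = (computed, reported, abs(computed - reported) <= tol)
    return results


if __name__ == "__main__":
    for (task, model), (computed, reported, ok) in check_rows(ROWS).items():
        status = "consistent" if ok else "differs"
        print(f"{task:15s} {model:18s} computed={computed:.2f} reported={reported} {status}")
```

Under this assumption all four rows agree with the reported means to within rounding.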

| Model | Parameters | Pretraining Dataset | Download |
|---|---|---|---|
| ViT-S/16 distilled | 21M | LVD-1689M | [link] |
| ViT-S+/16 distilled | 29M | LVD-1689M | [link] |
| ViT-B/16 distilled | 86M | LVD-1689M | [link] |
| ViT-L/16 distilled | 300M | LVD-1689M | [link] |
| ViT-H+/16 distilled | 840M | LVD-1689M | [link] |
| ViT-7B/16 | 6,716M | LVD-1689M | [link] |

| Model | Parameters | Pretraining Dataset | Download |
|---|---|---|---|
| ConvNeXt Tiny | 29M | LVD-1689M | [link] |
| ConvNeXt Small | 50M | LVD-1689M | [link] |
| ConvNeXt Base | 89M | LVD-1689M | [link] |
| ConvNeXt Large | 198M | LVD-1689M | [link] |

| Model | Parameters | Pretraining Dataset | Download |
|---|---|---|---|
| ViT-L/16 distilled | 300M | SAT-493M | [link] |
| ViT-7B/16 | 6,716M | SAT-493M | [link] |

| Backbone | Pretraining Dataset | Head Dataset | Download |
|---|---|---|---|
| ViT-7B/16 | LVD-1689M | ImageNet | [link] |

| Backbone | Pretraining Dataset | Head Dataset | Download |
|---|---|---|---|
| ViT-7B/16 | LVD-1689M | SYNTHMIX | [link] |

| Backbone | Pretraining Dataset | Head Dataset | Download |
|---|---|---|---|
| ViT-7B/16 | LVD-1689M | COCO2017 | [link] |

| Backbone | Pretraining Dataset | Head Dataset | Download |
|---|---|---|---|
| ViT-7B/16 | LVD-1689M | ADE20K | [link] |

| Backbone | Download |
|---|---|
| ViT-L/16 distilled | [link], vocabulary, vocabulary license |