| model | with registers | ImageNet-1k classif. (acc), k-NN | ImageNet-1k classif. (acc), linear | ImageNet-1k classif. V2 (acc), linear | NYU-Depth v2 depth (RMSE), linear 4 layers | SUN-RGBD depth (RMSE), NYU-D transfer | ADE20k segm. (mAP), multiscale | iNaturalist 2018 classif. (acc), linear | Oxford-H retrieval (mAP), nearest neighbor |
|---|---|---|---|---|---|---|---|---|---|
| ViT-S/14 | :x: | 79.0% | 81.1% | 70.8% | 0.417 | 0.431 | 47.2 | 69.5% | 43.2 |
| ViT-S/14 | :white_check_mark: | 79.1% | 80.9% | 71.0% | N/A | N/A | N/A | 67.6% | 39.5 |
| ViT-B/14 | :x: | 82.1% | 84.5% | 74.9% | 0.362 | 0.400 | 51.3 | 76.3% | 49.5 |
| ViT-B/14 | :white_check_mark: | 82.0% | 84.6% | 75.6% | N/A | N/A | N/A | 73.8% | 51.0 |
| ViT-L/14 | :x: | 83.5% | 86.3% | 77.6% | 0.333 | 0.396 | 53.1 | 79.8% | 54.0 |
| ViT-L/14 | :white_check_mark: | 83.8% | 86.7% | 78.5% | N/A | N/A | N/A | 80.9% | 55.7 |
| ViT-g/14 | :x: | 83.5% | 86.5% | 78.4% | 0.298 | 0.362 | 53.0 | 81.6% | 52.3 |
| ViT-g/14 | :white_check_mark: | 83.7% | 87.1% | 78.8% | N/A | N/A | N/A | 81.5% | 58.2 |

| model | # of params | with registers | ImageNet k-NN | ImageNet linear | download |
|---|---|---|---|---|---|
| ViT-S/14 distilled | 21 M | :x: | 79.0% | 81.1% | backbone only |
| ViT-S/14 distilled | 21 M | :white_check_mark: | 79.1% | 80.9% | backbone only |
| ViT-B/14 distilled | 86 M | :x: | 82.1% | 84.5% | backbone only |
| ViT-B/14 distilled | 86 M | :white_check_mark: | 82.0% | 84.6% | backbone only |
| ViT-L/14 distilled | 300 M | :x: | 83.5% | 86.3% | backbone only |
| ViT-L/14 distilled | 300 M | :white_check_mark: | 83.8% | 86.7% | backbone only |
| ViT-g/14 | 1,100 M | :x: | 83.5% | 86.5% | backbone only |
| ViT-g/14 | 1,100 M | :white_check_mark: | 83.7% | 87.1% | backbone only |

| backbone | with registers | download (ImageNet) |
|---|---|---|
| ViT-S/14 distilled | :x: | linear head (1 layer, 4 layers) |
| ViT-S/14 distilled | :white_check_mark: | linear head (1 layer, 4 layers) |
| ViT-B/14 distilled | :x: | linear head (1 layer, 4 layers) |
| ViT-B/14 distilled | :white_check_mark: | linear head (1 layer, 4 layers) |
| ViT-L/14 distilled | :x: | linear head (1 layer, 4 layers) |
| ViT-L/14 distilled | :white_check_mark: | linear head (1 layer, 4 layers) |
| ViT-g/14 | :x: | linear head (1 layer, 4 layers) |
| ViT-g/14 | :white_check_mark: | linear head (1 layer, 4 layers) |

| backbone | download head (NYUd) | download head (KITTI) |
|---|---|---|
| ViT-S/14 distilled | linear (1 layer, 4 layers), DPT | linear (1 layer, 4 layers), DPT |
| ViT-B/14 distilled | linear (1 layer, 4 layers), DPT | linear (1 layer, 4 layers), DPT |
| ViT-L/14 distilled | linear (1 layer, 4 layers), DPT | linear (1 layer, 4 layers), DPT |
| ViT-g/14 | linear (1 layer, 4 layers), DPT | linear (1 layer, 4 layers), DPT |

| backbone | download model (ADE20K) | download head (ADE20K) | download head (VOC2012) |
|---|---|---|---|
| ViT-S/14 distilled | | linear, multi-scale | linear, multi-scale |
| ViT-B/14 distilled | | linear, multi-scale | linear, multi-scale |
| ViT-L/14 distilled | | linear, multi-scale | linear, multi-scale |
| ViT-g/14 | Mask2Former | linear, multi-scale | linear, multi-scale |

| model | with registers | ImageNet top-1 | linear evaluation |
|---|---|---|---|
| ViT-S/14 distilled | :x: | 81.1% | linear head weights |
| ViT-S/14 distilled | :white_check_mark: | 80.8% | linear head weights |
| ViT-B/14 distilled | :x: | 84.5% | linear head weights |
| ViT-B/14 distilled | :white_check_mark: | 84.4% | linear head weights |
| ViT-L/14 distilled | :x: | 86.3% | linear head weights |
| ViT-L/14 distilled | :white_check_mark: | 86.5% | linear head weights |
| ViT-g/14 | :x: | 86.5% | linear head weights |
| ViT-g/14 | :white_check_mark: | 87.0% | linear head weights |