# Download Pretrained Backbone Weights

Here we collect the **links** to pretrained backbone models, making it easier for users to **download pretrained weights** for the **builtin backbones**. This document will be kept up to date. Most of the included models are borrowed from their original sources, and we thank their authors for their excellent work on backbones.

## ResNet

We have already provided a tutorial on **using torchvision pretrained ResNet models**: [Download TorchVision ResNet Models](https://detrex.readthedocs.io/en/latest/tutorials/Converters.html#download-pretrained-weights).

## Swin-Transformer

Here we borrow the download links from the [official implementation](https://github.com/microsoft/Swin-Transformer#main-results-on-imagenet-with-pretrained-models) of Swin-Transformer.

### Swin-Tiny
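| Name | Pretrain | Resolution | Acc@1 | Acc@5 | 22K Model | 1K Model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-Tiny | ImageNet-1K | 224x224 | 81.2 | 95.5 | - | download |
| Swin-Tiny | ImageNet-22K | 224x224 | 80.9 | 96.0 | download | download |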
**Using Swin-Tiny Backbone in Config**

```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=96,
    depths=(2, 2, 6, 2),
    num_heads=(3, 6, 12, 24),
    drop_path_rate=0.1,
    window_size=7,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_tiny_patch4_window7_224.pth"
train.init_checkpoint = "/path/to/swin_tiny_patch4_window7_224_22kto1k_finetune.pth"
```
### Swin-Small
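| Name | Pretrain | Resolution | Acc@1 | Acc@5 | 22K Model | 1K Model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-Small | ImageNet-1K | 224x224 | 83.2 | 96.2 | - | download |
| Swin-Small | ImageNet-22K | 224x224 | 83.2 | 97.0 | download | download |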
**Using Swin-Small Backbone in Config**

```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=96,
    depths=(2, 2, 18, 2),
    num_heads=(3, 6, 12, 24),
    drop_path_rate=0.2,
    window_size=7,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_small_patch4_window7_224.pth"
train.init_checkpoint = "/path/to/swin_small_patch4_window7_224_22kto1k_finetune.pth"
```
### Swin-Base
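| Name | Pretrain | Resolution | Acc@1 | Acc@5 | 22K Model | 1K Model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-Base | ImageNet-1K | 224x224 | 83.5 | 96.5 | - | download |
| Swin-Base | ImageNet-1K | 384x384 | 84.5 | 97.0 | - | download |
| Swin-Base | ImageNet-22K | 224x224 | 85.2 | 97.5 | download | download |
| Swin-Base | ImageNet-22K | 384x384 | 86.4 | 98.0 | download | download |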
**Using Swin-Base-224 Backbone in Config**

```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=128,
    depths=(2, 2, 18, 2),
    num_heads=(4, 8, 16, 32),
    window_size=7,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_base_patch4_window7_224.pth"
train.init_checkpoint = "/path/to/swin_base_patch4_window7_224_22kto1k.pth"
```
**Using Swin-Base-384 Backbone in Config**

```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=384,
    embed_dim=128,
    depths=(2, 2, 18, 2),
    num_heads=(4, 8, 16, 32),
    window_size=12,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_base_patch4_window12_384.pth"
train.init_checkpoint = "/path/to/swin_base_patch4_window12_384_22kto1k.pth"
```
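The checkpoint filenames encode the settings the backbone config has to match: `window7_224` pairs with `pretrain_img_size=224, window_size=7`, and `window12_384` with `pretrain_img_size=384, window_size=12`. A mismatched window size changes the shape of the relative position bias tables, so those weights would not load as intended. The sketch below uses hypothetical helpers (not part of detrex or detectron2) that parse the official Swin naming scheme, assuming the downloaded file keeps its original name, and report mismatches before training starts.

```python
import re

# Hypothetical helper: assumes the official Swin checkpoint naming, e.g.
# "swin_base_patch4_window12_384_22kto1k.pth".
_SWIN_CKPT = re.compile(
    r"swin_(?P<variant>tiny|small|base|large)"
    r"_patch(?P<patch>\d+)_window(?P<window>\d+)_(?P<res>\d+)"
    r"(?P<suffix>_[0-9a-z_]+)?\.pth$"
)


def parse_swin_ckpt_name(path):
    """Extract variant, patch size, window size and resolution from a checkpoint path."""
    name = path.rsplit("/", 1)[-1]
    match = _SWIN_CKPT.match(name)
    if match is None:
        raise ValueError(f"unrecognized Swin checkpoint name: {name}")
    return {
        "variant": match["variant"],
        "patch_size": int(match["patch"]),
        "window_size": int(match["window"]),
        "pretrain_img_size": int(match["res"]),
        # e.g. "22kto1k" or "22kto1k_finetune"; empty for ImageNet-1K checkpoints
        "suffix": (match["suffix"] or "").lstrip("_"),
    }


def check_swin_config(path, pretrain_img_size, window_size):
    """Return a list of mismatches between config values and the checkpoint name."""
    info = parse_swin_ckpt_name(path)
    problems = []
    if info["pretrain_img_size"] != pretrain_img_size:
        problems.append(
            f"pretrain_img_size={pretrain_img_size}, "
            f"checkpoint expects {info['pretrain_img_size']}"
        )
    if info["window_size"] != window_size:
        problems.append(
            f"window_size={window_size}, checkpoint expects {info['window_size']}"
        )
    return problems


print(check_swin_config("/path/to/swin_base_patch4_window12_384_22kto1k.pth", 384, 7))
# ['window_size=7, checkpoint expects 12']
```

An empty list means the config agrees with the checkpoint name; the check only looks at the filename, not at the weights themselves.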
### Swin-Large
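| Name | Pretrain | Resolution | Acc@1 | Acc@5 | 22K Model | 1K Model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-Large | ImageNet-22K | 224x224 | 86.3 | 97.9 | download | download |
| Swin-Large | ImageNet-22K | 384x384 | 87.3 | 98.2 | download | download |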
**Using Swin-Large-224 Backbone in Config**

```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=192,
    depths=(2, 2, 18, 2),
    num_heads=(6, 12, 24, 48),
    window_size=7,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
train.init_checkpoint = "/path/to/swin_large_patch4_window7_224_22kto1k.pth"
```
**Using Swin-Large-384 Backbone in Config**

```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=384,
    embed_dim=192,
    depths=(2, 2, 18, 2),
    num_heads=(6, 12, 24, 48),
    window_size=12,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
train.init_checkpoint = "/path/to/swin_large_patch4_window12_384_22kto1k.pth"
```
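The four Swin variants above differ in only a few knobs, and two relationships hold across every Swin config on this page: each attention head has 32 channels, so `num_heads` is `embed_dim // 32` doubled at every stage, and `window_size` equals `pretrain_img_size // 32` (7 at 224x224, 12 at 384x384). These are observations from the configs listed here rather than a documented rule, and `swin_kwargs` is a hypothetical helper. `drop_path_rate` is left out because it differs per variant (0.1 for Swin-Tiny, 0.2 for Swin-Small, the `SwinTransformer` default for Base and Large).

```python
# Stage depths and first-stage widths per variant, copied from the configs above.
SWIN_DEPTHS = {
    "tiny": (2, 2, 6, 2),
    "small": (2, 2, 18, 2),
    "base": (2, 2, 18, 2),
    "large": (2, 2, 18, 2),
}
SWIN_EMBED_DIM = {"tiny": 96, "small": 96, "base": 128, "large": 192}


def swin_kwargs(variant, pretrain_img_size=224):
    """Hypothetical helper: build SwinTransformer kwargs for one of the variants above."""
    embed_dim = SWIN_EMBED_DIM[variant]
    # 32 channels per head; channels (and therefore heads) double at each stage.
    heads = embed_dim // 32
    num_heads = tuple(heads * 2**i for i in range(4))
    # 224 -> 7 and 384 -> 12 in the configs above, i.e. resolution / 32.
    window_size = pretrain_img_size // 32
    return dict(
        pretrain_img_size=pretrain_img_size,
        embed_dim=embed_dim,
        depths=SWIN_DEPTHS[variant],
        num_heads=num_heads,
        window_size=window_size,
        out_indices=(1, 2, 3),
    )
```

In a config, the result could be unpacked into the lazy call, e.g. `model.backbone = L(SwinTransformer)(**swin_kwargs("base", 384))`, keeping the checkpoint path consistent with the chosen variant and resolution.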
## ViTDet

The ViTDet backbone is initialized from MAE-pretrained ViT weights. Here we borrow the download links from the [official implementation](https://github.com/facebookresearch/mae#fine-tuning-with-pre-trained-checkpoints) of MAE.
| | ViT-Base | ViT-Large | ViT-Huge |
| :---: | :---: | :---: | :---: |
| Pretrained Checkpoint | download | download | download |
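The ViTDet config on this page only spells out the ViT-Base hyper-parameters. The sketch below collects per-checkpoint settings in one place and derives `window_block_indexes`: the blocks are split into four equal groups, and the last block of each group uses global attention (2, 5, 8, 11 for a depth of 12). The Large and Huge values are our recollection of detectron2's ViTDet configs and should be double-checked there; `VIT_HPARAMS`, `window_block_indexes` and `vit_kwargs` are hypothetical names. Note also that the MAE ViT-Huge checkpoint was trained with 14x14 patches, so it may need conversion before it can be used with `patch_size=16`.

```python
# name: (embed_dim, depth, num_heads, drop_path_rate)
# Base matches the config on this page; Large/Huge are assumed from detectron2's
# ViTDet configs and should be verified before use.
VIT_HPARAMS = {
    "base": (768, 12, 12, 0.1),
    "large": (1024, 24, 16, 0.4),
    "huge": (1280, 32, 16, 0.5),
}


def window_block_indexes(depth, num_global=4):
    """Indexes of blocks that use windowed attention.

    The depth is split into `num_global` equal groups; the last block of each
    group uses global attention and every other block uses window attention.
    """
    group = depth // num_global
    global_idx = {group * (k + 1) - 1 for k in range(num_global)}
    return [i for i in range(depth) if i not in global_idx]


def vit_kwargs(name):
    """Subset of ViT kwargs that changes between the MAE checkpoints."""
    embed_dim, depth, num_heads, dp = VIT_HPARAMS[name]
    return dict(
        embed_dim=embed_dim,
        depth=depth,
        num_heads=num_heads,
        drop_path_rate=dp,
        window_block_indexes=window_block_indexes(depth),
    )
```

For depth 24 this yields global attention at blocks 5, 11, 17 and 23, and for depth 32 at blocks 7, 15, 23 and 31.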
**Using ViTDet Backbone in Config**

```python
from functools import partial

import torch.nn as nn

from detectron2.config import LazyCall as L
from detectron2.layers import ShapeSpec
from detectron2.modeling import ViT, SimpleFeaturePyramid
from detectron2.modeling.backbone.fpn import LastLevelMaxPool

from .dino_r50 import model

# ViT Base Hyper-params
embed_dim, depth, num_heads, dp = 768, 12, 12, 0.1

# Creates Simple Feature Pyramid from ViT backbone
model.backbone = L(SimpleFeaturePyramid)(
    net=L(ViT)(  # Single-scale ViT backbone
        img_size=1024,
        patch_size=16,
        embed_dim=embed_dim,
        depth=depth,
        num_heads=num_heads,
        drop_path_rate=dp,
        window_size=14,
        mlp_ratio=4,
        qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, eps=1e-6),
        window_block_indexes=[
            # 2, 5, 8, 11 for global attention
            0, 1, 3, 4, 6, 7, 9, 10,
        ],
        residual_block_indexes=[],
        use_rel_pos=True,
        out_feature="last_feat",
    ),
    in_feature="${.net.out_feature}",
    out_channels=256,
    scale_factors=(2.0, 1.0, 0.5),  # (4.0, 2.0, 1.0, 0.5) in ViTDet
    top_block=L(LastLevelMaxPool)(),
    norm="LN",
    square_pad=1024,
)

# setup init checkpoint path
# (`train` must also be defined by the surrounding config, e.g. imported from the base config)
train.init_checkpoint = "/path/to/mae_pretrain_vit_base.pth"
```
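`SimpleFeaturePyramid` turns the single stride-16 ViT feature map into a pyramid: each entry of `scale_factors` resizes that map, giving an output stride of `16 / scale`, and `LastLevelMaxPool` appends one more level at twice the coarsest stride. The hypothetical helper below only does this arithmetic, following our reading of detectron2's `p{log2(stride)}` feature naming; it shows that the default `(2.0, 1.0, 0.5)` produces four maps at strides 8 to 64, while ViTDet's `(4.0, 2.0, 1.0, 0.5)` adds a stride-4 level.

```python
import math


def pyramid_strides(scale_factors, backbone_stride=16, top_block=True):
    """Feature names and strides of a SimpleFeaturePyramid-style neck.

    Assumes scale_factors are ordered from finest to coarsest, as in the config.
    """
    # Each scale factor resizes the single backbone map: stride = stride / scale.
    strides = [int(backbone_stride / s) for s in scale_factors]
    if top_block:
        # LastLevelMaxPool downsamples the coarsest level by 2 once more.
        strides.append(strides[-1] * 2)
    return {f"p{int(math.log2(s))}": s for s in strides}


print(pyramid_strides((2.0, 1.0, 0.5)))
# {'p3': 8, 'p4': 16, 'p5': 32, 'p6': 64}
```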
Please refer to the [DINO](https://github.com/IDEA-Research/detrex/tree/main/projects/dino) project for more details on using the ViT backbone.

## FocalNet

Here we borrow the ImageNet-22K pretrained download links from the [official implementation](https://github.com/microsoft/FocalNet#imagenet-22k-pretrained) of FocalNet.
| Model | Depth | Dim | Kernels | #Params. (M) | Download |
| :---: | :---: | :---: | :---: | :---: | :---: |
| FocalNet-L | [2, 2, 18, 2] | 192 | [5, 7, 9] | 207 | download |
| FocalNet-L | [2, 2, 18, 2] | 192 | [3, 5, 7, 9] | 207 | download |
| FocalNet-XL | [2, 2, 18, 2] | 256 | [5, 7, 9] | 366 | download |
| FocalNet-XL | [2, 2, 18, 2] | 256 | [3, 5, 7, 9] | 366 | download |
| FocalNet-H | [2, 2, 18, 2] | 352 | [5, 7, 9] | 687 | download |
| FocalNet-H | [2, 2, 18, 2] | 352 | [3, 5, 7, 9] | 687 | download |
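The **Kernels** column lists the depth-wise kernel sizes of the focal levels. Assuming the official FocalNet rule `kernel_size = focal_factor * k + focal_window` for level `k`, with `focal_factor=2`, the FocalNet config on this page (`focal_levels=3`, `focal_windows=5`) corresponds to the `[5, 7, 9]` rows, and a four-level model with a window of 3 gives `[3, 5, 7, 9]`. `focal_kernel_sizes` is a hypothetical helper for this arithmetic, useful for picking `focal_levels`/`focal_windows` that match a downloaded checkpoint.

```python
def focal_kernel_sizes(focal_level, focal_window, focal_factor=2):
    """Depth-wise kernel size at each focal level.

    Assumes kernel_size = focal_factor * k + focal_window for k = 0..focal_level-1.
    """
    return [focal_factor * k + focal_window for k in range(focal_level)]


print(focal_kernel_sizes(3, 5))  # [5, 7, 9]
print(focal_kernel_sizes(4, 3))  # [3, 5, 7, 9]
```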
**Using FocalNet Backbone in Config**

```python
from detectron2.config import LazyCall as L
from detrex.modeling.backbone import FocalNet

# focalnet-large-4scale baseline
model.backbone = L(FocalNet)(
    embed_dim=192,
    depths=(2, 2, 18, 2),
    focal_levels=(3, 3, 3, 3),
    focal_windows=(5, 5, 5, 5),
    use_conv_embed=True,
    use_postln=True,
    use_postln_in_modulation=False,
    use_layerscale=True,
    normalize_modulator=False,
    out_indices=(1, 2, 3),
)
```