# Download Pretrained Backbone Weights
Here we collect **download links** for the pretrained weights of the **builtin backbones**, making it easier for users to get started. This document will be kept up to date. Most of the included models are borrowed from their original sources; many thanks to the authors for their excellent work on these backbones.
## ResNet
We've already provided a tutorial on **using torchvision pretrained ResNet models** here: [Download TorchVision ResNet Models](https://detrex.readthedocs.io/en/latest/tutorials/Converters.html#download-pretrained-weights).
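For reference, the snippet below is a minimal sketch of configuring a detectron2-style ResNet-50 backbone in a LazyConfig, mirroring the kind of setup used in detrex's `dino_r50` config. The checkpoint path is a placeholder for weights converted following the tutorial above, and `model`/`train` are assumed to come from your base config:

```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import BasicStem, ResNet

# "model" and "train" are assumed to come from the base config
model.backbone = L(ResNet)(
    stem=L(BasicStem)(in_channels=3, out_channels=64, norm="FrozenBN"),
    stages=L(ResNet.make_default_stages)(depth=50, norm="FrozenBN"),
    out_features=["res3", "res4", "res5"],
    freeze_at=1,
)

# placeholder path to the converted torchvision checkpoint
train.init_checkpoint = "/path/to/converted_torchvision_resnet50.pth"
```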
## Swin-Transformer
The download links below are borrowed from the [official implementation](https://github.com/microsoft/Swin-Transformer#main-results-on-imagenet-with-pretrained-models) of Swin-Transformer.
### Swin-Tiny
| Name | Pretrain | Resolution | Acc@1 | Acc@5 | 22K Model | 1K Model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-Tiny | ImageNet-1K | 224x224 | 81.2 | 95.5 | - | download |
| Swin-Tiny | ImageNet-22K | 224x224 | 80.9 | 96.0 | download | download |

**Using Swin-Tiny Backbone in Config:**
```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer
# "model" is assumed to come from a base config, as in detrex's DINO configs
from .dino_r50 import model

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=96,
    depths=(2, 2, 6, 2),
    num_heads=(3, 6, 12, 24),
    drop_path_rate=0.1,
    window_size=7,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path ("train" comes from the base training config)
# ImageNet-1K pretrained weights:
# train.init_checkpoint = "/path/to/swin_tiny_patch4_window7_224.pth"
# ImageNet-22K pretrained, ImageNet-1K fine-tuned weights:
train.init_checkpoint = "/path/to/swin_tiny_patch4_window7_224_22kto1k_finetune.pth"
```
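Before training, it can be useful to confirm that the configured backbone produces the expected multi-scale features. The sketch below is illustrative: it assumes the backbone follows detectron2's `Backbone` API and returns a `dict` of feature maps, and uses detectron2's `instantiate` helper for LazyConfig objects:

```python
import torch
from detectron2.config import instantiate

# build the backbone from the lazy config and run a dummy forward pass
backbone = instantiate(model.backbone)
backbone.eval()
with torch.no_grad():
    features = backbone(torch.randn(1, 3, 224, 224))

# with out_indices=(1, 2, 3) we expect three feature maps (strides 8, 16, 32)
for name, feat in features.items():
    print(name, tuple(feat.shape))
```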
### Swin-Small
| Name | Pretrain | Resolution | Acc@1 | Acc@5 | 22K Model | 1K Model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-Small | ImageNet-1K | 224x224 | 83.2 | 96.2 | - | download |
| Swin-Small | ImageNet-22K | 224x224 | 83.2 | 97.0 | download | download |

**Using Swin-Small Backbone in Config:**
```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer
# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=96,
    depths=(2, 2, 18, 2),
    num_heads=(3, 6, 12, 24),
    drop_path_rate=0.2,
    window_size=7,
    out_indices=(1, 2, 3),
)
# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_small_patch4_window7_224.pth"
train.init_checkpoint = "/path/to/swin_small_patch4_window7_224_22kto1k_finetune.pth"
```
### Swin-Base
| Name | Pretrain | Resolution | Acc@1 | Acc@5 | 22K Model | 1K Model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-Base | ImageNet-1K | 224x224 | 83.5 | 96.5 | - | download |
| Swin-Base | ImageNet-1K | 384x384 | 84.5 | 97.0 | - | download |
| Swin-Base | ImageNet-22K | 224x224 | 85.2 | 97.5 | download | download |
| Swin-Base | ImageNet-22K | 384x384 | 86.4 | 98.0 | download | download |

**Using Swin-Base-224 Backbone in Config:**
```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer
# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=128,
    depths=(2, 2, 18, 2),
    num_heads=(4, 8, 16, 32),
    window_size=7,
    out_indices=(1, 2, 3),
)
# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_base_patch4_window7_224.pth"
train.init_checkpoint = "/path/to/swin_base_patch4_window7_224_22kto1k.pth"
```
**Using Swin-Base-384 Backbone in Config:**
```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer
# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=384,
    embed_dim=128,
    depths=(2, 2, 18, 2),
    num_heads=(4, 8, 16, 32),
    window_size=12,
    out_indices=(1, 2, 3),
)
# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_base_patch4_window12_384.pth"
train.init_checkpoint = "/path/to/swin_base_patch4_window12_384_22kto1k.pth"
```
### Swin-Large
| Name | Pretrain | Resolution | Acc@1 | Acc@5 | 22K Model | 1K Model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-Large | ImageNet-22K | 224x224 | 86.3 | 97.9 | download | download |
| Swin-Large | ImageNet-22K | 384x384 | 87.3 | 98.2 | download | download |

**Using Swin-Large-224 Backbone in Config:**
```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer
# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=192,
    depths=(2, 2, 18, 2),
    num_heads=(6, 12, 24, 48),
    window_size=7,
    out_indices=(1, 2, 3),
)
# setup init checkpoint path
train.init_checkpoint = "/path/to/swin_large_patch4_window7_224_22kto1k.pth"
```
**Using Swin-Large-384 Backbone in Config:**
```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer
# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=384,
    embed_dim=192,
    depths=(2, 2, 18, 2),
    num_heads=(6, 12, 24, 48),
    window_size=12,
    out_indices=(1, 2, 3),
)
# setup init checkpoint path
train.init_checkpoint = "/path/to/swin_large_patch4_window12_384_22kto1k.pth"
```
## ViTDet
The download links below are borrowed from the [official implementation](https://github.com/facebookresearch/mae#fine-tuning-with-pre-trained-checkpoints) of MAE.
**Using ViTDet Backbone in Config:**
```python
import torch.nn as nn
from functools import partial

from detectron2.config import LazyCall as L
from detectron2.modeling import ViT, SimpleFeaturePyramid
from detectron2.modeling.backbone.fpn import LastLevelMaxPool

from .dino_r50 import model

# ViT-Base hyper-parameters
embed_dim, depth, num_heads, dp = 768, 12, 12, 0.1

# create a Simple Feature Pyramid from the ViT backbone
model.backbone = L(SimpleFeaturePyramid)(
    net=L(ViT)(  # single-scale ViT backbone
        img_size=1024,
        patch_size=16,
        embed_dim=embed_dim,
        depth=depth,
        num_heads=num_heads,
        drop_path_rate=dp,
        window_size=14,
        mlp_ratio=4,
        qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, eps=1e-6),
        # blocks 2, 5, 8, 11 use global attention; all others use windowed attention
        window_block_indexes=[0, 1, 3, 4, 6, 7, 9, 10],
        residual_block_indexes=[],
        use_rel_pos=True,
        out_feature="last_feat",
    ),
    in_feature="${.net.out_feature}",
    out_channels=256,
    scale_factors=(2.0, 1.0, 0.5),  # (4.0, 2.0, 1.0, 0.5) in ViTDet
    top_block=L(LastLevelMaxPool)(),
    norm="LN",
    square_pad=1024,
)

# setup init checkpoint path ("train" comes from the base training config)
train.init_checkpoint = "/path/to/mae_pretrain_vit_base.pth"
```
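In the config above, `window_block_indexes` lists the blocks that use windowed attention; blocks 2, 5, 8, and 11 are omitted so they keep global attention, following ViTDet. When scaling to deeper ViTs, it can be less error-prone to derive the list than to write it by hand. The sketch below is illustrative (the variable names are ours, not detrex's):

```python
# blocks that keep global attention (every third block for ViT-Base, per ViTDet)
depth = 12
global_attn_indexes = [2, 5, 8, 11]

# all remaining blocks use windowed attention
window_block_indexes = [i for i in range(depth) if i not in global_attn_indexes]
assert window_block_indexes == [0, 1, 3, 4, 6, 7, 9, 10]
```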
Please refer to the [DINO](https://github.com/IDEA-Research/detrex/tree/main/projects/dino) project for more details on using the ViT backbone.
## FocalNet
The download links below are borrowed from the [official implementation](https://github.com/microsoft/FocalNet#imagenet-22k-pretrained) of FocalNet.
| Model | Depth | Dim | Kernels | #Params. (M) | Download |
| :---: | :---: | :---: | :---: | :---: | :---: |
| FocalNet-L | [2, 2, 18, 2] | 192 | [5, 7, 9] | 207 | download |
| FocalNet-L | [2, 2, 18, 2] | 192 | [3, 5, 7, 9] | 207 | download |
| FocalNet-XL | [2, 2, 18, 2] | 256 | [5, 7, 9] | 366 | download |
| FocalNet-XL | [2, 2, 18, 2] | 256 | [3, 5, 7, 9] | 366 | download |
| FocalNet-H | [2, 2, 18, 2] | 352 | [5, 7, 9] | 687 | download |
| FocalNet-H | [2, 2, 18, 2] | 352 | [3, 5, 7, 9] | 687 | download |

**Using FocalNet Backbone in Config:**
```python
# FocalNet-Large-4scale baseline
# "model" is assumed to come from a base config; FocalNet is provided by detrex
from detectron2.config import LazyCall as L
from detrex.modeling.backbone import FocalNet

model.backbone = L(FocalNet)(
    embed_dim=192,
    depths=(2, 2, 18, 2),
    focal_levels=(3, 3, 3, 3),
    focal_windows=(5, 5, 5, 5),
    use_conv_embed=True,
    use_postln=True,
    use_postln_in_modulation=False,
    use_layerscale=True,
    normalize_modulator=False,
    out_indices=(1, 2, 3),
)
```
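As with the other backbones, point `train.init_checkpoint` at the downloaded FocalNet weights; the file name below is a placeholder, not an official checkpoint name:

```python
# setup init checkpoint path (placeholder file name)
train.init_checkpoint = "/path/to/focalnet_large_pretrained.pth"
```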