# Download Pretrained Backbone Weights
Here we collect **download links** for the pretrained weights of the **builtin backbones**, making it easier for users to get started. This document will be kept up to date. Most of the included models are borrowed from their original sources; many thanks to the authors for their excellent work on these backbones.
## ResNet
We've already provided a tutorial on **using torchvision pretrained ResNet models** here: [Download TorchVision ResNet Models](https://detrex.readthedocs.io/en/latest/tutorials/Converters.html#download-pretrained-weights).
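For reference, the snippet below is a minimal sketch of configuring a detectron2-style ResNet-50 backbone in a LazyConfig, mirroring the kind of setup used in detrex's `dino_r50` config. The checkpoint path is a placeholder for weights converted following the tutorial above, and `model`/`train` are assumed to come from your base config:

```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import BasicStem, ResNet

# "model" and "train" are assumed to come from the base config
model.backbone = L(ResNet)(
    stem=L(BasicStem)(in_channels=3, out_channels=64, norm="FrozenBN"),
    stages=L(ResNet.make_default_stages)(depth=50, norm="FrozenBN"),
    out_features=["res3", "res4", "res5"],
    freeze_at=1,
)

# placeholder path to the converted torchvision checkpoint
train.init_checkpoint = "/path/to/converted_torchvision_resnet50.pth"
```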
## Swin-Transformer
The download links below are borrowed from the [official implementation](https://github.com/microsoft/Swin-Transformer#main-results-on-imagenet-with-pretrained-models) of Swin-Transformer.
### Swin-Tiny
| Name | Pretrain | Resolution | Acc@1 | Acc@5 | 22K Model | 1K Model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-Tiny | ImageNet-1K | 224x224 | 81.2 | 95.5 | - | download |
| Swin-Tiny | ImageNet-22K | 224x224 | 80.9 | 96.0 | download | download |

**Using Swin-Tiny Backbone in Config:**
```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer
# "model" is assumed to come from a base config, as in detrex's DINO configs
from .dino_r50 import model

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=96,
    depths=(2, 2, 6, 2),
    num_heads=(3, 6, 12, 24),
    drop_path_rate=0.1,
    window_size=7,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path ("train" comes from the base training config)
# ImageNet-1K pretrained weights:
# train.init_checkpoint = "/path/to/swin_tiny_patch4_window7_224.pth"
# ImageNet-22K pretrained, ImageNet-1K fine-tuned weights:
train.init_checkpoint = "/path/to/swin_tiny_patch4_window7_224_22kto1k_finetune.pth"
```
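Before training, it can be useful to confirm that the configured backbone produces the expected multi-scale features. The sketch below is illustrative: it assumes the backbone follows detectron2's `Backbone` API and returns a `dict` of feature maps, and uses detectron2's `instantiate` helper for LazyConfig objects:

```python
import torch
from detectron2.config import instantiate

# build the backbone from the lazy config and run a dummy forward pass
backbone = instantiate(model.backbone)
backbone.eval()
with torch.no_grad():
    features = backbone(torch.randn(1, 3, 224, 224))

# with out_indices=(1, 2, 3) we expect three feature maps (strides 8, 16, 32)
for name, feat in features.items():
    print(name, tuple(feat.shape))
```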
### Swin-Small
| Name | Pretrain | Resolution | Acc@1 | Acc@5 | 22K Model | 1K Model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-Small | ImageNet-1K | 224x224 | 83.2 | 96.2 | - | download |
| Swin-Small | ImageNet-22K | 224x224 | 83.2 | 97.0 | download | download |

**Using Swin-Small Backbone in Config:**
```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer
# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=96,
    depths=(2, 2, 18, 2),
    num_heads=(3, 6, 12, 24),
    drop_path_rate=0.2,
    window_size=7,
    out_indices=(1, 2, 3),
)
# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_small_patch4_window7_224.pth"
train.init_checkpoint = "/path/to/swin_small_patch4_window7_224_22kto1k_finetune.pth"
```
### Swin-Base
| Name | Pretrain | Resolution | Acc@1 | Acc@5 | 22K Model | 1K Model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-Base | ImageNet-1K | 224x224 | 83.5 | 96.5 | - | download |
| Swin-Base | ImageNet-1K | 384x384 | 84.5 | 97.0 | - | download |
| Swin-Base | ImageNet-22K | 224x224 | 85.2 | 97.5 | download | download |
| Swin-Base | ImageNet-22K | 384x384 | 86.4 | 98.0 | download | download |

**Using Swin-Base-224 Backbone in Config:**
```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer
# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=128,
    depths=(2, 2, 18, 2),
    num_heads=(4, 8, 16, 32),
    window_size=7,
    out_indices=(1, 2, 3),
)
# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_base_patch4_window7_224.pth"
train.init_checkpoint = "/path/to/swin_base_patch4_window7_224_22kto1k.pth"
```
**Using Swin-Base-384 Backbone in Config:**
```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer
# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=384,
    embed_dim=128,
    depths=(2, 2, 18, 2),
    num_heads=(4, 8, 16, 32),
    window_size=12,
    out_indices=(1, 2, 3),
)
# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_base_patch4_window12_384.pth"
train.init_checkpoint = "/path/to/swin_base_patch4_window12_384_22kto1k.pth"
```
### Swin-Large
| Name | Pretrain | Resolution | Acc@1 | Acc@5 | 22K Model | 1K Model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-Large | ImageNet-22K | 224x224 | 86.3 | 97.9 | download | download |
| Swin-Large | ImageNet-22K | 384x384 | 87.3 | 98.2 | download | download |

**Using Swin-Large-224 Backbone in Config:**
```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer
# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=192,
    depths=(2, 2, 18, 2),
    num_heads=(6, 12, 24, 48),
    window_size=7,
    out_indices=(1, 2, 3),
)
# setup init checkpoint path
train.init_checkpoint = "/path/to/swin_large_patch4_window7_224_22kto1k.pth"
```
**Using Swin-Large-384 Backbone in Config:**
```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer
# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=384,
    embed_dim=192,
    depths=(2, 2, 18, 2),
    num_heads=(6, 12, 24, 48),
    window_size=12,
    out_indices=(1, 2, 3),
)
# setup init checkpoint path
train.init_checkpoint = "/path/to/swin_large_patch4_window12_384_22kto1k.pth"
```
## ViTDet
The download links below are borrowed from the [official implementation](https://github.com/facebookresearch/mae#fine-tuning-with-pre-trained-checkpoints) of MAE.
**Using ViTDet Backbone in Config:**
```python
import torch.nn as nn
from functools import partial

from detectron2.config import LazyCall as L
from detectron2.modeling import ViT, SimpleFeaturePyramid
from detectron2.modeling.backbone.fpn import LastLevelMaxPool

from .dino_r50 import model

# ViT-Base hyper-parameters
embed_dim, depth, num_heads, dp = 768, 12, 12, 0.1

# create a Simple Feature Pyramid from the ViT backbone
model.backbone = L(SimpleFeaturePyramid)(
    net=L(ViT)(  # single-scale ViT backbone
        img_size=1024,
        patch_size=16,
        embed_dim=embed_dim,
        depth=depth,
        num_heads=num_heads,
        drop_path_rate=dp,
        window_size=14,
        mlp_ratio=4,
        qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, eps=1e-6),
        # blocks 2, 5, 8, 11 use global attention; all others use windowed attention
        window_block_indexes=[0, 1, 3, 4, 6, 7, 9, 10],
        residual_block_indexes=[],
        use_rel_pos=True,
        out_feature="last_feat",
    ),
    in_feature="${.net.out_feature}",
    out_channels=256,
    scale_factors=(2.0, 1.0, 0.5),  # (4.0, 2.0, 1.0, 0.5) in ViTDet
    top_block=L(LastLevelMaxPool)(),
    norm="LN",
    square_pad=1024,
)

# setup init checkpoint path ("train" comes from the base training config)
train.init_checkpoint = "/path/to/mae_pretrain_vit_base.pth"
```
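In the config above, `window_block_indexes` lists the blocks that use windowed attention; blocks 2, 5, 8, and 11 are omitted so they keep global attention, following ViTDet. When scaling to deeper ViTs, it can be less error-prone to derive the list than to write it by hand. The sketch below is illustrative (the variable names are ours, not detrex's):

```python
# blocks that keep global attention (every third block for ViT-Base, per ViTDet)
depth = 12
global_attn_indexes = [2, 5, 8, 11]

# all remaining blocks use windowed attention
window_block_indexes = [i for i in range(depth) if i not in global_attn_indexes]
assert window_block_indexes == [0, 1, 3, 4, 6, 7, 9, 10]
```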
Please refer to the [DINO](https://github.com/IDEA-Research/detrex/tree/main/projects/dino) project for more details on using the ViT backbone.
## FocalNet
The download links below are borrowed from the [official implementation](https://github.com/microsoft/FocalNet#imagenet-22k-pretrained) of FocalNet.
| Model | Depth | Dim | Kernels | #Params. (M) | Download |
| :---: | :---: | :---: | :---: | :---: | :---: |
| FocalNet-L | [2, 2, 18, 2] | 192 | [5, 7, 9] | 207 | download |
| FocalNet-L | [2, 2, 18, 2] | 192 | [3, 5, 7, 9] | 207 | download |
| FocalNet-XL | [2, 2, 18, 2] | 256 | [5, 7, 9] | 366 | download |
| FocalNet-XL | [2, 2, 18, 2] | 256 | [3, 5, 7, 9] | 366 | download |
| FocalNet-H | [2, 2, 18, 2] | 352 | [5, 7, 9] | 687 | download |
| FocalNet-H | [2, 2, 18, 2] | 352 | [3, 5, 7, 9] | 687 | download |

**Using FocalNet Backbone in Config:**
```python
# FocalNet-Large-4scale baseline
# "model" is assumed to come from a base config; FocalNet is provided by detrex
from detectron2.config import LazyCall as L
from detrex.modeling.backbone import FocalNet

model.backbone = L(FocalNet)(
    embed_dim=192,
    depths=(2, 2, 18, 2),
    focal_levels=(3, 3, 3, 3),
    focal_windows=(5, 5, 5, 5),
    use_conv_embed=True,
    use_postln=True,
    use_postln_in_modulation=False,
    use_layerscale=True,
    normalize_modulator=False,
    out_indices=(1, 2, 3),
)
```
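As with the other backbones, point `train.init_checkpoint` at the downloaded FocalNet weights; the file name below is a placeholder, not an official checkpoint name:

```python
# setup init checkpoint path (placeholder file name)
train.init_checkpoint = "/path/to/focalnet_large_pretrained.pth"
```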