# Convert Pretrained Models

This document briefly explains how to convert pretrained models into the format used by detrex.

## Convert TorchVision Pretrained ResNet Models

To use the pretrained weights provided by detectron2, please refer to [ImageNet Pretrained Models](https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md#imagenet-pretrained-models). Note that detectron2 only provides a converted torchvision `ResNet-50` model. For other pretrained models such as `ResNet{101, 152}`, you can use the detectron2 [conversion script](https://github.com/facebookresearch/detectron2/blob/main/tools/convert-torchvision-to-d2.py) to convert the torchvision pretrained weights into a format that can be used in `detrex`. The rest of this section is a step-by-step tutorial on using the conversion script.

### Download Pretrained Weights

`Torchvision 0.11.0` was released with better pretrained weights for numerous models, including `ResNet`. More details can be found in [How to Train State-Of-The-Art Models Using TorchVision’s Latest Primitives](https://pytorch.org/blog/how-to-train-state-of-the-art-models-using-torchvision-latest-primitives/). The download commands for the TorchVision `ResNet` models are collected here.

| Name | Download | Pretrain | Acc@1 | Acc@5 |
|:---|:---|:---:|:---:|:---:|
| ResNet-50 (ImageNet1k-V1) | `wget https://download.pytorch.org/models/resnet50-0676ba61.pth -O r50_v1.pth` | IN1k | 76.130 | 92.862 |
| ResNet-50 (ImageNet1k-V2) | `wget https://download.pytorch.org/models/resnet50-11ad3fa6.pth -O r50_v2.pth` | IN1k | 80.858 | 95.434 |
| ResNet-101 (ImageNet1k-V1) | `wget https://download.pytorch.org/models/resnet101-63fe2227.pth -O r101_v1.pth` | IN1k | 77.374 | 93.546 |
| ResNet-101 (ImageNet1k-V2) | `wget https://download.pytorch.org/models/resnet101-cd907fc2.pth -O r101_v2.pth` | IN1k | 81.886 | 95.780 |
| ResNet-152 (ImageNet1k-V1) | `wget https://download.pytorch.org/models/resnet152-394f9c45.pth -O r152_v1.pth` | IN1k | 78.312 | 94.046 |
| ResNet-152 (ImageNet1k-V2) | `wget https://download.pytorch.org/models/resnet152-f82ba261.pth -O r152_v2.pth` | IN1k | 82.284 | 96.002 |

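
The hex suffix in each download URL (for example `0676ba61` in `resnet50-0676ba61.pth`) appears to follow the `torch.hub` naming convention, where the suffix is the leading digits of the file's SHA256 hash. Below is a minimal sketch for checking a downloaded file against that suffix, assuming the convention holds for these files. The helper names are hypothetical and not part of detrex or torchvision. Because `wget -O` renames the file, the original URL is passed in separately.

```python
import hashlib
import re


def expected_prefix_from_url(url):
    """Extract the hex suffix from a URL such as .../resnet50-0676ba61.pth."""
    match = re.search(r"-([0-9a-f]{8,})\.pth$", url)
    if match is None:
        raise ValueError(f"no hash suffix found in {url!r}")
    return match.group(1)


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large checkpoints are not read at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_url_hash(path, url):
    """Return True if the file's SHA256 starts with the suffix in the URL."""
    return sha256_of_file(path).startswith(expected_prefix_from_url(url))
```

For instance, `matches_url_hash("r50_v1.pth", "https://download.pytorch.org/models/resnet50-0676ba61.pth")` would be expected to return `True` for an intact download under the assumption above.
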
**Note:** `ImageNet1k-V1` refers to the original torchvision pretrained weights. `ImageNet1k-V2` refers to the improved weights released with the newer torchvision training recipe.

### Run the Conversion

`convert-torchvision-to-d2.py` (borrowed from detectron2):

```python
#!/usr/bin/env python
# Copyright (c) Facebook, Inc. and its affiliates.

import pickle as pkl
import sys
import torch

"""
Usage:
  # download one of the ResNet{18,34,50,101,152} models from torchvision:
  wget https://download.pytorch.org/models/resnet50-19c8e357.pth -O r50.pth
  # run the conversion
  ./convert-torchvision-to-d2.py r50.pth r50.pkl

  # Then, use r50.pkl with the following changes in config:

MODEL:
  WEIGHTS: "/path/to/r50.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  RESNETS:
    DEPTH: 50
    STRIDE_IN_1X1: False
INPUT:
  FORMAT: "RGB"

  These models typically produce slightly worse results than the
  pre-trained ResNets we use in official configs, which are the
  original ResNet models released by MSRA.
"""

if __name__ == "__main__":
    input = sys.argv[1]

    obj = torch.load(input, map_location="cpu")

    newmodel = {}
    for k in list(obj.keys()):
        old_k = k
        if "layer" not in k:
            k = "stem." + k
        for t in [1, 2, 3, 4]:
            k = k.replace("layer{}".format(t), "res{}".format(t + 1))
        for t in [1, 2, 3]:
            k = k.replace("bn{}".format(t), "conv{}.norm".format(t))
        k = k.replace("downsample.0", "shortcut")
        k = k.replace("downsample.1", "shortcut.norm")
        print(old_k, "->", k)
        newmodel[k] = obj.pop(old_k).detach().numpy()

    res = {"model": newmodel, "__author__": "torchvision", "matching_heuristics": True}

    with open(sys.argv[2], "wb") as f:
        pkl.dump(res, f)
    if obj:
        print("Unconverted keys:", obj.keys())
```
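
To see what the renaming rules in the script do without loading any weights, the sketch below pulls the same string replacements into a standalone function and applies them to a few parameter names. The function name `rename_key` and the sample keys are illustrative choices, not part of the detectron2 script.

```python
def rename_key(key):
    """Apply the script's renaming rules to one torchvision ResNet key."""
    # Anything outside the four residual stages belongs to the stem.
    if "layer" not in key:
        key = "stem." + key
    # torchvision layer1..layer4 become res2..res5.
    for t in (1, 2, 3, 4):
        key = key.replace(f"layer{t}", f"res{t + 1}")
    # Each batch norm is nested under the convolution it follows.
    for t in (1, 2, 3):
        key = key.replace(f"bn{t}", f"conv{t}.norm")
    # The downsample branch becomes the shortcut conv and its norm.
    key = key.replace("downsample.0", "shortcut")
    key = key.replace("downsample.1", "shortcut.norm")
    return key


if __name__ == "__main__":
    # Illustrative key names in torchvision's ResNet naming style.
    samples = [
        "conv1.weight",
        "bn1.running_mean",
        "layer1.0.conv2.weight",
        "layer3.5.bn3.bias",
        "layer2.0.downsample.0.weight",
        "layer2.0.downsample.1.weight",
    ]
    for name in samples:
        print(name, "->", rename_key(name))
```

Running it shows, for example, that `layer3.5.bn3.bias` maps to `res4.5.conv3.norm.bias` and `layer2.0.downsample.1.weight` maps to `res3.0.shortcut.norm.weight`.
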

First, create `convert-torchvision-to-d2.py`, copy the conversion script above into it, and then run:

```bash
# first argument: path to the downloaded pretrained weights
# second argument: where to save the converted weights
python convert-torchvision-to-d2.py \
    /path/to/r101_v1.pth \
    ./r101_v1.pkl
```

Then, update the training config:

```python
# your own config.py
train.init_checkpoint = "path/to/r101_v1.pkl"

# make sure that the model config is consistent
# with the following settings
model.backbone.stages.depth = 101
model.pixel_mean = [123.675, 116.280, 103.530]
model.pixel_std = [58.395, 57.120, 57.375]
```

## Convert DETRs Pretrained Models

We also provide converters for some of the projects in detrex. They are adapted from the [detr-d2 conversion script](https://github.com/facebookresearch/detr/blob/main/d2/converter.py) and convert models trained with the original repositories into the detrex model format.

- converter for DETR: [convert_detr_to_detrex](https://github.com/IDEA-Research/detrex/blob/main/projects/detr/converter.py)
- converter for Deformable-DETR: [convert_deformable_detr_to_detrex](https://github.com/IDEA-Research/detrex/blob/main/projects/deformable_detr/converter.py)
- converter for ConditionalDETR: [convert_conditional_detr_to_detrex](https://github.com/IDEA-Research/detrex/blob/main/projects/conditional_detr/converter.py)
- converter for DN-Deformable-DETR: [convert_dn_deformable_detr_to_detrex](https://github.com/IDEA-Research/detrex/blob/main/projects/dn_deformable_detr/converter.py)

All of these converters can be run as follows:

```bash
python converter.py --source_model /path/to/pretrained_weight.pth --output_model converted_model.pth
```
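
When several checkpoints need converting, the command above can be wrapped in a small loop. The following is a minimal sketch, assuming only the `--source_model` and `--output_model` flags shown above. The helper names and the `_detrex.pth` output naming are hypothetical. With `dry_run=True` (the default here) it only builds the commands and does not execute them.

```python
import subprocess
import sys
from pathlib import Path


def build_converter_command(converter, source_model, output_model):
    """Build the argument list for one converter invocation."""
    return [
        sys.executable,
        str(converter),
        "--source_model",
        str(source_model),
        "--output_model",
        str(output_model),
    ]


def convert_all(converter, checkpoints, output_dir, dry_run=True):
    """Build (and optionally run) one converter command per checkpoint."""
    commands = []
    for src in checkpoints:
        src = Path(src)
        # Hypothetical naming scheme for the converted files.
        dst = Path(output_dir) / f"{src.stem}_detrex.pth"
        cmd = build_converter_command(converter, src, dst)
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError if the converter fails.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `convert_all("converter.py", ["a.pth", "b.pth"], "converted")` returns the two command lists for inspection; passing `dry_run=False` runs them in order.
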