| # Tutorial 1: Learn about Configs | |
| We incorporate modular and inheritance design into our config system, which makes it convenient to conduct various experiments. | |
| If you wish to inspect a config file, you may run `python tools/print_config.py /PATH/TO/CONFIG` to see the complete config. | |
| You may also pass `--options xxx.yyy=zzz` to see the updated config. | |
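The `xxx.yyy=zzz` form addresses a nested field by its dotted path. The sketch below shows the idea with plain dictionaries; `apply_option` is a hypothetical helper written for illustration, not the function the tool actually calls.

```python
def apply_option(cfg, dotted_key, value):
    """Set a nested field addressed by a dotted path such as 'optimizer.lr'.

    Hypothetical helper: intermediate dicts are created when missing.
    """
    keys = dotted_key.split('.')
    node = cfg
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # walk down, creating levels as needed
    node[keys[-1]] = value               # overwrite (or add) the leaf field
    return cfg


cfg = {'optimizer': {'type': 'SGD', 'lr': 0.01}}
apply_option(cfg, 'optimizer.lr', 0.02)
print(cfg)  # {'optimizer': {'type': 'SGD', 'lr': 0.02}}
```

Only the addressed leaf changes; sibling fields such as `type` are left as they were.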
| ## Config File Structure | |
| There are 4 basic component types under `configs/_base_`: dataset, model, schedule, and default_runtime. | |
| Many methods, such as DeepLabV3 and PSPNet, can be constructed easily with one of each. | |
| Configs that are composed of components from `_base_` are called _primitive_. | |
| For all configs under the same folder, it is recommended to have only **one** _primitive_ config. All other configs should inherit from the _primitive_ config. In this way, the maximum inheritance level is 3. | |
| For easy understanding, we recommend that contributors inherit from existing methods. | |
| For example, if some modification is made based on DeepLabV3, users may first inherit the basic DeepLabV3 structure by specifying `_base_ = '../deeplabv3/deeplabv3_r50_512x1024_40ki_cityscapes.py'`, then modify the necessary fields in the config file. | |
| If you are building an entirely new method that does not share its structure with any of the existing methods, you may create a folder `xxxnet` under `configs`. | |
| Please refer to [mmcv](https://mmcv.readthedocs.io/en/latest/utils.html#config) for detailed documentation. | |
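To make the inheritance idea concrete, here is a minimal sketch of how a child config could be merged onto its base. It uses plain dictionaries and a hypothetical `merge_config` helper; it is an approximation of the behavior described above, not the mmcv implementation.

```python
import copy


def merge_config(base, child):
    """Recursively merge `child` into a copy of `base`; child values win.

    Hypothetical helper for illustration only.
    """
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: descend so untouched sibling keys survive.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base = dict(
    model=dict(type='EncoderDecoder',
               backbone=dict(type='ResNetV1c', depth=50)),
    runner=dict(type='IterBasedRunner', max_iters=40000))
child = dict(model=dict(backbone=dict(depth=101)))  # only the changed field

cfg = merge_config(base, child)
print(cfg['model']['backbone'])  # {'type': 'ResNetV1c', 'depth': 101}
```

The child only states what differs; every other field, such as `runner`, is inherited unchanged.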
| ## Config Name Style | |
| We follow the style below to name config files. Contributors are advised to follow the same style. | |
| ``` | |
| {model}_{backbone}_[misc]_[gpu x batch_per_gpu]_{resolution}_{schedule}_{dataset} | |
| ``` | |
| `{xxx}` is a required field and `[yyy]` is optional. | |
| - `{model}`: model type like `psp`, `deeplabv3`, etc. | |
| - `{backbone}`: backbone type like `r50` (ResNet-50), `x101` (ResNeXt-101). | |
| - `[misc]`: miscellaneous setting/plugins of model, e.g. `dconv`, `gcb`, `attention`, `mstrain`. | |
| - `[gpu x batch_per_gpu]`: GPUs and samples per GPU, `8x2` is used by default. | |
| - `{schedule}`: training schedule, `20ki` means 20k iterations. | |
| - `{dataset}`: dataset like `cityscapes`, `voc12aug`, `ade`. | |
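As a rough illustration of the naming template, the sketch below assembles a config name from its fields and skips the optional ones when they are not given. `build_config_name` and its parameter names are hypothetical and exist only for this example.

```python
def build_config_name(model, backbone, resolution, schedule, dataset,
                      misc=None, gpu_batch=None):
    """Join the fields in template order; optional fields are dropped if unset.

    Hypothetical helper that mirrors
    {model}_{backbone}_[misc]_[gpu x batch_per_gpu]_{resolution}_{schedule}_{dataset}.
    """
    parts = [model, backbone, misc, gpu_batch, resolution, schedule, dataset]
    return '_'.join(part for part in parts if part)


print(build_config_name('psp', 'r50', '512x1024', '40ki', 'cityscapes'))
# psp_r50_512x1024_40ki_cityscapes

# With the optional fields filled in (values chosen only for illustration):
print(build_config_name('psp', 'r50', '512x1024', '40ki', 'cityscapes',
                        misc='dconv', gpu_batch='4x4'))
# psp_r50_dconv_4x4_512x1024_40ki_cityscapes
```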
| ## An Example of PSPNet | |
| To help users get a basic idea of a complete config and the modules in a modern semantic segmentation system, | |
| we make brief comments on the config of PSPNet using ResNet50V1c as follows. | |
| For more detailed usage and the corresponding alternatives for each module, please refer to the API documentation. | |
| ```python | |
| norm_cfg = dict(type='SyncBN', requires_grad=True) # Segmentation usually uses SyncBN | |
| model = dict( | |
| type='EncoderDecoder', # Name of segmentor | |
| pretrained='open-mmlab://resnet50_v1c', # The ImageNet pretrained backbone to be loaded | |
| backbone=dict( | |
| type='ResNetV1c', # The type of backbone. Please refer to mmseg/models/backbones/resnet.py for details. | |
| depth=50, # Depth of backbone. Normally 50, 101 are used. | |
| num_stages=4, # Number of stages of backbone. | |
| out_indices=(0, 1, 2, 3), # The indices of output feature maps produced in each stage. | |
| dilations=(1, 1, 2, 4), # The dilation rate of each layer. | |
| strides=(1, 2, 1, 1), # The stride of each layer. | |
| norm_cfg=dict( # The configuration of norm layer. | |
| type='SyncBN', # Type of norm layer. Usually it is SyncBN. | |
| requires_grad=True), # Whether to train the gamma and beta in norm | |
| norm_eval=False, # Whether to freeze the statistics in BN | |
| style='pytorch', # The style of backbone, 'pytorch' means that stride 2 layers are in 3x3 conv, 'caffe' means stride 2 layers are in 1x1 convs. | |
| contract_dilation=True), # When dilation > 1, whether contract first layer of dilation. | |
| decode_head=dict( | |
| type='PSPHead', # Type of decode head. Please refer to mmseg/models/decode_heads for available options. | |
| in_channels=2048, # Input channel of decode head. | |
| in_index=3, # The index of feature map to select. | |
| channels=512, # The intermediate channels of decode head. | |
| pool_scales=(1, 2, 3, 6), # The avg pooling scales of PSPHead. Please refer to paper for details. | |
| dropout_ratio=0.1, # The dropout ratio before final classification layer. | |
| num_classes=19, # Number of segmentation classes. Usually 19 for cityscapes, 21 for VOC, 150 for ADE20k. | |
| norm_cfg=dict(type='SyncBN', requires_grad=True), # The configuration of norm layer. | |
| align_corners=False, # The align_corners argument for resize in decoding. | |
| loss_decode=dict( # Config of loss function for the decode_head. | |
| type='CrossEntropyLoss', # Type of loss used for segmentation. | |
| use_sigmoid=False, # Whether use sigmoid activation for segmentation. | |
| loss_weight=1.0)), # Loss weight of decode head. | |
| auxiliary_head=dict( | |
| type='FCNHead', # Type of auxiliary head. Please refer to mmseg/models/decode_heads for available options. | |
| in_channels=1024, # Input channel of auxiliary head. | |
| in_index=2, # The index of feature map to select. | |
| channels=256, # The intermediate channels of decode head. | |
| num_convs=1, # Number of convs in FCNHead. It is usually 1 in auxiliary head. | |
| concat_input=False, # Whether concat output of convs with input before classification layer. | |
| dropout_ratio=0.1, # The dropout ratio before final classification layer. | |
| num_classes=19, # Number of segmentation classes. Usually 19 for cityscapes, 21 for VOC, 150 for ADE20k. | |
| norm_cfg=dict(type='SyncBN', requires_grad=True), # The configuration of norm layer. | |
| align_corners=False, # The align_corners argument for resize in decoding. | |
| loss_decode=dict( # Config of loss function for the decode_head. | |
| type='CrossEntropyLoss', # Type of loss used for segmentation. | |
| use_sigmoid=False, # Whether use sigmoid activation for segmentation. | |
| loss_weight=0.4))) # Loss weight of auxiliary head, which is usually 0.4 of decode head. | |
| train_cfg = dict() # train_cfg is just a placeholder for now. | |
| test_cfg = dict(mode='whole') # The test mode, options are 'whole' and 'sliding'. 'whole': whole image fully-convolutional test. 'sliding': sliding crop window on the image. | |
| dataset_type = 'CityscapesDataset' # Dataset type, this will be used to define the dataset. | |
| data_root = 'data/cityscapes/' # Root path of data. | |
| img_norm_cfg = dict( # Image normalization config to normalize the input images. | |
| mean=[123.675, 116.28, 103.53], # Mean values used to pre-train the backbone models. | |
| std=[58.395, 57.12, 57.375], # Standard deviation used to pre-train the backbone models. | |
| to_rgb=True) # The channel order of images used to pre-train the backbone models. | |
| crop_size = (512, 1024) # The crop size during training. | |
| train_pipeline = [ # Training pipeline. | |
| dict(type='LoadImageFromFile'), # First pipeline to load images from file path. | |
| dict(type='LoadAnnotations'), # Second pipeline to load annotations for current image. | |
| dict(type='Resize', # Augmentation pipeline that resize the images and their annotations. | |
| img_scale=(2048, 1024), # The largest scale of image. | |
| ratio_range=(0.5, 2.0)), # The augmented scale range as ratio. | |
| dict(type='RandomCrop', # Augmentation pipeline that randomly crop a patch from current image. | |
| crop_size=(512, 1024), # The crop size of patch. | |
| cat_max_ratio=0.75), # The max area ratio that could be occupied by single category. | |
| dict( | |
| type='RandomFlip', # Augmentation pipeline that flip the images and their annotations | |
| flip_ratio=0.5), # The ratio or probability to flip | |
| dict(type='PhotoMetricDistortion'), # Augmentation pipeline that distort current image with several photo metric methods. | |
| dict( | |
| type='Normalize', # Augmentation pipeline that normalize the input images | |
| mean=[123.675, 116.28, 103.53], # These keys are the same as in img_norm_cfg since the | |
| std=[58.395, 57.12, 57.375], # keys of img_norm_cfg are used here as arguments | |
| to_rgb=True), | |
| dict(type='Pad', # Augmentation pipeline that pad the image to specified size. | |
| size=(512, 1024), # The output size of padding. | |
| pad_val=0, # The padding value for image. | |
| seg_pad_val=255), # The padding value of 'gt_semantic_seg'. | |
| dict(type='DefaultFormatBundle'), # Default format bundle to gather data in the pipeline | |
| dict(type='Collect', # Pipeline that decides which keys in the data should be passed to the segmentor | |
| keys=['img', 'gt_semantic_seg']) | |
| ] | |
| test_pipeline = [ | |
| dict(type='LoadImageFromFile'), # First pipeline to load images from file path | |
| dict( | |
| type='MultiScaleFlipAug', # An encapsulation that encapsulates the test time augmentations | |
| img_scale=(2048, 1024), # Decides the largest scale for testing, used for the Resize pipeline | |
| flip=False, # Whether to flip images during testing | |
| transforms=[ | |
| dict(type='Resize', # Use resize augmentation | |
| keep_ratio=True), # Whether to keep the ratio between height and width, the img_scale set here will be overridden by the img_scale set above. | |
| dict(type='RandomFlip'), # Though RandomFlip is added in pipeline, it is not used when flip=False | |
| dict( | |
| type='Normalize', # Normalization config, the values are from img_norm_cfg | |
| mean=[123.675, 116.28, 103.53], | |
| std=[58.395, 57.12, 57.375], | |
| to_rgb=True), | |
| dict(type='ImageToTensor', # Convert image to tensor | |
| keys=['img']), | |
| dict(type='Collect', # Collect pipeline that collect necessary keys for testing. | |
| keys=['img']) | |
| ]) | |
| ] | |
| data = dict( | |
| samples_per_gpu=2, # Batch size of a single GPU | |
| workers_per_gpu=2, # Worker to pre-fetch data for each single GPU | |
| train=dict( # Train dataset config | |
| type='CityscapesDataset', # Type of dataset, refer to mmseg/datasets/ for details. | |
| data_root='data/cityscapes/', # The root of dataset. | |
| img_dir='leftImg8bit/train', # The image directory of dataset. | |
| ann_dir='gtFine/train', # The annotation directory of dataset. | |
| pipeline=[ # pipeline, this is passed by the train_pipeline created before. | |
| dict(type='LoadImageFromFile'), | |
| dict(type='LoadAnnotations'), | |
| dict( | |
| type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)), | |
| dict(type='RandomCrop', crop_size=(512, 1024), cat_max_ratio=0.75), | |
| dict(type='RandomFlip', flip_ratio=0.5), | |
| dict(type='PhotoMetricDistortion'), | |
| dict( | |
| type='Normalize', | |
| mean=[123.675, 116.28, 103.53], | |
| std=[58.395, 57.12, 57.375], | |
| to_rgb=True), | |
| dict(type='Pad', size=(512, 1024), pad_val=0, seg_pad_val=255), | |
| dict(type='DefaultFormatBundle'), | |
| dict(type='Collect', keys=['img', 'gt_semantic_seg']) | |
| ]), | |
| val=dict( # Validation dataset config | |
| type='CityscapesDataset', | |
| data_root='data/cityscapes/', | |
| img_dir='leftImg8bit/val', | |
| ann_dir='gtFine/val', | |
| pipeline=[ # Pipeline is passed by test_pipeline created before | |
| dict(type='LoadImageFromFile'), | |
| dict( | |
| type='MultiScaleFlipAug', | |
| img_scale=(2048, 1024), | |
| flip=False, | |
| transforms=[ | |
| dict(type='Resize', keep_ratio=True), | |
| dict(type='RandomFlip'), | |
| dict( | |
| type='Normalize', | |
| mean=[123.675, 116.28, 103.53], | |
| std=[58.395, 57.12, 57.375], | |
| to_rgb=True), | |
| dict(type='ImageToTensor', keys=['img']), | |
| dict(type='Collect', keys=['img']) | |
| ]) | |
| ]), | |
| test=dict( | |
| type='CityscapesDataset', | |
| data_root='data/cityscapes/', | |
| img_dir='leftImg8bit/val', | |
| ann_dir='gtFine/val', | |
| pipeline=[ | |
| dict(type='LoadImageFromFile'), | |
| dict( | |
| type='MultiScaleFlipAug', | |
| img_scale=(2048, 1024), | |
| flip=False, | |
| transforms=[ | |
| dict(type='Resize', keep_ratio=True), | |
| dict(type='RandomFlip'), | |
| dict( | |
| type='Normalize', | |
| mean=[123.675, 116.28, 103.53], | |
| std=[58.395, 57.12, 57.375], | |
| to_rgb=True), | |
| dict(type='ImageToTensor', keys=['img']), | |
| dict(type='Collect', keys=['img']) | |
| ]) | |
| ])) | |
| log_config = dict( # config to register logger hook | |
| interval=50, # Interval to print the log | |
| hooks=[ | |
| # dict(type='TensorboardLoggerHook') # The Tensorboard logger is also supported | |
| dict(type='TextLoggerHook', by_epoch=False) | |
| ]) | |
| dist_params = dict(backend='nccl') # Parameters to setup distributed training, the port can also be set. | |
| log_level = 'INFO' # The level of logging. | |
| load_from = None # load models as a pre-trained model from a given path. This will not resume training. | |
| resume_from = None # Resume checkpoints from a given path, the training will be resumed from the iteration when the checkpoint was saved. | |
| workflow = [('train', 1)] # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once. The workflow trains the model by 40000 iterations according to the `runner.max_iters`. | |
| cudnn_benchmark = True # Whether use cudnn_benchmark to speed up, which is fast for fixed input size. | |
| optimizer = dict( # Config used to build optimizer, support all the optimizers in PyTorch whose arguments are also the same as those in PyTorch | |
| type='SGD', # Type of optimizers, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details | |
| lr=0.01, # Learning rate of optimizers, see detail usages of the parameters in the documentation of PyTorch | |
| momentum=0.9, # Momentum | |
| weight_decay=0.0005) # Weight decay of SGD | |
| optimizer_config = dict() # Config used to build the optimizer hook, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/optimizer.py#L8 for implementation details. | |
| lr_config = dict( | |
| policy='poly', # The policy of scheduler, also support Step, CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9. | |
| power=0.9, # The power of polynomial decay. | |
| min_lr=0.0001, # The minimum learning rate to stabilize the training. | |
| by_epoch=False) # Whether to count by epoch or not. | |
| runner = dict( | |
| type='IterBasedRunner', # Type of runner to use (i.e. IterBasedRunner or EpochBasedRunner) | |
| max_iters=40000) # Total number of iterations. For EpochBasedRunner use `max_epochs` | |
| checkpoint_config = dict( # Config to set the checkpoint hook, Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation. | |
| by_epoch=False, # Whether to count by epoch or not. | |
| interval=4000) # The save interval. | |
| evaluation = dict( # The config to build the evaluation hook. Please refer to mmseg/core/evaluation/eval_hook.py for details. | |
| interval=4000, # The interval of evaluation. | |
| metric='mIoU') # The evaluation metric. | |
| ``` | |
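The `lr_config` above combines `policy='poly'`, `power=0.9` and `min_lr=0.0001` with the optimizer's `lr=0.01` and the runner's `max_iters=40000`. The sketch below shows how such a polynomial decay is commonly computed. The function `poly_lr` is a hypothetical stand-in, and the exact formula is an assumption about the scheduler rather than a copy of its code.

```python
def poly_lr(base_lr, cur_iter, max_iters, power=0.9, min_lr=1e-4):
    """Polynomial decay from `base_lr` at iteration 0 down to `min_lr`.

    Assumed form: (base_lr - min_lr) * (1 - cur_iter / max_iters) ** power + min_lr
    """
    coeff = (1 - cur_iter / max_iters) ** power
    return (base_lr - min_lr) * coeff + min_lr


# Values taken from the config above: lr=0.01, max_iters=40000.
for cur_iter in (0, 10000, 20000, 30000, 40000):
    print(cur_iter, poly_lr(0.01, cur_iter, 40000))
```

Under this assumption the learning rate starts at `lr`, decreases monotonically, and ends at `min_lr` on the final iteration.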
| ## FAQ | |
| ### Ignore some fields in the base configs | |
| Sometimes, you may set `_delete_=True` to ignore some of the fields in base configs. | |
| You may refer to [mmcv](https://mmcv.readthedocs.io/en/latest/utils.html#inherit-from-base-config-with-ignored-fields) for a simple illustration. | |
| In MMSegmentation, for example, suppose you want to change the backbone of PSPNet, which has the following config. | |
| ```python | |
| norm_cfg = dict(type='SyncBN', requires_grad=True) | |
| model = dict( | |
| type='EncoderDecoder', | |
| pretrained='open-mmlab://resnet50_v1c', | |
| backbone=dict( | |
| type='ResNetV1c', | |
| depth=50, | |
| num_stages=4, | |
| out_indices=(0, 1, 2, 3), | |
| dilations=(1, 1, 2, 4), | |
| strides=(1, 2, 1, 1), | |
| norm_cfg=norm_cfg, | |
| norm_eval=False, | |
| style='pytorch', | |
| contract_dilation=True), | |
| decode_head=dict(...), | |
| auxiliary_head=dict(...)) | |
| ``` | |
| `ResNet` and `HRNet` are constructed with different keywords, so the inherited `ResNet` fields must be dropped when switching to `HRNet`. | |
| ```python | |
| _base_ = '../pspnet/psp_r50_512x1024_40ki_cityscapes.py' | |
| norm_cfg = dict(type='SyncBN', requires_grad=True) | |
| model = dict( | |
| pretrained='open-mmlab://msra/hrnetv2_w32', | |
| backbone=dict( | |
| _delete_=True, | |
| type='HRNet', | |
| norm_cfg=norm_cfg, | |
| extra=dict( | |
| stage1=dict( | |
| num_modules=1, | |
| num_branches=1, | |
| block='BOTTLENECK', | |
| num_blocks=(4, ), | |
| num_channels=(64, )), | |
| stage2=dict( | |
| num_modules=1, | |
| num_branches=2, | |
| block='BASIC', | |
| num_blocks=(4, 4), | |
| num_channels=(32, 64)), | |
| stage3=dict( | |
| num_modules=4, | |
| num_branches=3, | |
| block='BASIC', | |
| num_blocks=(4, 4, 4), | |
| num_channels=(32, 64, 128)), | |
| stage4=dict( | |
| num_modules=3, | |
| num_branches=4, | |
| block='BASIC', | |
| num_blocks=(4, 4, 4, 4), | |
| num_channels=(32, 64, 128, 256)))), | |
| decode_head=dict(...), | |
| auxiliary_head=dict(...)) | |
| ``` | |
| Setting `_delete_=True` replaces all old keys in the `backbone` field with the new keys. | |
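The difference between a plain override and one that carries `_delete_=True` can be sketched with plain dictionaries. `merge_with_delete` below is a hypothetical helper that imitates the described behavior; it is not the mmcv implementation.

```python
import copy


def merge_with_delete(base, child):
    """Merge `child` into `base`; a dict carrying `_delete_=True` drops the
    inherited keys at its level instead of merging with them.

    Hypothetical helper for illustration only.
    """
    child = dict(child)                 # shallow copy: keep the caller's dict intact
    if child.pop('_delete_', False):
        base = {}                       # forget every inherited key at this level
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict):
            inherited = merged.get(key)
            merged[key] = merge_with_delete(
                inherited if isinstance(inherited, dict) else {}, value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base = dict(backbone=dict(type='ResNetV1c', depth=50, num_stages=4))

plain = merge_with_delete(base, dict(backbone=dict(type='HRNet')))
clean = merge_with_delete(base, dict(backbone=dict(_delete_=True, type='HRNet')))

print(plain['backbone'])  # {'type': 'HRNet', 'depth': 50, 'num_stages': 4}
print(clean['backbone'])  # {'type': 'HRNet'}
```

Without `_delete_`, ResNet-only keys such as `depth` leak into the `HRNet` config, which is why the flag is needed when the two backbones take different arguments.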
| ### Use intermediate variables in configs | |
| Some intermediate variables are used in the config files, like `train_pipeline`/`test_pipeline` in datasets. | |
| It's worth noting that when modifying intermediate variables in the child configs, users need to pass the intermediate variables into the corresponding fields again. | |
| For example, suppose we would like to change the multi-scale strategy used to train/test a PSPNet. `train_pipeline`/`test_pipeline` are the intermediate variables we would like to modify. | |
| ```python | |
| _base_ = '../pspnet/psp_r50_512x1024_40ki_cityscapes.py' | |
| crop_size = (512, 1024) | |
| img_norm_cfg = dict( | |
| mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) | |
| train_pipeline = [ | |
| dict(type='LoadImageFromFile'), | |
| dict(type='LoadAnnotations'), | |
| dict(type='Resize', img_scale=(2048, 1024), ratio_range=(1.0, 2.0)), # change to [1., 2.] | |
| dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), | |
| dict(type='RandomFlip', flip_ratio=0.5), | |
| dict(type='PhotoMetricDistortion'), | |
| dict(type='Normalize', **img_norm_cfg), | |
| dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), | |
| dict(type='DefaultFormatBundle'), | |
| dict(type='Collect', keys=['img', 'gt_semantic_seg']), | |
| ] | |
| test_pipeline = [ | |
| dict(type='LoadImageFromFile'), | |
| dict( | |
| type='MultiScaleFlipAug', | |
| img_scale=(2048, 1024), | |
| img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], # change to multi scale testing | |
| flip=False, | |
| transforms=[ | |
| dict(type='Resize', keep_ratio=True), | |
| dict(type='RandomFlip'), | |
| dict(type='Normalize', **img_norm_cfg), | |
| dict(type='ImageToTensor', keys=['img']), | |
| dict(type='Collect', keys=['img']), | |
| ]) | |
| ] | |
| data = dict( | |
| train=dict(pipeline=train_pipeline), | |
| val=dict(pipeline=test_pipeline), | |
| test=dict(pipeline=test_pipeline)) | |
| ``` | |
| We first define the new `train_pipeline`/`test_pipeline` and then pass them into `data`. | |
| Similarly, if we would like to switch from `SyncBN` to `BN` or `MMSyncBN`, we need to substitute every `norm_cfg` in the config. | |
| ```python | |
| _base_ = '../pspnet/psp_r50_512x1024_40ki_cityscapes.py' | |
| norm_cfg = dict(type='BN', requires_grad=True) | |
| model = dict( | |
| backbone=dict(norm_cfg=norm_cfg), | |
| decode_head=dict(norm_cfg=norm_cfg), | |
| auxiliary_head=dict(norm_cfg=norm_cfg)) | |
| ``` | |
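The reason intermediate variables must be passed in again can be illustrated with ordinary Python semantics. The sketch below is an analogy under the assumption that fields keep the value they were given when the base config was defined; the one-step pipeline is a simplified stand-in for the full pipelines above.

```python
# Stand-in for the base config: `data` captures the pipeline that exists
# at the moment the field is defined.
train_pipeline = [dict(type='Resize', ratio_range=(0.5, 2.0))]
data = dict(train=dict(pipeline=train_pipeline))

# Stand-in for the child config: redefining the variable alone creates a
# new list, while `data` still refers to the old one.
train_pipeline = [dict(type='Resize', ratio_range=(1.0, 2.0))]
stale = data['train']['pipeline'][0]['ratio_range']
print(stale)  # (0.5, 2.0)

# Passing the new variable into the field again makes the change effective.
data = dict(train=dict(pipeline=train_pipeline))
fresh = data['train']['pipeline'][0]['ratio_range']
print(fresh)  # (1.0, 2.0)
```

The same reasoning applies to `norm_cfg`: every field that used the old value has to be given the new one explicitly.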