## Advanced Usage of Mask2Former
This document gives a brief introduction to the advanced usage of Mask2Former for research purposes.
Mask2Former is highly modular. It consists of three components: a backbone, a pixel decoder, and a Transformer decoder.
You can easily replace each of these three components with your own implementation.
### Test Mask2Former with your own backbone
1. Define and register your backbone under `mask2former/modeling/backbone`. You can follow the Swin Transformer as an example.
2. Change the config file accordingly.
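
The register-then-configure pattern in the two steps above can be illustrated without any dependencies. This is a minimal sketch, not the project's code: `Registry`, `ToyBackbone`, `build_backbone`, and the `cfg` dict are hypothetical stand-ins. In the real code base, registration goes through detectron2's `BACKBONE_REGISTRY`, and the backbone is selected by name via `MODEL.BACKBONE.NAME` in the config.

```python
class Registry:
    """Hypothetical stand-in for a name -> class registry."""

    def __init__(self, name):
        self._name = name
        self._classes = {}

    def register(self):
        # Used as a decorator: @REGISTRY.register()
        def decorator(cls):
            if cls.__name__ in self._classes:
                raise KeyError(f"{cls.__name__} is already registered in {self._name}")
            self._classes[cls.__name__] = cls
            return cls
        return decorator

    def get(self, name):
        if name not in self._classes:
            raise KeyError(f"No class named {name!r} in registry {self._name}")
        return self._classes[name]


# Stand-in object for illustration only; not detectron2's registry.
BACKBONE_REGISTRY = Registry("BACKBONE")


@BACKBONE_REGISTRY.register()
class ToyBackbone:
    """Hypothetical backbone that only records its output strides."""

    def __init__(self, cfg):
        # Feature names and strides follow the usual ResNet-style convention.
        self.out_strides = {"res2": 4, "res3": 8, "res4": 16, "res5": 32}


def build_backbone(cfg):
    # Step 2: the config selects the registered backbone by its class name.
    name = cfg["MODEL"]["BACKBONE"]["NAME"]
    return BACKBONE_REGISTRY.get(name)(cfg)


cfg = {"MODEL": {"BACKBONE": {"NAME": "ToyBackbone"}}}
backbone = build_backbone(cfg)
print(type(backbone).__name__, backbone.out_strides)
```

Because the lookup is by name, switching backbones only requires changing the name in the config once the class is registered.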
### Test Mask2Former with your own pixel decoder
1. Define and register your pixel decoder under `mask2former/modeling/pixel_decoder`.
2. Change the config file accordingly.
Note that your pixel decoder must have a `self.forward_features(features)` method that returns three values:
1. `mask_features`: the per-pixel embeddings at 1/4 of the original image resolution. These are used to produce binary masks.
2. `None`: the second value is unused, so you can simply return `None`.
3. `multi_scale_features`: the multi-scale inputs to the Transformer decoder. This must be a list of length 3.
We use resolutions 1/32, 1/16, and 1/8, but you can use arbitrary resolutions here.
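
A quick way to catch contract violations early is to check the return value of `forward_features` before wiring a new pixel decoder into the model. The sketch below is a hypothetical helper, not part of Mask2Former: `FakeTensor` stands in for a real tensor, the `(N, C, H, W)` layout and the 256 channels are assumptions, and the image size is assumed to be divisible by every stride.

```python
from collections import namedtuple

# Hypothetical stand-in for a tensor: only the `.shape` attribute matters here.
FakeTensor = namedtuple("FakeTensor", ["shape"])


def expected_hw(image_hw, stride):
    """Spatial size of a feature map at `stride` (assumes exact divisibility)."""
    h, w = image_hw
    return (h // stride, w // stride)


def check_forward_features_output(outputs, image_hw):
    """Validate the three-value contract and return the multi-scale sizes."""
    if len(outputs) != 3:
        raise ValueError(f"expected 3 return values, got {len(outputs)}")
    mask_features, _unused, multi_scale_features = outputs

    # mask_features must be at 1/4 of the input resolution.
    got = tuple(mask_features.shape[-2:])
    want = expected_hw(image_hw, 4)
    if got != want:
        raise ValueError(f"mask_features is {got}, expected {want}")

    # The Transformer decoder consumes exactly three feature maps.
    if not isinstance(multi_scale_features, list) or len(multi_scale_features) != 3:
        raise ValueError("multi_scale_features must be a list of length 3")

    return [tuple(f.shape[-2:]) for f in multi_scale_features]


image_hw = (512, 512)
outputs = (
    FakeTensor((1, 256, 128, 128)),      # stride 4
    None,                                # unused second value
    [
        FakeTensor((1, 256, 16, 16)),    # stride 32
        FakeTensor((1, 256, 32, 32)),    # stride 16
        FakeTensor((1, 256, 64, 64)),    # stride 8
    ],
)
print(check_forward_features_output(outputs, image_hw))
```

The helper deliberately does not check the resolutions of the multi-scale features, since any three resolutions are allowed.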
Example config to use a Transformer-encoder enhanced FPN instead of MSDeformAttn:
```
MODEL:
  SEM_SEG_HEAD:
    # pixel decoder
    PIXEL_DECODER_NAME: "TransformerEncoderPixelDecoder"
    IN_FEATURES: ["res2", "res3", "res4", "res5"]
    COMMON_STRIDE: 4
    TRANSFORMER_ENC_LAYERS: 6
```
### Build a new Transformer decoder
Transformer decoders are defined under `mask2former/modeling/transformer_decoder`.