# Learn about Configs

We use Python files as configs and incorporate modular and inheritance design into our config system, which is convenient for conducting various experiments.
You can find all the provided configs under `$MMAction2/configs`. If you wish to inspect a config file,
you may run `python tools/analysis_tools/print_config.py /PATH/TO/CONFIG` to see the complete config.

<!-- TOC -->

- [Learn about Configs](#learn-about-configs)
  - [Modify config through script arguments](#modify-config-through-script-arguments)
  - [Config File Structure](#config-file-structure)
  - [Config File Naming Convention](#config-file-naming-convention)
    - [Config System for Action Recognition](#config-system-for-action-recognition)
    - [Config System for Spatio-Temporal Action Detection](#config-system-for-spatio-temporal-action-detection)
    - [Config System for Action localization](#config-system-for-action-localization)

<!-- TOC -->

## Modify config through script arguments

When submitting jobs using `tools/train.py` or `tools/test.py`, you may specify `--cfg-options` to modify the config in place.

- Update config keys of dict.

  The config options can be specified following the order of the dict keys in the original config.
  For example, `--cfg-options model.backbone.norm_eval=False` sets all BN modules in the model backbone to `train` mode.

- Update keys inside a list of configs.

  Some config dicts are composed as a list in your config. For example, the training pipeline `train_pipeline` is normally a list
  e.g. `[dict(type='SampleFrames'), ...]`. If you want to change `'SampleFrames'` to `'DenseSampleFrames'` in the pipeline,
  you may specify `--cfg-options train_pipeline.0.type=DenseSampleFrames`.

- Update values of list/tuples.

  If the value to be updated is a list or a tuple, the whole value must be quoted. For example, the config file normally sets `model.data_preprocessor.mean=[123.675, 116.28, 103.53]`. If you want to
  change this key, you may specify `--cfg-options model.data_preprocessor.mean="[128,128,128]"`. Note that the quotation marks `"` are necessary to support list/tuple data types.
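As a rough mental model of how a dotted `--cfg-options` key is resolved, each dot descends one level into the nested config, and an integer segment indexes into a list. The toy resolver below is an illustrative sketch, not MMEngine's actual implementation:

```python
# Toy sketch (NOT MMEngine's real code) of dotted-key resolution:
# each '.'-separated segment descends one level; digit segments index lists.
def set_by_dotted_key(cfg, dotted_key, value):
    *parents, last = dotted_key.split('.')
    node = cfg
    for part in parents:
        node = node[int(part)] if part.isdigit() else node[part]
    if last.isdigit():
        node[int(last)] = value
    else:
        node[last] = value

cfg = {
    'model': {'backbone': {'norm_eval': True}},
    'train_pipeline': [{'type': 'SampleFrames'}, {'type': 'RawFrameDecode'}],
}
# Mirrors `--cfg-options model.backbone.norm_eval=False`
set_by_dotted_key(cfg, 'model.backbone.norm_eval', False)
# Mirrors `--cfg-options train_pipeline.0.type=DenseSampleFrames`
set_by_dotted_key(cfg, 'train_pipeline.0.type', 'DenseSampleFrames')
```

Only the first pipeline element is touched; the rest of the config is left intact.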

## Config File Structure

There are 3 basic component types under `configs/_base_`: models, schedules, and default_runtime.

Many methods, such as TSN, I3D, and SlowOnly, can easily be constructed with one component of each type.

The configs that are composed of components from `_base_` are called _primitive_.



For all configs under the same folder, it is recommended to have only **one** _primitive_ config. All other configs should inherit from the _primitive_ config. In this way, the maximum inheritance level is 3.

For ease of understanding, we recommend contributors inherit from existing methods.
For example, if you make some modifications based on TSN, you may first inherit the basic TSN structure by specifying `_base_ = ../tsn/tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb.py`, then modify the necessary fields in the config file.

If you are building an entirely new method that does not share the structure with any of the existing methods, you may create a folder under `configs/TASK`.

Please refer to [mmengine](https://mmengine.readthedocs.io/en/latest/tutorials/config.html) for detailed documentation.

## Config File Naming Convention

We follow the style below to name config files. Contributors are advised to follow the same style. The config file names are divided into several parts. Logically, different parts are concatenated by underscores `'_'`, and settings in the same part are concatenated by dashes `'-'`.

```
{algorithm info}_{module info}_{training info}_{data info}.py
```

`{xxx}` is a required field and `[yyy]` is optional.

- `{algorithm info}`:
  - `{model}`: model type, e.g. `tsn`, `i3d`, `swin`, `vit`, etc.
  - `[model setting]`: specific setting for some models, e.g. `base`, `p16`, `w877`, etc.
- `{module info}`:
  - `[pretrained info]`: pretrained information, e.g. `kinetics400-pretrained`, `in1k-pre`, etc.
  - `{backbone}`: backbone type. e.g. `r50` (ResNet-50), etc.
  - `[backbone setting]`: specific setting for some backbones, e.g. `nl-dot-product`, `bnfrozen`, `nopool`, etc.
- `{training info}`:
  - `{gpu x batch_per_gpu}`: number of GPUs and samples per GPU, e.g. `8xb32` (8 GPUs, 32 samples per GPU).
  - `{pipeline setting}`: frame sampling setting, e.g. `dense`, `{clip_len}x{frame_interval}x{num_clips}`, `u48`, etc.
  - `{schedule}`: training schedule, e.g. `coslr-20e`.
- `{data info}`:
  - `{dataset}`: dataset name, e.g. `kinetics400`, `mmit`, etc.
  - `{modality}`: data modality, e.g. `rgb`, `flow`, `keypoint-2d`, etc.
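As a sanity check of the convention, the TSN config name referenced earlier decomposes cleanly into the four parts above (plain-Python illustration; the labels are just the part names from this section):

```python
# Split an MMAction2 config file name into its four logical parts.
# Parts are joined by '_'; settings within a part are joined by '-'.
name = 'tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb'

parts = dict(zip(
    ['algorithm info', 'module info', 'training info', 'data info'],
    name.split('_'),
))
# 'tsn'                      -> model type
# 'imagenet-pretrained-r50'  -> pretrain info + backbone
# '8xb32-1x1x3-100e'         -> 8 GPUs x 32 per GPU, 1x1x3 sampling, 100 epochs
# 'kinetics400-rgb'          -> dataset + modality
```

This only works because each part contains no underscore; settings inside a part are separated by dashes instead.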

### Config System for Action Recognition

We incorporate modular design into our config system, which is convenient for conducting various experiments.

- An Example of TSN

  To help users get a basic idea of a complete config structure and of the modules in an action recognition system,
  we provide brief comments on the config of TSN below.
  For more detailed usage of, and alternatives for, each parameter in each module, please refer to the API documentation.

  ```python

  # model settings

  model = dict(  # Config of the model

      type='Recognizer2D',  # Class name of the recognizer

      backbone=dict(  # Dict for backbone

          type='ResNet',  # Name of the backbone

          pretrained='torchvision://resnet50',  # The url/site of the pretrained model

          depth=50,  # Depth of ResNet model

          norm_eval=False),  # Whether to set BN layers to eval mode when training

      cls_head=dict(  # Dict for classification head

          type='TSNHead',  # Name of classification head

          num_classes=400,  # Number of classes to be classified.

          in_channels=2048,  # The input channels of classification head.

          spatial_type='avg',  # Type of pooling in spatial dimension

          consensus=dict(type='AvgConsensus', dim=1),  # Config of consensus module

          dropout_ratio=0.4,  # Probability in dropout layer

          init_std=0.01, # Std value for linear layer initiation

          average_clips='prob'),  # Method to average multiple clip results

      data_preprocessor=dict(  # Dict for data preprocessor

          type='ActionDataPreprocessor',  # Name of data preprocessor

          mean=[123.675, 116.28, 103.53],  # Mean values of different channels to normalize

          std=[58.395, 57.12, 57.375],  # Std values of different channels to normalize

          format_shape='NCHW'),  # Final image shape format

      # model training and testing settings

      train_cfg=None,  # Config of training hyperparameters for TSN

      test_cfg=None)  # Config of testing hyperparameters for TSN



  # dataset settings

  dataset_type = 'RawframeDataset'  # Type of dataset for training, validation and testing

  data_root = 'data/kinetics400/rawframes_train/'  # Root path to data for training

  data_root_val = 'data/kinetics400/rawframes_val/'  # Root path to data for validation and testing

  ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt'  # Path to the annotation file for training

  ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt'  # Path to the annotation file for validation

  ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt'  # Path to the annotation file for testing



  train_pipeline = [  # Training data processing pipeline

      dict(  # Config of SampleFrames

          type='SampleFrames',  # Sample frames pipeline, sampling frames from video

          clip_len=1,  # Frames of each sampled output clip

          frame_interval=1,  # Temporal interval of adjacent sampled frames

          num_clips=3),  # Number of clips to be sampled

      dict(  # Config of RawFrameDecode

          type='RawFrameDecode'),  # Load and decode Frames pipeline, picking raw frames with given indices

      dict(  # Config of Resize

          type='Resize',  # Resize pipeline

          scale=(-1, 256)),  # The scale to resize images

      dict(  # Config of MultiScaleCrop

          type='MultiScaleCrop',  # Multi scale crop pipeline, cropping images with a list of randomly selected scales

          input_size=224,  # Input size of the network

          scales=(1, 0.875, 0.75, 0.66),  # Scales of width and height to be selected

          random_crop=False,  # Whether to randomly sample cropping bbox

          max_wh_scale_gap=1),  # Maximum gap of w and h scale levels

      dict(  # Config of Resize

          type='Resize',  # Resize pipeline

          scale=(224, 224),  # The scale to resize images

          keep_ratio=False),  # Whether to keep the aspect ratio when resizing

      dict(  # Config of Flip

          type='Flip',  # Flip Pipeline

          flip_ratio=0.5),  # Probability of implementing flip

      dict(  # Config of FormatShape

          type='FormatShape',  # Format shape pipeline, Format final image shape to the given input_format

          input_format='NCHW'),  # Final image shape format

      dict(type='PackActionInputs')  # Config of PackActionInputs

  ]

  val_pipeline = [  # Validation data processing pipeline

      dict(  # Config of SampleFrames

          type='SampleFrames',  # Sample frames pipeline, sampling frames from video

          clip_len=1,  # Frames of each sampled output clip

          frame_interval=1,  # Temporal interval of adjacent sampled frames

          num_clips=3,  # Number of clips to be sampled

          test_mode=True),  # Whether to set test mode in sampling

      dict(  # Config of RawFrameDecode

          type='RawFrameDecode'),  # Load and decode Frames pipeline, picking raw frames with given indices

      dict(  # Config of Resize

          type='Resize',  # Resize pipeline

          scale=(-1, 256)),  # The scale to resize images

      dict(  # Config of CenterCrop

          type='CenterCrop',  # Center crop pipeline, cropping the center area from images

          crop_size=224),  # The size to crop images

      dict(  # Config of Flip

          type='Flip',  # Flip pipeline

          flip_ratio=0),  # Probability of implementing flip

      dict(  # Config of FormatShape

          type='FormatShape',  # Format shape pipeline, Format final image shape to the given input_format

          input_format='NCHW'),  # Final image shape format

      dict(type='PackActionInputs')  # Config of PackActionInputs

  ]

  test_pipeline = [  # Testing data processing pipeline

      dict(  # Config of SampleFrames

          type='SampleFrames',  # Sample frames pipeline, sampling frames from video

          clip_len=1,  # Frames of each sampled output clip

          frame_interval=1,  # Temporal interval of adjacent sampled frames

          num_clips=25,  # Number of clips to be sampled

          test_mode=True),  # Whether to set test mode in sampling

      dict(  # Config of RawFrameDecode

          type='RawFrameDecode'),  # Load and decode Frames pipeline, picking raw frames with given indices

      dict(  # Config of Resize

          type='Resize',  # Resize pipeline

          scale=(-1, 256)),  # The scale to resize images

      dict(  # Config of TenCrop

          type='TenCrop',  # Ten crop pipeline, cropping ten areas (four corners, the center, and their horizontal flips) from images

          crop_size=224),  # The size to crop images

      dict(  # Config of Flip

          type='Flip',  # Flip pipeline

          flip_ratio=0),  # Probability of implementing flip

      dict(  # Config of FormatShape

          type='FormatShape',  # Format shape pipeline, Format final image shape to the given input_format

          input_format='NCHW'),  # Final image shape format

      dict(type='PackActionInputs')  # Config of PackActionInputs

  ]



  train_dataloader = dict(  # Config of train dataloader

      batch_size=32,  # Batch size of each single GPU during training

      num_workers=8,  # Workers to pre-fetch data for each single GPU during training

      persistent_workers=True,  # If `True`, the dataloader will not shut down the worker processes after an epoch end, which can accelerate training speed

      sampler=dict(

          type='DefaultSampler',  # DefaultSampler which supports both distributed and non-distributed training. Refer to https://github.com/open-mmlab/mmengine/blob/main/mmengine/dataset/sampler.py

          shuffle=True),  # Randomly shuffle the training data in each epoch

      dataset=dict(  # Config of train dataset

          type=dataset_type,

          ann_file=ann_file_train,  # Path of annotation file

          data_prefix=dict(img=data_root),  # Prefix of frame path

          pipeline=train_pipeline))

  val_dataloader = dict(  # Config of validation dataloader

      batch_size=1,  # Batch size of each single GPU during validation

      num_workers=8,  # Workers to pre-fetch data for each single GPU during validation

      persistent_workers=True,  # If `True`, the dataloader will not shut down the worker processes after an epoch end

      sampler=dict(

          type='DefaultSampler',

          shuffle=False),  # Not shuffle during validation and testing

      dataset=dict(  # Config of validation dataset

          type=dataset_type,

          ann_file=ann_file_val,  # Path of annotation file

          data_prefix=dict(img=data_root_val),  # Prefix of frame path

          pipeline=val_pipeline,

          test_mode=True))

  test_dataloader = dict(  # Config of test dataloader

      batch_size=32,  # Batch size of each single GPU during testing

      num_workers=8,  # Workers to pre-fetch data for each single GPU during testing

      persistent_workers=True,  # If `True`, the dataloader will not shut down the worker processes after an epoch end

      sampler=dict(

          type='DefaultSampler',

          shuffle=False),  # Not shuffle during validation and testing

      dataset=dict(  # Config of test dataset

          type=dataset_type,

          ann_file=ann_file_val,  # Path of annotation file

          data_prefix=dict(img=data_root_val),  # Prefix of frame path

          pipeline=test_pipeline,

          test_mode=True))



  # evaluation settings

  val_evaluator = dict(type='AccMetric')  # Config of validation evaluator

  test_evaluator = val_evaluator  # Config of testing evaluator



  train_cfg = dict(  # Config of training loop

      type='EpochBasedTrainLoop',  # Name of training loop

      max_epochs=100,  # Total training epochs

      val_begin=1,  # The epoch that begins validating

      val_interval=1)  # Validation interval

  val_cfg = dict(  # Config of validation loop

      type='ValLoop')  # Name of validation loop

  test_cfg = dict( # Config of testing loop

      type='TestLoop')  # Name of testing loop



  # learning policy

  param_scheduler = [  # Parameter scheduler for updating optimizer parameters, support dict or list

      dict(type='MultiStepLR',  # Decays the learning rate once the number of epoch reaches one of the milestones

          begin=0,  # Step at which to start updating the learning rate

          end=100,  # Step at which to stop updating the learning rate

          by_epoch=True,  # Whether the scheduled learning rate is updated by epochs

          milestones=[40, 80],  # Steps to decay the learning rate

          gamma=0.1)]  # Multiplicative factor of learning rate decay



  # optimizer

  optim_wrapper = dict(  # Config of optimizer wrapper

      type='OptimWrapper',  # Name of optimizer wrapper, switch to AmpOptimWrapper to enable mixed precision training

      optimizer=dict(  # Config of optimizer. Support all kinds of optimizers in PyTorch. Refer to https://pytorch.org/docs/stable/optim.html#algorithms

          type='SGD',  # Name of optimizer

          lr=0.01,  # Learning rate

          momentum=0.9,  # Momentum factor

          weight_decay=0.0001),  # Weight decay

      clip_grad=dict(max_norm=40, norm_type=2))  # Config of gradient clip



  # runtime settings

  default_scope = 'mmaction'  # The default registry scope to find modules. Refer to https://mmengine.readthedocs.io/en/latest/tutorials/registry.html

  default_hooks = dict(  # Hooks to execute default actions like updating model parameters and saving checkpoints.

      runtime_info=dict(type='RuntimeInfoHook'),  # The hook to update runtime information into the message hub

      timer=dict(type='IterTimerHook'),  # The hook to record the time spent during each iteration

      logger=dict(

          type='LoggerHook',  # The logger used to record logs during training/validation/testing phase

          interval=20,  # Interval to print the log

          ignore_last=False), # Whether to ignore the logs of the last iterations in each epoch

      param_scheduler=dict(type='ParamSchedulerHook'),  # The hook to update some hyper-parameters in optimizer

      checkpoint=dict(

          type='CheckpointHook',  # The hook to save checkpoints periodically

          interval=3,  # The saving period

          save_best='auto',  # Specified metric to measure the best checkpoint during evaluation

          max_keep_ckpts=3),  # The maximum checkpoints to keep

      sampler_seed=dict(type='DistSamplerSeedHook'),  # The hook to set the data-loading sampler seed for distributed training

      sync_buffers=dict(type='SyncBuffersHook'))  # Synchronize model buffers at the end of each epoch

  env_cfg = dict(  # Dict for setting environment

      cudnn_benchmark=False,  # Whether to enable cudnn benchmark

      mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0), # Parameters to setup multiprocessing

      dist_cfg=dict(backend='nccl')) # Parameters to setup distributed environment, the port can also be set



  log_processor = dict(

      type='LogProcessor',  # Log processor used to format log information

      window_size=20,  # Default smooth interval

      by_epoch=True)  # Whether to format logs with epoch type

  vis_backends = [  # List of visualization backends

      dict(type='LocalVisBackend')]  # Local visualization backend

  visualizer = dict(  # Config of visualizer

      type='ActionVisualizer',  # Name of visualizer

      vis_backends=vis_backends)

  log_level = 'INFO'  # The level of logging

  load_from = None  # Load model checkpoint as a pre-trained model from a given path. This will not resume training.

  resume = False  # Whether to resume from the checkpoint defined in `load_from`. If `load_from` is None, it will resume the latest checkpoint in the `work_dir`.

  ```
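The sampling settings above interact in a way worth spelling out: training samples `clip_len=1` frame from each of 3 clips, while testing samples 25 clips and then applies TenCrop, and `average_clips='prob'` averages the softmax scores over all resulting views. A back-of-the-envelope check (plain Python, with the values taken from the config above):

```python
# Values taken from the TSN config in this section.
clip_len = 1               # SampleFrames: frames per clip
train_num_clips = 3        # SampleFrames in train_pipeline
test_num_clips = 25        # SampleFrames in test_pipeline
ten_crop_views = 10        # TenCrop: 4 corners + center, plus their flips

frames_per_video_train = clip_len * train_num_clips     # 3 frames per video
views_per_video_test = test_num_clips * ten_crop_views  # 250 crops per video

# With average_clips='prob', the recognizer averages the softmax scores
# of all test views to produce one prediction per video.
```

This is why testing is far more expensive per video than training here, and why the test dataloader's effective batch size in memory is much larger than `batch_size` alone suggests.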

### Config System for Spatio-Temporal Action Detection

We incorporate modular design into our config system, which is convenient for conducting various experiments.

- An Example of FastRCNN

  To help users get a basic idea of a complete config structure and of the modules in a spatio-temporal action detection system,
  we provide brief comments on the config of FastRCNN below.
  For more detailed usage of, and alternatives for, each parameter in each module, please refer to the API documentation.

  ```python

  # model setting

  model = dict(  # Config of the model

      type='FastRCNN',  # Class name of the detector

      _scope_='mmdet',  # The scope of current config

      backbone=dict(  # Dict for backbone

          type='ResNet3dSlowOnly',  # Name of the backbone

          depth=50, # Depth of ResNet model

          pretrained=None,   # The url/site of the pretrained model

          pretrained2d=False, # If the pretrained model is 2D

          lateral=False,  # If the backbone is with lateral connections

          num_stages=4, # Stages of ResNet model

          conv1_kernel=(1, 7, 7), # Conv1 kernel size

          conv1_stride_t=1, # Conv1 temporal stride

          pool1_stride_t=1, # Pool1 temporal stride

          spatial_strides=(1, 2, 2, 1)),  # The spatial stride for each ResNet stage

      roi_head=dict(  # Dict for roi_head

          type='AVARoIHead',  # Name of the roi_head

          bbox_roi_extractor=dict(  # Dict for bbox_roi_extractor

              type='SingleRoIExtractor3D',  # Name of the bbox_roi_extractor

              roi_layer_type='RoIAlign',  # Type of the RoI op

              output_size=8,  # Output feature size of the RoI op

              with_temporal_pool=True), # If temporal dim is pooled

          bbox_head=dict( # Dict for bbox_head

              type='BBoxHeadAVA', # Name of the bbox_head

              in_channels=2048, # Number of channels of the input feature

              num_classes=81, # Number of action classes + 1

              multilabel=True,  # If the dataset is multilabel

              dropout_ratio=0.5)),  # The dropout ratio used

      data_preprocessor=dict(  # Dict for data preprocessor

          type='ActionDataPreprocessor',  # Name of data preprocessor

          mean=[123.675, 116.28, 103.53],  # Mean values of different channels to normalize

          std=[58.395, 57.12, 57.375],  # Std values of different channels to normalize

          format_shape='NCHW'),  # Final image shape format

      # model training and testing settings

      train_cfg=dict(  # Training config of FastRCNN

          rcnn=dict(  # Dict for rcnn training config

              assigner=dict(  # Dict for assigner

                  type='MaxIoUAssignerAVA', # Name of the assigner

                  pos_iou_thr=0.9,  # IoU threshold for positive examples, > pos_iou_thr -> positive

                  neg_iou_thr=0.9,  # IoU threshold for negative examples, < neg_iou_thr -> negative

                  min_pos_iou=0.9), # Minimum acceptable IoU for positive examples

              sampler=dict( # Dict for the sampler

                  type='RandomSampler', # Name of the sampler

                  num=32, # Batch size of the sampler

                  pos_fraction=1, # Positive bbox fraction of the sampler

                  neg_pos_ub=-1,  # Upper bound of the ratio of num negative to num positive

                  add_gt_as_proposals=True), # Add gt bboxes as proposals

              pos_weight=1.0)),  # Loss weight of positive examples

      test_cfg=dict(rcnn=None))  # Testing config of FastRCNN



  # dataset settings

  dataset_type = 'AVADataset' # Type of dataset for training, validation and testing

  data_root = 'data/ava/rawframes'  # Root path to data

  anno_root = 'data/ava/annotations'  # Root path to annotations



  ann_file_train = f'{anno_root}/ava_train_v2.1.csv'  # Path to the annotation file for training

  ann_file_val = f'{anno_root}/ava_val_v2.1.csv'  # Path to the annotation file for validation



  exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv'  # Path to the exclude annotation file for training

  exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv'  # Path to the exclude annotation file for validation



  label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt'  # Path to the label file



  proposal_file_train = f'{anno_root}/ava_dense_proposals_train.FAIR.recall_93.9.pkl'  # Path to the human detection proposals for training examples

  proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl'  # Path to the human detection proposals for validation examples



  train_pipeline = [  # Training data processing pipeline

      dict(  # Config of SampleFrames

          type='AVASampleFrames',  # Sample frames pipeline, sampling frames from video

          clip_len=4,  # Frames of each sampled output clip

          frame_interval=16),  # Temporal interval of adjacent sampled frames

      dict(  # Config of RawFrameDecode

          type='RawFrameDecode'),  # Load and decode Frames pipeline, picking raw frames with given indices

      dict(  # Config of RandomRescale

          type='RandomRescale',   # Randomly rescale the short edge to a size in the given range

          scale_range=(256, 320)),   # The short-edge size range of RandomRescale

      dict(  # Config of RandomCrop

          type='RandomCrop',   # Randomly crop a patch with the given size

          size=256),   # The size of the cropped patch

      dict(  # Config of Flip

          type='Flip',  # Flip Pipeline

          flip_ratio=0.5),  # Probability of implementing flip

      dict(  # Config of FormatShape

          type='FormatShape',  # Format shape pipeline, Format final image shape to the given input_format

          input_format='NCTHW',  # Final image shape format

          collapse=True),   # Collapse the dim N if N == 1

      dict(type='PackActionInputs') # Pack input data

  ]



  val_pipeline = [  # Validation data processing pipeline

      dict(  # Config of SampleFrames

          type='AVASampleFrames',  # Sample frames pipeline, sampling frames from video

          clip_len=4,  # Frames of each sampled output clip

          frame_interval=16),  # Temporal interval of adjacent sampled frames

      dict(  # Config of RawFrameDecode

          type='RawFrameDecode'),  # Load and decode Frames pipeline, picking raw frames with given indices

      dict(  # Config of Resize

          type='Resize',  # Resize pipeline

          scale=(-1, 256)),  # The scale to resize images

      dict(  # Config of FormatShape

          type='FormatShape',  # Format shape pipeline, Format final image shape to the given input_format

          input_format='NCTHW',  # Final image shape format

          collapse=True),   # Collapse the dim N if N == 1

      dict(type='PackActionInputs') # Pack input data

  ]



  train_dataloader = dict(  # Config of train dataloader

      batch_size=32,  # Batch size of each single GPU during training

      num_workers=8,  # Workers to pre-fetch data for each single GPU during training

      persistent_workers=True,  # If `True`, the dataloader will not shut down the worker processes after an epoch end, which can accelerate training speed

      sampler=dict(

          type='DefaultSampler',  # DefaultSampler which supports both distributed and non-distributed training. Refer to https://github.com/open-mmlab/mmengine/blob/main/mmengine/dataset/sampler.py

          shuffle=True),  # Randomly shuffle the training data in each epoch

      dataset=dict(  # Config of train dataset

          type=dataset_type,

          ann_file=ann_file_train,  # Path of annotation file

          exclude_file=exclude_file_train,  # Path of exclude annotation file

          label_file=label_file,  # Path of label file

          data_prefix=dict(img=data_root),  # Prefix of frame path

          proposal_file=proposal_file_train,  # Path of human detection proposals

          pipeline=train_pipeline))

  val_dataloader = dict(  # Config of validation dataloader

      batch_size=1,  # Batch size of each single GPU during evaluation

      num_workers=8,  # Workers to pre-fetch data for each single GPU during evaluation

      persistent_workers=True,  # If `True`, the dataloader will not shut down the worker processes after an epoch end

      sampler=dict(

          type='DefaultSampler',

          shuffle=False),  # Not shuffle during validation and testing

      dataset=dict(  # Config of validation dataset

          type=dataset_type,

          ann_file=ann_file_val,  # Path of annotation file

          exclude_file=exclude_file_val,  # Path of exclude annotation file

          label_file=label_file,  # Path of label file

          data_prefix=dict(img=data_root),  # Prefix of frame path

          proposal_file=proposal_file_val,  # Path of human detection proposals

          pipeline=val_pipeline,

          test_mode=True))

  test_dataloader = val_dataloader  # Config of testing dataloader



  # evaluation settings

  val_evaluator = dict(  # Config of validation evaluator

      type='AVAMetric',

      ann_file=ann_file_val,

      label_file=label_file,

      exclude_file=exclude_file_val)

  test_evaluator = val_evaluator  # Config of testing evaluator



  train_cfg = dict(  # Config of training loop

      type='EpochBasedTrainLoop',  # Name of training loop

      max_epochs=20,  # Total training epochs

      val_begin=1,  # The epoch at which to begin validation

      val_interval=1)  # Validation interval

  val_cfg = dict(  # Config of validation loop

      type='ValLoop')  # Name of validation loop

  test_cfg = dict( # Config of testing loop

      type='TestLoop')  # Name of testing loop



  # learning policy

  param_scheduler = [  # Parameter scheduler for updating optimizer parameters; supports a dict or a list

      dict(type='LinearLR',  # Decays the learning rate of each parameter group by linearly changing a small multiplicative factor

          start_factor=0.1,  # The number we multiply learning rate in the first epoch

          by_epoch=True,  # Whether the scheduled learning rate is updated by epochs

          begin=0,  # Step at which to start updating the learning rate

          end=5),  # Step at which to stop updating the learning rate

      dict(type='MultiStepLR',  # Decays the learning rate once the number of epochs reaches one of the milestones

          begin=0,  # Step at which to start updating the learning rate

          end=20,  # Step at which to stop updating the learning rate

          by_epoch=True,  # Whether the scheduled learning rate is updated by epochs

          milestones=[10, 15],  # Steps to decay the learning rate

          gamma=0.1)]  # Multiplicative factor of learning rate decay



  # optimizer

  optim_wrapper = dict(  # Config of optimizer wrapper

      type='OptimWrapper',  # Name of optimizer wrapper, switch to AmpOptimWrapper to enable mixed precision training

      optimizer=dict(  # Config of optimizer. Support all kinds of optimizers in PyTorch. Refer to https://pytorch.org/docs/stable/optim.html#algorithms

          type='SGD',  # Name of optimizer

          lr=0.2,  # Learning rate

          momentum=0.9,  # Momentum factor

          weight_decay=0.0001),  # Weight decay

      clip_grad=dict(max_norm=40, norm_type=2))  # Config of gradient clip



  # runtime settings

  default_scope = 'mmaction'  # The default registry scope to find modules. Refer to https://mmengine.readthedocs.io/en/latest/tutorials/registry.html

  default_hooks = dict(  # Hooks to execute default actions like updating model parameters and saving checkpoints.

      runtime_info=dict(type='RuntimeInfoHook'),  # The hook to update runtime information in the message hub

      timer=dict(type='IterTimerHook'),  # The hook used to record the time spent during each iteration

      logger=dict(

          type='LoggerHook',  # The logger used to record logs during training/validation/testing phase

          interval=20,  # Interval to print the log

          ignore_last=False),  # Whether to ignore the log of the last iterations in each epoch

      param_scheduler=dict(type='ParamSchedulerHook'),  # The hook to update some hyper-parameters in optimizer

      checkpoint=dict(

          type='CheckpointHook',  # The hook to save checkpoints periodically

          interval=3,  # The saving period

          save_best='auto',  # The metric used to measure the best checkpoint during evaluation

          max_keep_ckpts=3),  # The maximum checkpoints to keep

      sampler_seed=dict(type='DistSamplerSeedHook'),  # The hook to set the seed of the data-loading sampler for distributed training

      sync_buffers=dict(type='SyncBuffersHook'))  # Synchronize model buffers at the end of each epoch

  env_cfg = dict(  # Dict for setting environment

      cudnn_benchmark=False,  # Whether to enable cudnn benchmark

      mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),  # Parameters to set up multiprocessing

      dist_cfg=dict(backend='nccl'))  # Parameters to set up the distributed environment; the port can also be set



  log_processor = dict(

      type='LogProcessor',  # Log processor used to format log information

      window_size=20,  # Default smooth interval

      by_epoch=True)  # Whether to format logs with epoch type

  vis_backends = [  # List of visualization backends

      dict(type='LocalVisBackend')]  # Local visualization backend

  visualizer = dict(  # Config of visualizer

      type='ActionVisualizer',  # Name of visualizer

      vis_backends=vis_backends)

  log_level = 'INFO'  # The level of logging

  load_from = ('https://download.openmmlab.com/mmaction/v1.0/recognition/slowonly/'

               'slowonly_imagenet-pretrained-r50_8xb16-4x16x1-steplr-150e_kinetics400-rgb/'

               'slowonly_imagenet-pretrained-r50_8xb16-4x16x1-steplr-150e_kinetics400-rgb_20220901-e7b65fad.pth')  # Load model checkpoint as a pre-trained model from a given path. This will not resume training.

  resume = False  # Whether to resume from the checkpoint defined in `load_from`. If `load_from` is None, it will resume the latest checkpoint in the `work_dir`.

  ```
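
  The `param_scheduler` above chains a linear warm-up with a step decay. As a quick sanity check, the combined policy can be sketched in plain Python (an approximate per-epoch view, not MMEngine's exact per-iteration interpolation; the helper name below is ours):

  ```python
  def lr_at_epoch(epoch, base_lr=0.2, start_factor=0.1, warmup_end=5,
                  milestones=(10, 15), gamma=0.1):
      """Approximate LR per epoch: LinearLR ramp for epochs [0, warmup_end),
      then MultiStepLR decay by `gamma` at each milestone."""
      if epoch < warmup_end:
          # Linear ramp from base_lr * start_factor toward base_lr
          factor = start_factor + (1 - start_factor) * epoch / warmup_end
          return base_lr * factor
      lr = base_lr
      for m in milestones:
          if epoch >= m:
              lr *= gamma  # decayed once per milestone passed
      return lr

  print([round(lr_at_epoch(e), 4) for e in (0, 2, 5, 10, 15)])
  # → [0.02, 0.092, 0.2, 0.02, 0.002]
  ```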

### Config System for Action Localization

We incorporate modular design into our config system,
which makes it convenient to conduct various experiments.

- An Example of BMN

  To help users get a basic idea of a complete config structure and the modules in an action localization system,
  we make brief comments on the config of BMN below.
  For more detailed usage and the alternatives for each parameter in each module, please refer to the [API documentation](https://mmaction2.readthedocs.io/en/latest/api.html).

  ```python

  # model settings

  model = dict(  # Config of the model

      type='BMN',  # Class name of the localizer

      temporal_dim=100,  # Total frames selected for each video

      boundary_ratio=0.5,  # Ratio for determining video boundaries

      num_samples=32,  # Number of samples for each proposal

      num_samples_per_bin=3,  # Number of bin samples for each sample

      feat_dim=400,  # Dimension of feature

      soft_nms_alpha=0.4,  # Soft NMS alpha

      soft_nms_low_threshold=0.5,  # Soft NMS low threshold

      soft_nms_high_threshold=0.9,  # Soft NMS high threshold

      post_process_top_k=100)  # Top k proposals in post process



  # dataset settings

  dataset_type = 'ActivityNetDataset'  # Type of dataset for training, validation and testing

  data_root = 'data/activitynet_feature_cuhk/csv_mean_100/'  # Root path to data for training

  data_root_val = 'data/activitynet_feature_cuhk/csv_mean_100/'  # Root path to data for validation and testing

  ann_file_train = 'data/ActivityNet/anet_anno_train.json'  # Path to the annotation file for training

  ann_file_val = 'data/ActivityNet/anet_anno_val.json'  # Path to the annotation file for validation

  ann_file_test = 'data/ActivityNet/anet_anno_test.json'  # Path to the annotation file for testing



  train_pipeline = [  # Training data processing pipeline

      dict(type='LoadLocalizationFeature'),  # Load localization feature pipeline

      dict(type='GenerateLocalizationLabels'),  # Generate localization labels pipeline

      dict(

          type='PackLocalizationInputs',  # Pack localization data

          keys=('gt_bbox', ),  # Keys of input

          meta_keys=('video_name', ))]  # Meta keys of input

  val_pipeline = [  # Validation data processing pipeline

      dict(type='LoadLocalizationFeature'),  # Load localization feature pipeline

      dict(type='GenerateLocalizationLabels'),  # Generate localization labels pipeline

      dict(

          type='PackLocalizationInputs',  # Pack localization data

          keys=('gt_bbox', ),  # Keys of input

          meta_keys=('video_name', 'duration_second', 'duration_frame',

                     'annotations', 'feature_frame'))]  # Meta keys of input

  test_pipeline = [  # Testing data processing pipeline

      dict(type='LoadLocalizationFeature'),  # Load localization feature pipeline

      dict(

          type='PackLocalizationInputs',  # Pack localization data

          keys=('gt_bbox', ),  # Keys of input

          meta_keys=('video_name', 'duration_second', 'duration_frame',

                     'annotations', 'feature_frame'))]  # Meta keys of input

  train_dataloader = dict(  # Config of train dataloader

      batch_size=8,  # Batch size of each single GPU during training

      num_workers=8,  # Workers to pre-fetch data for each single GPU during training

      persistent_workers=True,  # If `True`, the dataloader will not shut down the worker processes after an epoch end, which can accelerate training speed

      sampler=dict(

          type='DefaultSampler',  # DefaultSampler which supports both distributed and non-distributed training. Refer to https://github.com/open-mmlab/mmengine/blob/main/mmengine/dataset/sampler.py

          shuffle=True),  # Randomly shuffle the training data in each epoch

      dataset=dict(  # Config of train dataset

          type=dataset_type,

          ann_file=ann_file_train,  # Path of annotation file

          data_prefix=dict(video=data_root),  # Prefix of video path

          pipeline=train_pipeline))

  val_dataloader = dict(  # Config of validation dataloader

      batch_size=1,  # Batch size of each single GPU during evaluation

      num_workers=8,  # Workers to pre-fetch data for each single GPU during evaluation

      persistent_workers=True,  # If `True`, the dataloader will not shut down the worker processes after an epoch end

      sampler=dict(

          type='DefaultSampler',

          shuffle=False),  # Do not shuffle during validation and testing

      dataset=dict(  # Config of validation dataset

          type=dataset_type,

          ann_file=ann_file_val,  # Path of annotation file

          data_prefix=dict(video=data_root_val),  # Prefix of video path

          pipeline=val_pipeline,

          test_mode=True))

  test_dataloader = dict(  # Config of test dataloader

      batch_size=1,  # Batch size of each single GPU during testing

      num_workers=8,  # Workers to pre-fetch data for each single GPU during testing

      persistent_workers=True,  # If `True`, the dataloader will not shut down the worker processes after an epoch end

      sampler=dict(

          type='DefaultSampler',

          shuffle=False),  # Do not shuffle during validation and testing

      dataset=dict(  # Config of test dataset

          type=dataset_type,

          ann_file=ann_file_val,  # Path of annotation file

          data_prefix=dict(video=data_root_val),  # Prefix of video path

          pipeline=test_pipeline,

          test_mode=True))



  # evaluation settings

  work_dir = './work_dirs/bmn_400x100_2x8_9e_activitynet_feature/'  # Directory to save the model checkpoints and logs for the current experiment

  val_evaluator = dict(

    type='ANetMetric',

    metric_type='AR@AN',

    dump_config=dict(  # Config of localization output

        out=f'{work_dir}/results.json',  # Path to the output file

        output_format='json'))  # File format of the output file

  test_evaluator = val_evaluator   # Set test_evaluator as val_evaluator
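
  # Note: the 'AR@AN' metric reports average recall (AR) at several average
  # numbers of proposals per video (AN), e.g. AR@100; the predicted proposals
  # are also dumped to f'{work_dir}/results.json' as configured above.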



  max_epochs = 9  # Total epochs to train the model

  train_cfg = dict(  # Config of training loop

      type='EpochBasedTrainLoop',  # Name of training loop

      max_epochs=max_epochs,  # Total training epochs

      val_begin=1,  # The epoch at which to begin validation

      val_interval=1)  # Validation interval

  val_cfg = dict(  # Config of validation loop

      type='ValLoop')  # Name of validation loop

  test_cfg = dict(  # Config of testing loop

      type='TestLoop')  # Name of testing loop



  # learning policy

  param_scheduler = [  # Parameter scheduler for updating optimizer parameters; supports a dict or a list

      dict(type='MultiStepLR',  # Decays the learning rate once the number of epochs reaches one of the milestones

      begin=0,  # Step at which to start updating the learning rate

      end=max_epochs,  # Step at which to stop updating the learning rate

      by_epoch=True,  # Whether the scheduled learning rate is updated by epochs

      milestones=[7, ],  # Steps to decay the learning rate

      gamma=0.1)]  # Multiplicative factor of parameter value decay



  # optimizer

  optim_wrapper = dict(  # Config of optimizer wrapper

      type='OptimWrapper',  # Name of optimizer wrapper, switch to AmpOptimWrapper to enable mixed precision training

      optimizer=dict(  # Config of optimizer. Support all kinds of optimizers in PyTorch. Refer to https://pytorch.org/docs/stable/optim.html#algorithms

          type='Adam',  # Name of optimizer

          lr=0.001,  # Learning rate

          weight_decay=0.0001),  # Weight decay

      clip_grad=dict(max_norm=40, norm_type=2))  # Config of gradient clip



  # runtime settings

  default_scope = 'mmaction'  # The default registry scope to find modules. Refer to https://mmengine.readthedocs.io/en/latest/tutorials/registry.html

  default_hooks = dict(  # Hooks to execute default actions like updating model parameters and saving checkpoints.

      runtime_info=dict(type='RuntimeInfoHook'),  # The hook to update runtime information in the message hub

      timer=dict(type='IterTimerHook'),  # The hook used to record the time spent during each iteration

      logger=dict(

          type='LoggerHook',  # The logger used to record logs during training/validation/testing phase

          interval=20,  # Interval to print the log

          ignore_last=False),  # Whether to ignore the log of the last iterations in each epoch

      param_scheduler=dict(type='ParamSchedulerHook'),  # The hook to update some hyper-parameters in optimizer

      checkpoint=dict(

          type='CheckpointHook',  # The hook to save checkpoints periodically

          interval=3,  # The saving period

          save_best='auto',  # The metric used to measure the best checkpoint during evaluation

          max_keep_ckpts=3),  # The maximum checkpoints to keep

      sampler_seed=dict(type='DistSamplerSeedHook'),  # The hook to set the seed of the data-loading sampler for distributed training

      sync_buffers=dict(type='SyncBuffersHook'))  # Synchronize model buffers at the end of each epoch

  env_cfg = dict(  # Dict for setting environment

      cudnn_benchmark=False,  # Whether to enable cudnn benchmark

      mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),  # Parameters to set up multiprocessing

      dist_cfg=dict(backend='nccl'))  # Parameters to set up the distributed environment; the port can also be set



  log_processor = dict(

      type='LogProcessor',  # Log processor used to format log information

      window_size=20,  # Default smooth interval

      by_epoch=True)  # Whether to format logs with epoch type

  vis_backends = [  # List of visualization backends

      dict(type='LocalVisBackend')]  # Local visualization backend

  visualizer = dict(  # Config of visualizer

      type='ActionVisualizer',  # Name of visualizer

      vis_backends=vis_backends)

  log_level = 'INFO'  # The level of logging

  load_from = None  # Load model checkpoint as a pre-trained model from a given path. This will not resume training.

  resume = False  # Whether to resume from the checkpoint defined in `load_from`. If `load_from` is None, it will resume the latest checkpoint in the `work_dir`.

  ```
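
  The `soft_nms_*` fields in the model config above control Soft-NMS over temporal proposals during post-processing. Below is a minimal, self-contained sketch of one common Gaussian Soft-NMS variant on `(start, end, score)` segments; it is illustrative only and does not reproduce MMAction2's exact use of the low/high thresholds (the high threshold is omitted here):

  ```python
  import math

  def temporal_iou(a, b):
      """IoU of two temporal segments given as (start, end)."""
      inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
      union = (a[1] - a[0]) + (b[1] - b[0]) - inter
      return inter / union if union > 0 else 0.0

  def soft_nms(proposals, alpha=0.4, low_thr=0.5, top_k=100):
      """Gaussian Soft-NMS: keep the best-scoring proposal, decay the scores
      of sufficiently overlapping ones by exp(-iou**2 / alpha), repeat."""
      props = sorted(proposals, key=lambda p: p[2], reverse=True)
      kept = []
      while props and len(kept) < top_k:
          best = props.pop(0)
          kept.append(best)
          rescored = []
          for start, end, score in props:
              iou = temporal_iou((start, end), best[:2])
              if iou > low_thr:
                  score *= math.exp(-iou ** 2 / alpha)
              rescored.append((start, end, score))
          props = sorted(rescored, key=lambda p: p[2], reverse=True)
      return kept

  proposals = [(0, 10, 0.9), (1, 10, 0.8), (20, 30, 0.7)]
  print([round(s, 3) for *_, s in soft_nms(proposals)])
  # → [0.9, 0.7, 0.106]
  ```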