# MSG3D Project
Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition
## Abstract
Spatial-temporal graphs have been widely used by skeleton-based action recognition algorithms to model human action dynamics. To capture robust movement patterns from these graphs, long-range and multi-scale context aggregation and spatial-temporal dependency modeling are critical aspects of a powerful feature extractor. However, existing methods have limitations in achieving (1) unbiased long-range joint relationship modeling under multi-scale operators and (2) unobstructed cross-spacetime information flow for capturing complex spatial-temporal dependencies. In this work, we present (1) a simple method to disentangle multi-scale graph convolutions and (2) a unified spatial-temporal graph convolutional operator named G3D. The proposed multi-scale aggregation scheme disentangles the importance of nodes in different neighborhoods for effective long-range modeling. The proposed G3D module leverages dense cross-spacetime edges as skip connections for direct information propagation across the spatial-temporal graph. By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400.
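To make the disentangling idea concrete, here is a small illustrative sketch (not part of this repository's code): instead of raising the adjacency matrix to the k-th power, which entangles the contributions of closer and farther nodes, each scale k uses a k-adjacency matrix whose (i, j) entry is 1 only when the shortest graph distance between joints i and j is exactly k. The function name and the toy 4-joint chain skeleton below are assumptions for illustration, written in plain Python.

```python
from collections import deque

def k_hop_adjacency(A, k):
    """Disentangled k-hop adjacency: entry (i, j) is 1 iff the shortest-path
    distance between joints i and j is exactly k (self-loops when k == 0).
    A is an N x N 0/1 adjacency matrix given as nested lists."""
    n = len(A)
    Ak = [[0] * n for _ in range(n)]
    for src in range(n):
        # BFS from src to get exact shortest-path distances
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in range(n):
                if A[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for v, d in dist.items():
            if d == k:
                Ak[src][v] = 1
    return Ak

# Toy chain "skeleton" 0-1-2-3 (hypothetical example, not an NTU layout)
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
A2 = k_hop_adjacency(A, 2)  # joints exactly two hops apart: (0,2) and (1,3)
```

Because each scale only sees nodes at its exact distance, distant neighborhoods receive an unbiased weight instead of being dominated by the many short walks to nearby nodes.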
## Usage

### Setup Environment

Please refer to Installation to install MMAction2.

Assume that you are located at `$MMACTION2/projects/msg3d`.

Add the current folder to `PYTHONPATH` so that Python can find your code. Run the following command in the current directory to add it.

> Please run it every time after you open a new shell.

```shell
export PYTHONPATH=`pwd`:$PYTHONPATH
```
### Data Preparation

Prepare the NTU60 dataset according to the instructions.

Create a symbolic link from `$MMACTION2/data` to `./data` in the current directory, so that Python can locate your data. Run the following command in the current directory to create the symbolic link.

```shell
ln -s ../../data ./data
```
### Training commands

To train with a single GPU:

```shell
mim train mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py
```

To train with multiple GPUs:

```shell
mim train mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --launcher pytorch --gpus 8
```

To train with multiple GPUs by slurm:

```shell
mim train mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION
```
### Testing commands

To test with a single GPU:

```shell
mim test mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --checkpoint $CHECKPOINT
```

To test with multiple GPUs:

```shell
mim test mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8
```

To test with multiple GPUs by slurm:

```shell
mim test mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION
```
## Results

### NTU60_XSub_2D
| frame sampling strategy | modality | gpus | backbone | top1 acc | testing protocol | config | ckpt | log |
|---|---|---|---|---|---|---|---|---|
| uniform 100 | joint | 8 | MSG3D | 92.3 | 10 clips | config | ckpt | log |
### NTU60_XSub_3D
| frame sampling strategy | modality | gpus | backbone | top1 acc | testing protocol | config | ckpt | log |
|---|---|---|---|---|---|---|---|---|
| uniform 100 | joint | 8 | MSG3D | 89.6 | 10 clips | config | ckpt | log |
## Citation

```BibTeX
@inproceedings{liu2020disentangling,
  title={Disentangling and unifying graph convolutions for skeleton-based action recognition},
  author={Liu, Ziyu and Zhang, Hongwen and Chen, Zhenghao and Wang, Zhiyong and Ouyang, Wanli},
  booktitle={CVPR},
  pages={143--152},
  year={2020}
}
```