Introduction to the Kinesics Recognition Framework
Humans use a range of channels to convey their thoughts, including verbal messages, facial expressions, and body language. Among these channels, body language plays a critical role because it conveys unspoken cues about an individual's mental and emotional states through bodily movements. To study the meanings and interpretations expressed through bodily movements, psychologists Ekman and Friesen developed the taxonomy of kinesics, which classifies bodily movements into five categories based on their communicative functions: emblems, illustrators, affect displays, adaptors, and regulators. This principled taxonomy defines a clear linkage between human activities and their respective meanings and communicative categories. In conjunction with human activity recognition (HAR)---which uses sensor data such as RGB video, depth maps, or 3D skeletal keypoints to identify actions---the opportunity arises to automatically recognize the kinesics of human movements. One approach is to compile a dictionary-based mapping from human activities to their communicative categories; the dictionary is then appended to an HAR algorithm to determine the kinesic function of a given activity after it is recognized. However, the sheer variety of human actions makes it infeasible to manually define mappings for every possible movement. To truly decode human reasoning through bodily movements, we must move beyond dictionary-based mapping toward methods capable of learning a generalized translation between physical actions and their cognitive and affective significance.
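The dictionary-based approach described above can be sketched in a few lines. This is a minimal illustration, not the repository's code: the activity labels and their category assignments are hypothetical examples, and the sketch also shows the approach's core limitation (any activity outside the dictionary cannot be categorized).

```python
# Hypothetical dictionary mapping recognized activities to Ekman and
# Friesen's five kinesic categories. The activity names are illustrative
# and not taken from any particular HAR label set.
KINESIC_DICTIONARY = {
    "thumbs_up": "emblem",
    "pointing": "illustrator",
    "covering_face": "affect display",
    "scratching_head": "adaptor",
    "nodding": "regulator",
}

def kinesic_category(activity: str) -> str:
    # Any activity missing from the dictionary exposes the limitation
    # of a fixed mapping: it cannot generalize to unseen movements.
    return KINESIC_DICTIONARY.get(activity, "unknown")

print(kinesic_category("thumbs_up"))  # emblem
print(kinesic_category("waving"))     # unknown
```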
In this repository, we present a framework that classifies the kinesic categories of human movements. Specifically, the framework leverages a structured pattern embedded in skeletal keypoint data that clusters human activities with the same communicative purpose together. The framework extracts this structure with a transfer learning model that combines a Spatial-Temporal Graph Convolutional Network (ST-GCN) with a convolutional neural network (CNN). To demonstrate its efficacy, the model is evaluated on a HAR dataset derived from the taxonomy of kinesics: the Dyadic User EngagemenT dataset (DUET).
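An ST-GCN operates on a spatial graph in which skeletal joints are nodes and bones are edges. The sketch below builds such an adjacency matrix with self-loops; the joint indices and bone list are hypothetical and DUET's actual skeleton layout may differ.

```python
# Hypothetical 8-joint skeleton: bones listed as (parent, child) joint pairs.
BONES = [(0, 1), (1, 2), (2, 3),   # e.g., a spine/head chain
         (1, 4), (4, 5),           # e.g., one arm
         (1, 6), (6, 7)]           # e.g., the other arm
NUM_JOINTS = 8

def adjacency(bones, n):
    """Symmetric adjacency matrix with self-loops, the kind of graph
    structure a spatial graph convolution aggregates over."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1          # self-loop so each joint keeps its own feature
    for i, j in bones:
        A[i][j] = A[j][i] = 1  # bones are undirected edges
    return A

A = adjacency(BONES, NUM_JOINTS)
```

In an actual ST-GCN, spatial convolutions aggregate joint features over this matrix at each frame, while temporal convolutions operate along the frame axis.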
Kinesics Recognition Framework Pipeline
The code in this repository runs the framework on 30 subsets of DUET, each containing different numbers and types of activities. To run the framework, follow the steps below:
- Duplicate the folder structure of the repository.
- Download the 3D joints from the DUET repository and store all the folders (e.g., `CC0101` and `CL0102`) in the `data` directory.
- Install all the required packages in `requirement.txt`.
- Run `python disposition.py`.
- After the code completes, the results are stored in `experiment_results.pkl`, which includes the experiment number, the number of interactions, the type of interaction, and the accuracy of the kinesics recognition for the given interactions. To obtain the ST-GCN accuracy of an experiment, open the logging file of the corresponding run (i.e., `work_dirs/experiment_N/execution_timestamp/execution_timestamp.txt`). For instance, to find the ST-GCN accuracy of experiment 3 executed at 11:07:50 on May 1, 2025, open `work_dirs/experiment_3/20250501_110750/20250501_110750.txt`.
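A short script can inspect the pickled results. This is a hedged sketch: the exact structure of `experiment_results.pkl` is an assumption, treated here as a sequence of per-experiment records with the fields described above.

```python
import pickle

def load_results(path="experiment_results.pkl"):
    """Load the pickled experiment results produced by disposition.py."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical usage, assuming each record is a dict with these keys
# (the key names are illustrative, not confirmed by the repository):
# for entry in load_results():
#     print(entry["experiment_number"],
#           entry["num_interactions"],
#           entry["interaction_type"],
#           entry["accuracy"])
```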