TuBERT: Multimodal Speech Emotion Recognition For Real-Time Avatar Control
This project was developed as my senior thesis at Princeton University. A paper describing it will be published soon.
About
TuBERT is a multimodal speech emotion recognition model that runs in real time and on-device. I designed it with PNGTubers in mind, but it has plenty of other applications as well!
To test the model for yourself using a GUI, see the GitHub repository for installation instructions.
Usage
tubert.pt is the base TuBERT model trained on MELD, described in the paper and used by default for evaluation. tubert_iemocap.pt is the same model fine-tuned on IEMOCAP.
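Since the repository does not show its loading code here, the sketch below is a hedged guess at how a PyTorch checkpoint like tubert.pt might be loaded for inference. The helper name load_tubert is hypothetical, and whether the .pt file holds a full serialized model or only a state_dict depends on how the checkpoint was saved; consult the repository's own evaluation scripts for the actual API.

```python
import torch

def load_tubert(checkpoint_path: str, device: str = "cpu"):
    """Hypothetical helper: load a TuBERT checkpoint for inference.

    Assumes the .pt file was saved with torch.save(). If it contains a
    state_dict rather than a full model object, it must instead be loaded
    into the model class defined in the repository.
    """
    # map_location lets a GPU-trained checkpoint load on CPU-only machines
    model = torch.load(checkpoint_path, map_location=device)
    if hasattr(model, "eval"):
        # Switch to inference mode (disables dropout, freezes batch-norm stats)
        model.eval()
    return model
```

Usage would then look like `model = load_tubert("tubert.pt")` for the MELD-trained base model, or `load_tubert("tubert_iemocap.pt")` for the IEMOCAP fine-tune.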