# Multimodal Emotion Speech Recognition

## Model Description
This model performs speech emotion recognition with a multimodal approach that combines:
- Audio Model: Wav2Vec2 Base
- Text/Context Model: RoBERTa Base
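The card does not spell out how the two encoders are combined, so the following is a minimal late-fusion sketch under an assumed design: each encoder produces per-class logits, which are converted to probabilities and averaged. The function names (`fuse`, `predict`) and the `audio_weight` parameter are illustrative, not part of the released model.

```python
import math

# Emotion labels as used in the evaluation report below.
EMOTIONS = ["ANG", "CAL", "DIS", "FEA", "HAP", "NEU", "SAD", "SUR"]

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(audio_logits, text_logits, audio_weight=0.5):
    # Hypothetical late fusion: weighted average of the two
    # per-class probability distributions.
    pa = softmax(audio_logits)
    pt = softmax(text_logits)
    return [audio_weight * a + (1 - audio_weight) * t for a, t in zip(pa, pt)]

def predict(audio_logits, text_logits):
    # Pick the emotion with the highest fused probability.
    probs = fuse(audio_logits, text_logits)
    return EMOTIONS[max(range(len(probs)), key=probs.__getitem__)]
```

A confident audio prediction dominates a weak text prediction under equal weights, e.g. `predict([5, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0])` returns `"ANG"`.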
## Dataset
- Dataset Name: stapesai/ssi-speech-emotion-recognition
## Evaluation Results

### Classification Report
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| ANG | 0.96 | 0.90 | 0.93 | 30 |
| CAL | 0.00 | 0.00 | 0.00 | 0 |
| DIS | 0.86 | 0.95 | 0.90 | 20 |
| FEA | 0.78 | 0.67 | 0.72 | 27 |
| HAP | 0.84 | 0.79 | 0.81 | 33 |
| NEU | 0.89 | 0.96 | 0.92 | 25 |
| SAD | 0.78 | 0.95 | 0.86 | 19 |
| SUR | 0.88 | 0.78 | 0.82 | 9 |
| accuracy | | | 0.85 | 163 |
| macro avg | 0.75 | 0.75 | 0.75 | 163 |
| weighted avg | 0.86 | 0.85 | 0.85 | 163 |

Note: the CAL class has zero examples in the evaluation split, so its scores are 0.00 by construction, which pulls the macro averages below the weighted ones.
**Overall Accuracy:** 0.85
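As a sanity check, the macro and weighted averages can be recomputed from the per-class rows of the report above (values copied verbatim from the table; the helper names are illustrative):

```python
# Per-class (precision, recall, support) rows from the classification report.
REPORT = {
    "ANG": (0.96, 0.90, 30),
    "CAL": (0.00, 0.00, 0),
    "DIS": (0.86, 0.95, 20),
    "FEA": (0.78, 0.67, 27),
    "HAP": (0.84, 0.79, 33),
    "NEU": (0.89, 0.96, 25),
    "SAD": (0.78, 0.95, 19),
    "SUR": (0.88, 0.78, 9),
}

def macro_avg(index):
    # Unweighted mean over classes; zero-support classes (CAL) still count.
    return sum(row[index] for row in REPORT.values()) / len(REPORT)

def weighted_avg(index):
    # Mean weighted by per-class support.
    total = sum(row[2] for row in REPORT.values())
    return sum(row[index] * row[2] for row in REPORT.values()) / total
```

Rounding to two decimals reproduces the table: `macro_avg(0)` gives 0.75 and `weighted_avg(0)` gives 0.86 for precision, and likewise 0.75 and 0.85 for recall.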
## Model Tree

Base model for dynann/multimodal-emotion-speech-recognition:
- FacebookAI/roberta-base