metadata
language: en
tags:
- audio
- speech-emotion-recognition
- pytorch
- cnn-bilstm
datasets:
- ravdess
- tess
metrics:
- accuracy
Speech Emotion Recognition (CNN-BiLSTM-Attention)
This PyTorch model classifies the emotion expressed in a speech recording. It was trained from scratch on the RAVDESS and TESS datasets.
Model Architecture
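- Front-end: 4-block CNN that extracts features from Mel spectrograms.
- Mid-section: Bidirectional LSTM that models temporal dependencies across frames.
- Pooling: Multi-head attention pooling that reduces the frame sequence to a single utterance-level vector.
- Back-end: Fully connected classifier that outputs one score per emotion class.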
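The four stages above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the released implementation: the class names (`ConvBlock`, `AttentionPooling`, `EmotionNet`), channel widths, kernel size, hidden size, number of heads, number of Mel bins, and the learned-query form of attention pooling are all assumptions chosen to make the example run.

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Conv -> BatchNorm -> ReLU -> MaxPool; halves frequency and time (assumed layout)."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )

    def forward(self, x):
        return self.net(x)


class AttentionPooling(nn.Module):
    """Multi-head attention pooling: one learned query attends over all time steps."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):  # x: (batch, time, dim)
        q = self.query.expand(x.size(0), -1, -1)
        pooled, _ = self.attn(q, x, x)  # (batch, 1, dim)
        return pooled.squeeze(1)  # (batch, dim)


class EmotionNet(nn.Module):
    """CNN front-end -> BiLSTM -> attention pooling -> linear classifier."""

    def __init__(self, n_mels=64, n_classes=7, hidden=128, num_heads=4):
        super().__init__()
        channels = [1, 16, 32, 64, 128]  # assumed widths for the 4 blocks
        self.cnn = nn.Sequential(
            *[ConvBlock(channels[i], channels[i + 1]) for i in range(4)]
        )
        # Four 2x poolings shrink the Mel axis by a factor of 16.
        feat_dim = channels[-1] * (n_mels // 16)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.pool = AttentionPooling(2 * hidden, num_heads)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, mel):  # mel: (batch, 1, n_mels, frames)
        x = self.cnn(mel)  # (batch, C, n_mels / 16, frames / 16)
        x = x.permute(0, 3, 1, 2).flatten(2)  # (batch, time, C * freq)
        x, _ = self.lstm(x)  # (batch, time, 2 * hidden)
        x = self.pool(x)  # (batch, 2 * hidden)
        return self.classifier(x)  # (batch, n_classes) logits
```

With these assumed sizes, a batch of Mel spectrograms shaped `(batch, 1, 64, frames)` yields logits shaped `(batch, 7)`, one score per class. The frame count should be at least 16 so that the four pooling steps leave at least one time step for the LSTM.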
Classes
0: neutral, 1: calm, 2: happy, 3: sad, 4: angry, 5: fearful, 6: disgust
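To turn the classifier's output into a readable label, the index-to-emotion mapping above can be written as a lookup table. The sketch below uses plain Python; the names `ID2LABEL`, `LABEL2ID`, and `decode` are hypothetical helpers for illustration and are not part of the released model.

```python
# Index-to-label mapping copied from the class list above.
ID2LABEL = {
    0: "neutral",
    1: "calm",
    2: "happy",
    3: "sad",
    4: "angry",
    5: "fearful",
    6: "disgust",
}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode(scores):
    """Return the emotion label whose class score is highest."""
    if len(scores) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=scores.__getitem__)
    return ID2LABEL[best]
```

For a PyTorch logits tensor for a single utterance, something like `decode(logits.tolist())` would apply the same lookup.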