---
language: en
tags:
- audio
- speech-emotion-recognition
- pytorch
- cnn-bilstm
datasets:
- ravdess
- tess
metrics:
- accuracy
---

# Speech Emotion Recognition (CNN-BiLSTM-Attention)

This model was trained from scratch on the RAVDESS and TESS datasets.

## Model Architecture

- **Front-end**: 4-block CNN that extracts features from mel spectrograms.
- **Mid-section**: Bidirectional LSTM that models temporal dependencies.
- **Pooling**: Multi-head attention pooling.
- **Back-end**: Fully connected classifier.

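
The four stages listed above could be wired together roughly as in the sketch below. It is an illustration only, not the released implementation: the class name `SERSketch`, the channel widths, the 3x3 kernels, batch normalization, max pooling, the LSTM hidden size, the number of attention heads, the learned pooling query, and the 64 mel bins are all assumptions chosen to make the example runnable.

```python
import torch
import torch.nn as nn


class SERSketch(nn.Module):
    """Hypothetical CNN-BiLSTM-attention classifier; every size here is an assumption."""

    def __init__(self, n_mels=64, n_classes=7, channels=(16, 32, 64, 128),
                 lstm_hidden=128, n_heads=4):
        super().__init__()
        # Front-end: four conv blocks over a (1, n_mels, time) mel spectrogram.
        blocks, in_ch = [], 1
        for out_ch in channels:
            blocks += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(),
                nn.MaxPool2d(2),  # halves both the mel axis and the time axis
            ]
            in_ch = out_ch
        self.cnn = nn.Sequential(*blocks)

        # Mid-section: BiLSTM over the time axis of the CNN feature map.
        feat_dim = channels[-1] * (n_mels // 2 ** len(channels))
        self.lstm = nn.LSTM(feat_dim, lstm_hidden, batch_first=True,
                            bidirectional=True)

        # Pooling: one learned query attends over all time steps.
        embed_dim = 2 * lstm_hidden
        self.query = nn.Parameter(torch.randn(1, 1, embed_dim))
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)

        # Back-end: fully connected classifier.
        self.classifier = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, mel):
        x = self.cnn(mel)                                 # (B, C, F, T)
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)    # (B, T, C*F)
        x, _ = self.lstm(x)                               # (B, T, 2*hidden)
        q = self.query.expand(b, -1, -1)                  # (B, 1, 2*hidden)
        pooled, _ = self.attn(q, x, x)                    # (B, 1, 2*hidden)
        return self.classifier(pooled.squeeze(1))         # (B, n_classes)


model = SERSketch().eval()
with torch.no_grad():
    # Two dummy spectrograms: 64 mel bins, 128 frames.
    logits = model(torch.randn(2, 1, 64, 128))
print(logits.shape)  # torch.Size([2, 7])
```

Pooling with a single learned query is one common way to reduce a variable-length sequence to a fixed-size vector; the actual model may pool differently.
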
## Classes

0: neutral, 1: calm, 2: happy, 3: sad, 4: angry, 5: fearful, 6: disgust
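
To turn the model's output scores into one of the labels above, a lookup table like the following can be used. This is a minimal sketch in plain Python: the index-to-label mapping is taken from the list above, while the names `ID2LABEL`, `LABEL2ID`, and `decode_logits` are hypothetical helpers and not part of the released code.

```python
import math

# Index-to-label mapping copied from the class list in this card.
ID2LABEL = {
    0: "neutral",
    1: "calm",
    2: "happy",
    3: "sad",
    4: "angry",
    5: "fearful",
    6: "disgust",
}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode_logits(logits):
    """Return (label, probability) for one utterance's raw class scores."""
    peak = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(value - peak) for value in logits]
    best = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[best], exps[best] / sum(exps)


label, prob = decode_logits([0.1, -1.2, 3.4, 0.0, 0.5, -0.3, 0.2])
print(label)  # happy
```
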