---
language: en
tags:
- audio
- speech-emotion-recognition
- pytorch
- cnn-bilstm
datasets:
- ravdess
- tess
metrics:
- accuracy
---

# Speech Emotion Recognition (CNN-BiLSTM-Attention)

This model was trained from scratch on the RAVDESS and TESS datasets.

## Model Architecture

- **Front-end**: 4-block CNN for feature extraction from Mel Spectrograms.
- **Mid-section**: Bidirectional LSTM for temporal dependencies.
- **Pooling**: Multi-head Attention pooling.
- **Back-end**: Fully connected classifier.

## Classes

- 0: neutral
- 1: calm
- 2: happy
- 3: sad
- 4: angry
- 5: fearful
- 6: disgust
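
The four stages listed under Model Architecture can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the trained checkpoint's actual definition: the class names (`CnnBiLstmAttention`, `AttentionPooling`), the channel widths, kernel size, number of mel bins, LSTM hidden size, and head count are all assumptions chosen for readability, so the released weights are not expected to load into it. Only the overall layout (4 CNN blocks, BiLSTM, multi-head attention pooling, fully connected classifier) and the 7-class label order come from this card.

```python
import torch
import torch.nn as nn

# Label order taken from the "Classes" section of this card.
ID2LABEL = {0: "neutral", 1: "calm", 2: "happy", 3: "sad",
            4: "angry", 5: "fearful", 6: "disgust"}


class AttentionPooling(nn.Module):
    """Pools (batch, time, dim) into (batch, dim) with a learned query (assumed design)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.query.expand(x.size(0), -1, -1)  # one query per batch item
        pooled, _ = self.attn(q, x, x)            # attend over all time steps
        return pooled.squeeze(1)


class CnnBiLstmAttention(nn.Module):
    """Hypothetical layout; every hyperparameter below is an assumption."""

    def __init__(self, n_mels=64, channels=(16, 32, 64, 128),
                 lstm_hidden=128, num_heads=4, num_classes=7):
        super().__init__()
        blocks, in_ch = [], 1
        for out_ch in channels:  # front-end: 4 conv blocks
            blocks += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                       nn.BatchNorm2d(out_ch), nn.ReLU(), nn.MaxPool2d(2)]
            in_ch = out_ch
        self.cnn = nn.Sequential(*blocks)
        freq_bins = n_mels // 2 ** len(channels)  # each block halves both axes
        self.lstm = nn.LSTM(in_ch * freq_bins, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.pool = AttentionPooling(2 * lstm_hidden, num_heads)
        self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, 1, n_mels, frames)
        feats = self.cnn(mel)
        b, c, f, t = feats.shape
        # Treat the pooled time axis as the sequence, flatten channels x frequency.
        seq = feats.permute(0, 3, 1, 2).reshape(b, t, c * f)
        seq, _ = self.lstm(seq)                    # (batch, t, 2 * lstm_hidden)
        return self.classifier(self.pool(seq))     # (batch, num_classes)


model = CnnBiLstmAttention().eval()
with torch.no_grad():
    logits = model(torch.randn(2, 1, 64, 128))     # two random "spectrograms"
predicted = [ID2LABEL[i] for i in logits.argmax(dim=1).tolist()]
```

With the assumed defaults, a batch of two 64-bin, 128-frame inputs yields logits of shape `(2, 7)`, one score per class listed above. Because the weights here are randomly initialised, `predicted` is only meaningful once real trained parameters and the matching preprocessing are in place.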