metadata
language: en
tags:
  - audio
  - speech-emotion-recognition
  - pytorch
  - cnn-bilstm
datasets:
  - ravdess
  - tess
metrics:
  - accuracy

Speech Emotion Recognition (CNN-BiLSTM-Attention)

This model classifies the emotion expressed in a speech recording. It was trained from scratch on the RAVDESS and TESS datasets.

Model Architecture

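  • Front-end: 4-block CNN that extracts features from Mel spectrograms.
  • Mid-section: Bidirectional LSTM that models temporal dependencies across the CNN feature sequence.
  • Pooling: Multi-head attention pooling that reduces the sequence to a fixed-size vector.
  • Back-end: Fully connected classifier over the pooled vector.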
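
The four stages above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the released implementation: the class names (`AttentionPooling`, `CnnBiLstmAttention`), channel widths, kernel sizes, hidden size, number of heads, dropout rate, and the 64-bin Mel input are all assumptions chosen only to make the example run. The attention pooling shown here is one common variant (a learned per-head softmax weighting over time steps) and may differ from the one used in the actual checkpoint.

```python
import torch
import torch.nn as nn


class AttentionPooling(nn.Module):
    """Hypothetical multi-head attention pooling over the time axis."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # One scalar score per head for every time step.
        self.score = nn.Linear(dim, num_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim)
        weights = torch.softmax(self.score(x), dim=1)      # (batch, time, heads)
        pooled = torch.einsum("bth,btd->bhd", weights, x)  # (batch, heads, dim)
        return pooled.flatten(1)                           # (batch, heads * dim)


class CnnBiLstmAttention(nn.Module):
    """Sketch of a CNN -> BiLSTM -> attention pooling -> classifier stack."""

    def __init__(self, n_mels: int = 64, hidden: int = 128,
                 num_heads: int = 4, num_classes: int = 7):
        super().__init__()
        channels = [1, 32, 64, 128, 256]  # assumed widths for the 4 CNN blocks
        blocks = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            blocks += [
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(),
                nn.MaxPool2d(2),  # halves both frequency and time
            ]
        self.cnn = nn.Sequential(*blocks)
        freq_bins = n_mels // 16  # four 2x poolings along frequency
        self.lstm = nn.LSTM(channels[-1] * freq_bins, hidden,
                            batch_first=True, bidirectional=True)
        self.pool = AttentionPooling(2 * hidden, num_heads)
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden * num_heads, 128),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(128, num_classes),
        )

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, 1, n_mels, frames)
        feats = self.cnn(mel)                          # (batch, C, n_mels/16, frames/16)
        feats = feats.permute(0, 3, 1, 2).flatten(2)   # (batch, time, C * freq)
        seq, _ = self.lstm(feats)                      # (batch, time, 2 * hidden)
        return self.classifier(self.pool(seq))         # (batch, num_classes) logits
```

With these assumed sizes, a batch of Mel spectrograms shaped `(batch, 1, 64, frames)` yields logits shaped `(batch, 7)`, one score per class listed under Classes.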

Classes

0: neutral, 1: calm, 2: happy, 3: sad, 4: angry, 5: fearful, 6: disgust
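
To turn the model's output into a label, the index of the highest score can be looked up in the mapping above. The snippet below is a small sketch: `ID2LABEL` copies the class list from this card, while the `decode` helper is a hypothetical name introduced for illustration.

```python
# Class indices as listed in this model card.
ID2LABEL = {
    0: "neutral",
    1: "calm",
    2: "happy",
    3: "sad",
    4: "angry",
    5: "fearful",
    6: "disgust",
}


def decode(scores):
    """Return the label whose score is highest (hypothetical helper)."""
    if len(scores) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return ID2LABEL[best]
```

For a PyTorch logits tensor of shape `(7,)`, `decode(logits.tolist())` gives the predicted emotion name.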