Emotion Recognition from Speech (Deep Learning Model)
This model predicts emotions from short audio recordings using deep learning. It processes audio features such as MFCC, ZCR, and RMSE and returns a primary emotion prediction along with confidence scores for each emotion class.
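As a rough sketch of what the output described above looks like, the snippet below maps a softmax score vector to a primary label plus per-class confidence scores. The `format_prediction` helper and the example probability values are illustrative assumptions, not part of the actual model code.

```python
import numpy as np

# Emotion classes in the order the model is assumed to output them.
EMOTIONS = ["happy", "sad", "angry", "fear", "neutral"]

def format_prediction(probs):
    # probs: a softmax output vector, one score per emotion class.
    scores = {label: float(p) for label, p in zip(EMOTIONS, probs)}
    primary = max(scores, key=scores.get)  # highest-confidence class
    return primary, scores

# Hypothetical softmax vector standing in for a real model prediction.
probs = np.array([0.05, 0.10, 0.70, 0.10, 0.05])
label, scores = format_prediction(probs)
print(label)            # primary emotion
print(scores[label])    # its confidence score
```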
Model Details
- Architecture: Custom CNN-based Keras model
- Features Used:
  - MFCC (Mel-Frequency Cepstral Coefficients)
  - ZCR (Zero Crossing Rate)
  - RMSE (Root Mean Square Energy)
- Framework: TensorFlow / Keras
- Trained On: Processed speech emotion datasets
- Output:
  - Primary emotion label
  - Confidence scores for each emotion
- Emotion Classes:
  - happy, sad, angry, fear, neutral
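Two of the features listed above, ZCR and RMSE, are simple enough to compute directly from the framed waveform; MFCCs are typically extracted with a library such as librosa. The sketch below is a minimal NumPy-only illustration of per-frame ZCR and RMSE, using a synthetic sine wave in place of real speech; the frame and hop lengths are assumed defaults, not the model's actual preprocessing parameters.

```python
import numpy as np

def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs whose sign changes.
    return float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))

def rmse(frame):
    # Root mean square energy of the frame.
    return float(np.sqrt(np.mean(frame ** 2)))

def frame_signal(signal, frame_len=2048, hop_len=512):
    # Slice the waveform into overlapping frames (no padding).
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    return np.stack(
        [signal[i * hop_len : i * hop_len + frame_len] for i in range(n_frames)]
    )

# A 1-second 220 Hz sine at 22050 Hz stands in for a real recording.
sr = 22050
t = np.linspace(0, 1, sr, endpoint=False)
audio = 0.5 * np.sin(2 * np.pi * 220 * t)

frames = frame_signal(audio)
zcr = np.array([zero_crossing_rate(f) for f in frames])
energy = np.array([rmse(f) for f in frames])
print(frames.shape)  # (n_frames, frame_len)
```

In practice these per-frame features (together with MFCCs) would be stacked into the input array fed to the Keras model.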
Evaluation
The accuracy plot shows a clear upward trend for both the training and validation sets over 50 epochs. The model improves rapidly at first, exceeding 90% accuracy by epoch 15. From epoch 20 onward, both curves stabilize above 95%, with no significant gap between them that would indicate overfitting. By the final epoch, training accuracy approaches 0.99 and validation accuracy closely mirrors it, suggesting good generalization.
