Emotion Recognition from Speech (Deep Learning Model)
This model predicts emotions from short audio recordings using deep learning. It processes audio features such as MFCC, ZCR, and RMSE and returns a primary emotion prediction along with confidence scores for each emotion class.
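As a rough sketch of what the output described above looks like, the snippet below maps a softmax score vector to a primary label plus per-class confidence scores. The `format_prediction` helper and the example probability values are illustrative assumptions, not part of the actual model code.

```python
import numpy as np

# Emotion classes in the order the model is assumed to output them.
EMOTIONS = ["happy", "sad", "angry", "fear", "neutral"]

def format_prediction(probs):
    # probs: a softmax output vector, one score per emotion class.
    scores = {label: float(p) for label, p in zip(EMOTIONS, probs)}
    primary = max(scores, key=scores.get)  # highest-confidence class
    return primary, scores

# Hypothetical softmax vector standing in for a real model prediction.
probs = np.array([0.05, 0.10, 0.70, 0.10, 0.05])
label, scores = format_prediction(probs)
print(label)            # primary emotion
print(scores[label])    # its confidence score
```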
Model Details
- Architecture: Custom CNN-based Keras model
- Features Used:
  - MFCC (Mel-Frequency Cepstral Coefficients)
  - ZCR (Zero Crossing Rate)
  - RMSE (Root Mean Square Energy)
- Framework: TensorFlow / Keras
- Trained On: Processed speech emotion datasets
- Output:
  - Primary emotion label
  - Confidence scores for each emotion
- Emotion Classes:
  - happy, sad, angry, fear, neutral
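Two of the features listed above, ZCR and RMSE, are simple enough to compute directly from the framed waveform; MFCCs are typically extracted with a library such as librosa. The sketch below is a minimal NumPy-only illustration of per-frame ZCR and RMSE, using a synthetic sine wave in place of real speech; the frame and hop lengths are assumed defaults, not the model's actual preprocessing parameters.

```python
import numpy as np

def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs whose sign changes.
    return float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))

def rmse(frame):
    # Root mean square energy of the frame.
    return float(np.sqrt(np.mean(frame ** 2)))

def frame_signal(signal, frame_len=2048, hop_len=512):
    # Slice the waveform into overlapping frames (no padding).
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    return np.stack(
        [signal[i * hop_len : i * hop_len + frame_len] for i in range(n_frames)]
    )

# A 1-second 220 Hz sine at 22050 Hz stands in for a real recording.
sr = 22050
t = np.linspace(0, 1, sr, endpoint=False)
audio = 0.5 * np.sin(2 * np.pi * 220 * t)

frames = frame_signal(audio)
zcr = np.array([zero_crossing_rate(f) for f in frames])
energy = np.array([rmse(f) for f in frames])
print(frames.shape)  # (n_frames, frame_len)
```

In practice these per-frame features (together with MFCCs) would be stacked into the input array fed to the Keras model.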
Evaluation
The accuracy plot shows a clear upward trend for both the training and validation sets over 50 epochs. The model improves rapidly at first, exceeding 90% accuracy by epoch 15. From epoch 20 onward, both curves stabilize above 95%, with no significant gap between them that would indicate overfitting. By the final epoch, training accuracy approaches 0.99 and validation accuracy closely mirrors it, suggesting good generalization.
