Emotion Recognition from Speech (Deep Learning Model)

This model predicts emotions from short audio recordings using deep learning techniques. It processes audio features like MFCC, ZCR, and RMSE and returns a primary emotion prediction along with confidence scores for all detected emotions.

Model Details

  • Architecture: Custom CNN-based Keras model
  • Features Used:
    • MFCC (Mel-Frequency Cepstral Coefficients)
    • ZCR (Zero Crossing Rate)
    • RMSE (Root Mean Square Energy)
  • Framework: TensorFlow / Keras
  • Trained On: Processed speech emotion datasets
  • Output:
    • Primary emotion label
    • Confidence scores for each emotion
  • Emotion Classes:
    • happy, sad, angry, fear, neutral
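Of the features listed above, ZCR and RMSE are simple frame-level statistics. A minimal NumPy-only sketch of how they are typically computed is shown below; the frame and hop lengths are assumed values, and MFCC extraction (not shown) is usually delegated to a library such as librosa via librosa.feature.mfcc.

```python
import numpy as np

def frame_signal(y, frame_length=2048, hop_length=512):
    # Slice the waveform into overlapping frames (assumed frame/hop sizes).
    n_frames = 1 + (len(y) - frame_length) // hop_length
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    return y[idx]

def zcr(y, frame_length=2048, hop_length=512):
    # Zero Crossing Rate: fraction of sample-to-sample sign changes per frame.
    frames = frame_signal(y, frame_length, hop_length)
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def rmse(y, frame_length=2048, hop_length=512):
    # Root Mean Square Energy per frame.
    frames = frame_signal(y, frame_length, hop_length)
    return np.sqrt(np.mean(frames ** 2, axis=1))

# Example: 1 s of a 440 Hz tone at a 22,050 Hz sample rate.
sr = 22050
t = np.linspace(0, 1, sr, endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 440 * t)
print(zcr(y).shape, rmse(y).shape)
```

For a pure tone the RMSE per frame sits near amplitude/√2, and the ZCR tracks the tone's frequency; on real speech, these per-frame sequences (stacked with the MFCCs) form the input the network consumes.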

Evaluation

[Figure: training and validation accuracy over 50 epochs]

The accuracy plot shows a clear upward trend for both the training and validation sets over 50 epochs. The model improved rapidly at first, exceeding 90% accuracy by epoch 15. From epoch 20 onward, both curves stabilize above 95%, indicating consistent learning with no significant overfitting. By the final epoch, training accuracy approaches 99%, and validation accuracy closely mirrors it, demonstrating strong generalization.
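To illustrate the output format described in Model Details (a primary emotion label plus a confidence score for each class), here is a hedged NumPy-only sketch of the post-processing step; the logits are hypothetical stand-ins for the final dense layer of the Keras model, not real model outputs.

```python
import numpy as np

EMOTIONS = ["happy", "sad", "angry", "fear", "neutral"]

def softmax(logits):
    # Numerically stable softmax over the final layer's logits.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def format_prediction(logits):
    # Map raw model outputs to the documented result shape:
    # a primary emotion label plus a per-class confidence score.
    probs = softmax(np.asarray(logits, dtype=float))
    scores = dict(zip(EMOTIONS, probs))
    return EMOTIONS[int(np.argmax(probs))], scores

# Hypothetical logits; in practice these would come from model.predict(...).
label, scores = format_prediction([2.0, 0.1, 0.5, 0.3, 1.2])
print(label)  # happy
```

The confidence scores sum to 1, so they can be reported directly as probabilities alongside the primary label.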

More Details

Visit: https://documentation-fyp.vercel.app/
