---
license: mit
language:
  - en
pipeline_tag: audio-classification
---

Emotion Recognition from Speech (Deep Learning Model)

This model predicts emotions from short audio recordings using deep learning techniques. It extracts acoustic features (MFCC, ZCR, and RMSE) from the input and returns a primary emotion prediction along with confidence scores for every emotion class.
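To make the output format concrete, here is a minimal sketch of how a model's raw logits could be turned into the primary label plus per-class confidence scores. The function names and the example logits are hypothetical; in practice the logits would come from `model.predict`, and the class order must match the model's output layer:

```python
import numpy as np

# Assumed class order; must match the trained model's output layer.
EMOTIONS = ["happy", "sad", "angry", "fear", "neutral"]

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def decode_prediction(logits: np.ndarray) -> tuple:
    """Return (primary emotion, {emotion: confidence}) for one prediction."""
    probs = softmax(logits)
    scores = {label: float(p) for label, p in zip(EMOTIONS, probs)}
    primary = EMOTIONS[int(np.argmax(probs))]
    return primary, scores

# Made-up logits standing in for a real model.predict(...) call
primary, scores = decode_prediction(np.array([0.2, 0.1, 2.5, 0.3, 0.4]))
```

Here `primary` would be the top-scoring label and `scores` a dictionary of confidences summing to 1.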

Model Details

  • Architecture: Custom CNN-based Keras model
  • Features Used:
    • MFCC (Mel-Frequency Cepstral Coefficients)
    • ZCR (Zero Crossing Rate)
    • RMSE (Root Mean Square Energy)
  • Framework: TensorFlow / Keras
  • Trained On: Processed speech emotion datasets
  • Output:
    • Primary emotion label
    • Confidence scores for each emotion
  • Emotion Classes:
    • happy, sad, angry, fear, neutral
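As a rough illustration of two of the features above, ZCR and RMSE of a single audio frame can be computed with plain NumPy (MFCCs are usually extracted with a library such as librosa via `librosa.feature.mfcc`). The frame below is a synthetic 440 Hz sine; the frame length and sample rate are made up for the example:

```python
import numpy as np

def zcr(frame: np.ndarray) -> float:
    """Zero Crossing Rate: fraction of adjacent samples with a sign change."""
    signs = np.sign(frame)
    signs[signs == 0] = 1  # treat exact zeros as positive to avoid spurious crossings
    return float(np.mean(signs[:-1] != signs[1:]))

def rmse(frame: np.ndarray) -> float:
    """Root Mean Square Energy of the frame."""
    return float(np.sqrt(np.mean(frame ** 2)))

# Synthetic example: one 2048-sample frame of a 440 Hz sine at 22.05 kHz
sr = 22050
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 440 * t)
zcr_val = zcr(frame)    # ~2 crossings per period -> roughly 2 * 440 / sr
rmse_val = rmse(frame)  # a full-scale sine has RMS amplitude ~1/sqrt(2)
```

A real pipeline would compute these per frame across the whole recording and stack them with the MFCCs before feeding the model.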

Evaluation

[Figure: training and validation accuracy curves over 50 epochs]

The accuracy plot demonstrates a clear upward trend for both training and validation datasets over 50 epochs. Initially, the model showed rapid improvement, reaching over 90% accuracy by epoch 15. From epochs 20 to 50, both curves stabilize above 95%, indicating consistent learning with no significant overfitting. By the final epoch, training accuracy approaches 0.99, and validation accuracy mirrors this trend closely, demonstrating excellent generalization capability.

More Details

Visit: https://documentation-fyp.vercel.app/