Vishal-Padia
/

SentimentSound

Model card Files Files and versions

xet

Community

Vishal-Padia commited on Dec 2, 2024

Commit

961ca22

verified ·

1 Parent(s): 998f276

Upload speech emotion recognition model

Browse files

Files changed (1) hide show

README.md +67 -0

README.md ADDED Viewed

	@@ -0,0 +1,67 @@

+# SentimentSound
+## Overview
+This is a deep learning model for Speech Emotion Recognition that can classify audio clips into different emotional states. The model is trained on a dataset of speech samples and can identify emotions such as neutral, calm, happy, sad, angry, fearful, disgust, and surprised.
+## Model Details
+- **Model Type:** Hybrid Neural Network (CNN + LSTM)
+- **Input:** Audio features extracted from 3-second wav files
+- **Output:** Emotion classification
+### Supported Emotions
+- Neutral
+- Calm
+- Happy
+- Sad
+- Angry
+- Fearful
+- Disgust
+- Surprised
+## Installation
+### Clone the Repository
+```bash
+git clone https://github.com/Vishal-Padia/SentimentSound.git
+```
+### Dependencies
+```bash
+pip install -r requirements.txt
+```
+### Usage Example
+```bash
+python emotion_predictor.py
+```
+## Model Performance
+- **Accuracy:** 85%
+- **Evaluation Metrics:** Confusion matrix below
+![Image](confusion_matrix.png)
+## Training Details
+- **Feature Extraction:**
+  - MFCC
+  - Spectral Centroid
+  - Chroma Features
+  - Spectral Contrast
+  - Zero Crossing Rate
+  - Spectral Rolloff
+- **Augmentation:** Random noise and scaling applied
+- **Training Techniques:**
+  - Class weighted loss
+  - AdamW optimizer
+  - Learning rate scheduling
+  - Gradient clipping
+## Limitations
+- Works best with clear speech recordings
+- Optimized for 3-second audio clips
+- Performance may vary with different audio sources
+## Acknowledgments
+- Dataset used for training (https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio)