| # SentimentSound | |
| ## Overview | |
| SentimentSound is a deep learning model for Speech Emotion Recognition (SER). It classifies short speech clips into one of eight emotional states: neutral, calm, happy, sad, angry, fearful, disgust, and surprised. The model is trained on the RAVDESS emotional speech dataset. | |
| ## Model Details | |
| - **Model Type:** Hybrid Neural Network (CNN + LSTM) | |
| - **Input:** Audio features extracted from 3-second WAV files | |
| - **Output:** One of the eight emotion classes listed under Supported Emotions | |
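Because the model expects 3-second clips, audio usually has to be padded or trimmed before features are extracted. The following is a minimal sketch using only the Python standard library. The function names `read_mono_pcm16` and `fix_length` are hypothetical and are not taken from this repository, and the sketch assumes 16-bit PCM WAV input and zero-padding at the end of short clips.

```python
import array
import sys
import wave

CLIP_SECONDS = 3.0  # clip length stated under Model Details


def read_mono_pcm16(path):
    """Read a 16-bit PCM WAV file and return (samples, sample_rate).

    Samples are floats in [-1.0, 1.0). Only the first channel is kept.
    Assumption: the file is 16-bit PCM; other encodings are rejected.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())
    data = array.array("h")
    data.frombytes(raw)
    if sys.byteorder == "big":
        data.byteswap()  # WAV data is little-endian
    samples = [value / 32768.0 for value in data[::channels]]
    return samples, sample_rate


def fix_length(samples, sample_rate, seconds=CLIP_SECONDS):
    """Trim or zero-pad a list of samples to exactly `seconds` of audio."""
    target = int(round(sample_rate * seconds))
    if len(samples) >= target:
        return list(samples[:target])
    return list(samples) + [0.0] * (target - len(samples))
```

A clip prepared this way always has `sample_rate * 3` samples, so every input yields a feature array of the same shape regardless of the original recording length.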
| ### Supported Emotions | |
| - Neutral | |
| - Calm | |
| - Happy | |
| - Sad | |
| - Angry | |
| - Fearful | |
| - Disgust | |
| - Surprised | |
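These eight classes match the emotion codes in RAVDESS file names, where the third hyphen-separated field identifies the emotion (for example `03-01-06-01-02-01-12.wav` is a fearful utterance). The sketch below shows how a label could be read from a file name. The helper `emotion_from_filename` is hypothetical, and whether this repository derives its labels in this way is an assumption.

```python
from pathlib import Path

# Emotion codes as published with the RAVDESS dataset
# (third field of each file name).
RAVDESS_EMOTIONS = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprised",
}


def emotion_from_filename(path):
    """Return the emotion label encoded in a RAVDESS file name."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS-style file name: {path}")
    return RAVDESS_EMOTIONS[fields[2]]
```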
| ## Installation | |
| ### Clone the Repository | |
| ```bash | |
| git clone https://github.com/Vishal-Padia/SentimentSound.git | |
| cd SentimentSound | |
| ``` | |
| ### Dependencies | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| ### Usage Example | |
| ```bash | |
| python emotion_predictor.py | |
| ``` | |
| ## Model Performance | |
| - **Accuracy:** 85% | |
| - **Evaluation Metrics:** Confusion matrix below | |
| [Figure: confusion matrix] | |
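For readers who want to reproduce these metrics on their own predictions, the sketch below shows how a confusion matrix, overall accuracy, and per-class recall can be computed from two lists of labels. It is a generic illustration in plain Python, not the evaluation code used to obtain the figures above, and the function names are hypothetical.

```python
EMOTIONS = [
    "neutral", "calm", "happy", "sad",
    "angry", "fearful", "disgust", "surprised",
]


def confusion_matrix(y_true, y_pred, labels=EMOTIONS):
    """Count predictions: rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_recall(matrix):
    """Recall for each class; 0.0 for classes with no true samples."""
    recalls = []
    for i, row in enumerate(matrix):
        row_total = sum(row)
        recalls.append(row[i] / row_total if row_total else 0.0)
    return recalls
```

Per-class recall is worth reporting next to overall accuracy, since a single accuracy figure can hide classes that the model confuses with one another.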
| ## Training Details | |
| - **Feature Extraction:** | |
|   - MFCC | |
|   - Spectral Centroid | |
|   - Chroma Features | |
|   - Spectral Contrast | |
|   - Zero Crossing Rate | |
|   - Spectral Rolloff | |
| - **Augmentation:** Random noise and scaling applied | |
| - **Training Techniques:** | |
|   - Class weighted loss | |
|   - AdamW optimizer | |
|   - Learning rate scheduling | |
|   - Gradient clipping | |
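Two of the items above, the noise and scaling augmentation and the class weighted loss, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the repository's training code: the noise level, the gain range, and the inverse-frequency weighting formula are all assumed values, and the function names are hypothetical.

```python
import random
from collections import Counter


def augment(samples, noise_std=0.005, gain_range=(0.8, 1.2), rng=random):
    """Apply one random gain to the clip and add Gaussian noise per sample.

    Assumption: noise_std and gain_range are illustrative defaults,
    not values taken from this project.
    """
    gain = rng.uniform(*gain_range)
    return [gain * s + rng.gauss(0.0, noise_std) for s in samples]


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Rare classes get weights above 1.0, frequent classes below 1.0.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }
```

Weights computed this way are typically passed to the loss function so that errors on under-represented emotions contribute more to the gradient.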
| ## Limitations | |
| - Works best with clear speech recordings | |
| - Optimized for 3-second audio clips | |
| - Performance may vary on audio that differs from the training data, such as other speakers, recording equipment, or noisy environments | |
| ## Acknowledgments | |
| - Training dataset: [RAVDESS Emotional Speech Audio](https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio) on Kaggle