---
tags:
- audio-classification
- speech-emotion-recognition
- tensorflow
- keras
- emotion2vec
language:
- en
license: apache-2.0
metrics:
- accuracy
---
# Speech Emotion Recognition (SER) System
## Overview
Production-quality Speech Emotion Recognition that detects **6 core emotions** from voice audio:
- **Angry** | **Disgust** | **Fear** | **Happy** | **Neutral** | **Sad**
## Architecture
**Fusion Model**: CNN + BiLSTM + Multi-Head Self-Attention (spectrogram features) + emotion2vec embeddings
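The card does not say how the two branches are combined, so the following is only a minimal NumPy sketch of one common option, late fusion by concatenation. The pooled spectrogram-branch size (`SPEC_DIM = 256`), the random weights, and the single linear classification layer are all hypothetical; only the 768-dim emotion2vec embedding and the 6 output classes come from this card.

```python
import numpy as np

SPEC_DIM = 256   # hypothetical size of the pooled CNN+BiLSTM+Attention output
E2V_DIM = 768    # emotion2vec embedding size stated in this card
N_CLASSES = 6    # Angry, Disgust, Fear, Happy, Neutral, Sad


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def fuse_and_classify(spec_vec, e2v_vec, weights, bias):
    """Concatenate both branch outputs, then apply one linear layer + softmax."""
    fused = np.concatenate([spec_vec, e2v_vec], axis=-1)  # (batch, 256 + 768)
    return softmax(fused @ weights + bias)                # (batch, 6)


rng = np.random.default_rng(0)
batch = 4
spec_vec = rng.normal(size=(batch, SPEC_DIM))  # stand-in for the spectrogram branch
e2v_vec = rng.normal(size=(batch, E2V_DIM))    # stand-in for emotion2vec embeddings
weights = rng.normal(scale=0.01, size=(SPEC_DIM + E2V_DIM, N_CLASSES))
bias = np.zeros(N_CLASSES)

probs = fuse_and_classify(spec_vec, e2v_vec, weights, bias)
print(probs.shape)  # (4, 6)
```

The released `.keras` files may fuse the branches differently (for example with extra dense layers or gating); inspecting the loaded model's summary is the way to confirm.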
### Feature Pipeline
| Feature | Dimensions |
|---------|-----------|
| Mel Spectrogram | 128 bands |
| MFCC | 40 coefficients |
| Zero Crossing Rate | 1 |
| RMS Energy | 1 |
| **Total** | **170 × 200 → (170, 200, 1)** |
| emotion2vec embedding | 768-dim |
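The row counts in the table add up to 128 + 40 + 1 + 1 = 170. As a rough sketch of how such a tensor could be assembled, the snippet below stacks four per-frame feature arrays and pads or truncates them to a fixed width. It assumes the "200" is the number of time frames and that short clips are zero-padded; neither detail is confirmed by this card. Random arrays stand in for real features, which in practice might come from `librosa.feature.melspectrogram`, `librosa.feature.mfcc`, `librosa.feature.zero_crossing_rate`, and `librosa.feature.rms`.

```python
import numpy as np

N_FRAMES = 200  # assumption: the "200" in the table is the number of time frames


def stack_features(mel, mfcc, zcr, rms, n_frames=N_FRAMES):
    """Stack per-frame features into a (170, n_frames, 1) float32 array."""
    feats = np.concatenate([mel, mfcc, zcr, rms], axis=0)  # (128+40+1+1, T)
    t = feats.shape[1]
    if t < n_frames:
        # Zero-pad short clips on the right (padding strategy is an assumption).
        feats = np.pad(feats, ((0, 0), (0, n_frames - t)))
    else:
        # Truncate long clips to the first n_frames frames.
        feats = feats[:, :n_frames]
    return feats[..., np.newaxis].astype(np.float32)  # add a channel axis


rng = np.random.default_rng(0)
T = 150  # frames in a hypothetical short clip
stacked = stack_features(
    rng.normal(size=(128, T)),  # mel spectrogram, 128 bands
    rng.normal(size=(40, T)),   # 40 MFCCs
    rng.normal(size=(1, T)),    # zero crossing rate
    rng.normal(size=(1, T)),    # RMS energy
)
print(stacked.shape)  # (170, 200, 1)
```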
### Training Data
- **CREMA-D**: 7,442 clips, 91 actors (train/val/test split provided)
- **RAVDESS**: 1,056 speech clips, 24 actors (70/15/15 split)
- **Augmentation**: pitch shift, time stretch, Gaussian noise, SpecAugment
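Two of the listed augmentations can be illustrated with NumPy alone. The sketch below adds Gaussian noise to a waveform at a target signal-to-noise ratio and applies SpecAugment-style frequency and time masking to a spectrogram (masking only, without time warping). The SNR value and mask widths are hypothetical, since the card does not state the training settings. Pitch shift and time stretch are usually done with an audio library such as librosa and are not shown.

```python
import numpy as np


def add_gaussian_noise(wave, snr_db, rng):
    """Add white Gaussian noise so the result has roughly the given SNR in dB."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise


def spec_augment(spec, max_freq_mask, max_time_mask, rng):
    """Zero out one random band of frequency rows and one band of time columns."""
    out = spec.copy()  # leave the input untouched
    n_freq, n_time = out.shape
    f = rng.integers(0, max_freq_mask + 1)  # mask height in [0, max_freq_mask]
    f0 = rng.integers(0, n_freq - f + 1)    # mask start row
    out[f0:f0 + f, :] = 0.0
    t = rng.integers(0, max_time_mask + 1)  # mask width in [0, max_time_mask]
    t0 = rng.integers(0, n_time - t + 1)    # mask start column
    out[:, t0:t0 + t] = 0.0
    return out


rng = np.random.default_rng(0)
wave = np.sin(np.linspace(0, 100, 16000))      # stand-in for 1 s of audio
noisy = add_gaussian_noise(wave, snr_db=20, rng=rng)

spec = np.ones((170, 200))                     # stand-in for a feature matrix
masked = spec_augment(spec, max_freq_mask=15, max_time_mask=20, rng=rng)
print(noisy.shape, masked.shape)  # (16000,) (170, 200)
```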
## Results
| Model | Val Accuracy | Test Accuracy |
|-------|-------------|---------------|
| CNN+BiLSTM+Attention | 56.0% | 59.2% |
| **Fusion (CNN + emotion2vec)** | **53.2%** | **54.9%** |
| Human baseline (audio-only) | - | 40.9% |

**Best: Model 1 (CNN+BiLSTM+Attention), 59.2% test accuracy (+18.3pp over human baseline)**
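The percentage-point gain quoted above can be reproduced from the table. The short check below also compares each model against a uniform random guess over 6 classes (about 16.7%); that chance level is an added reference point which assumes balanced classes and is not a figure reported in this card.

```python
# Test accuracies (%) taken from the results table.
test_accuracy = {
    "CNN+BiLSTM+Attention": 59.2,
    "Fusion (CNN + emotion2vec)": 54.9,
}
human_baseline = 40.9   # audio-only human accuracy from the table
chance_level = 100 / 6  # assumption: uniform guessing over 6 balanced classes

# Gain over the human baseline in percentage points, rounded to one decimal.
gain_over_human = {
    name: round(acc - human_baseline, 1) for name, acc in test_accuracy.items()
}

for name, acc in test_accuracy.items():
    print(
        f"{name}: +{gain_over_human[name]:.1f} pp vs human, "
        f"+{acc - chance_level:.1f} pp vs chance"
    )
```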
## Quick Start
```bash
pip install tensorflow librosa numpy funasr modelscope
```
```python
import sys
sys.path.insert(0, "outputs")  # predict.py lives in outputs/ (see Files), so make it importable from the repo root
from predict import predict_emotion
label, confidence, probs = predict_emotion("audio.wav", model_dir="./outputs")
# Prints: Predicted Emotion: HAPPY, Confidence: 87.3%
```
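`predict_emotion` returns a label, a confidence, and `probs`. To look beyond the top prediction, something like the following could rank all classes. It assumes `probs` is a length-6 sequence ordered alphabetically by emotion name; that ordering is a guess, so check `outputs/config.json` or `outputs/predict.py` for the actual label order. The `example_probs` values are made up for illustration.

```python
# Assumed class order (alphabetical); verify against the repo before relying on it.
LABELS = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad"]


def rank_emotions(probs, labels=LABELS):
    """Return (label, probability) pairs sorted from most to least likely."""
    pairs = zip(labels, (float(p) for p in probs))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical probabilities standing in for the `probs` returned by predict_emotion.
example_probs = [0.03, 0.02, 0.04, 0.80, 0.08, 0.03]
ranked = rank_emotions(example_probs)

for label, p in ranked:
    print(f"{label:<8}{p:6.1%}")
```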
## Download & Use Locally
```bash
# Clone the repo
git lfs install
git clone https://huggingface.co/SamOp224/speech-emotion-recognition
cd speech-emotion-recognition
# Run prediction
python outputs/predict.py your_audio.wav outputs
```
## Files
- `outputs/fusion_model.keras`: Fusion model (CNN + emotion2vec)
- `outputs/model1_cnn_bilstm_attn.keras`: CNN+BiLSTM+Attention standalone (Model 1, best test accuracy)
- `outputs/predict.py`: Prediction script with visualization
- `outputs/config.json`: Configuration and results