Instructions to use JagjeevanAK/Speech-emotion-detection with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Keras
How to use JagjeevanAK/Speech-emotion-detection with Keras:
# Available backend options are: "jax", "torch", "tensorflow". import os os.environ["KERAS_BACKEND"] = "jax" import keras model = keras.saving.load_model("hf://JagjeevanAK/Speech-emotion-detection") - Notebooks
- Google Colab
- Kaggle
Speech Emotion Recognition Model
This model performs speech emotion recognition, classifying audio into 8 different emotional states.
Model Description
This is a deep learning model trained to recognize emotions from speech audio. The model can classify audio into the following emotions:
- π Neutral
- π Calm
- π Happy
- π’ Sad
- π Angry
- π¨ Fearful
- π€’ Disgust
- π² Surprised
Model Architecture
The model uses audio features extraction including:
- MFCC (Mel-frequency cepstral coefficients)
- Chroma features
- Mel-spectrogram features
Usage
import librosa
import numpy as np
from tensorflow.keras.models import load_model
# Load the model
model = load_model('trained_model.h5')
# Load and preprocess audio
def extract_feature(data, sr, mfcc=True, chroma=True, mel=True):
result = np.array([])
if mfcc:
mfccs = np.mean(librosa.feature.mfcc(y=data, sr=sr, n_mfcc=40).T, axis=0)
result = np.hstack((result, mfccs))
if chroma:
stft = np.abs(librosa.stft(data))
chroma_feat = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr).T, axis=0)
result = np.hstack((result, chroma_feat))
if mel:
mel_feat = np.mean(librosa.feature.melspectrogram(y=data, sr=sr).T, axis=0)
result = np.hstack((result, mel_feat))
return result
# Load audio file
audio_path = "your_audio_file.wav"
data, sr = librosa.load(audio_path, sr=22050)
# Extract features
feature = extract_feature(data, sr, mfcc=True, chroma=True, mel=True)
feature = np.expand_dims(feature, axis=0)
feature = np.expand_dims(feature, axis=2)
# Make prediction
prediction = model.predict(feature)
predicted_class = np.argmax(prediction, axis=1)
# Map to emotion labels
emotions = {
0: 'Neutral',
1: 'Calm',
2: 'Happy',
3: 'Sad',
4: 'Angry',
5: 'Fearful',
6: 'Disgust',
7: 'Surprised'
}
predicted_emotion = emotions[predicted_class[0]]
print(f"Predicted emotion: {predicted_emotion}")
Requirements
librosa
tensorflow
numpy
scikit-learn
Training Data
The model was trained on the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset, which contains speech emotion recordings with the following emotion categories:
- Neutral
- Calm
- Happy
- Sad
- Angry
- Fearful
- Disgust
- Surprised
The dataset provides high-quality audio recordings from multiple speakers, allowing the model to learn robust emotion recognition patterns across different voices and speaking styles.
Model Performance
The model has been trained and evaluated with the following performance metrics:
Training Progress
The training curves show the model's learning progress over epochs, demonstrating convergence and good generalization.
Confusion Matrix
The confusion matrix shows the model's performance on the RAVDESS dataset, demonstrating how well the model distinguishes between different emotional states.
License
[Specify your license here]
Citation
If you use this model, please cite:
@misc{speech-emotion-recognition,
author = {JagjeevanAK},
title = {Speech Emotion Recognition Model},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/JagjeevanAK/Speech-emotion-detection}
}
Evaluation results
- Accuracy on RAVDESSself-reportedSee confusion matrix

