---
language: en
license: apache-2.0
tags:
- audio-classification
- emotion-recognition
- hubert
- speech
library_name: transformers
pipeline_tag: audio-classification
---
# HuBERT Emotion Recognition Model
Fine-tuned HuBERT model for emotion recognition in speech audio.
## Model Description
This model classifies speech audio into 5 emotion categories:
1. **Angry/Fearful** - Expressions of anger or fear
2. **Happy/Laugh** - Joyful or laughing expressions
3. **Neutral/Calm** - Neutral or calm speech
4. **Sad/Cry** - Expressions of sadness or crying
5. **Surprised/Amazed** - Surprised or amazed reactions
## Quick Start
```python
from transformers import pipeline
# Load the model
classifier = pipeline("audio-classification", model="YOUR_USERNAME/hubert-emotion-recognition")
# Predict emotion
result = classifier("audio.wav")
print(result)
```
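The audio-classification pipeline returns a list of dictionaries with `label` and `score` keys. A minimal sketch of how that output might be consumed with a confidence cutoff follows; the helper name `pick_emotion`, the `0.5` threshold, and the sample scores are illustrative assumptions, not part of this model or of Transformers.

```python
def pick_emotion(results, threshold=0.5):
    """Return the top label if its score clears the threshold, else None.

    `results` is expected to look like the pipeline output:
    a list of {"label": str, "score": float} dictionaries.
    """
    if not results:
        return None
    best = max(results, key=lambda r: r["score"])
    return best["label"] if best["score"] >= threshold else None


# Hand-written stand-in for classifier("audio.wav"); not real model output.
sample = [
    {"label": "Happy/Laugh", "score": 0.81},
    {"label": "Neutral/Calm", "score": 0.12},
    {"label": "Sad/Cry", "score": 0.07},
]
print(pick_emotion(sample))                 # Happy/Laugh
print(pick_emotion(sample, threshold=0.9))  # None
```

Returning `None` for low-confidence clips lets the caller decide whether to skip the clip or fall back to a default; a suitable threshold would need to be tuned on held-out data.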
## Detailed Usage
```python
from transformers import AutoModelForAudioClassification, Wav2Vec2FeatureExtractor
import torch
import librosa
# Load model and processor
model = AutoModelForAudioClassification.from_pretrained("YOUR_USERNAME/hubert-emotion-recognition")
processor = Wav2Vec2FeatureExtractor.from_pretrained("YOUR_USERNAME/hubert-emotion-recognition")
# Load audio, resampled to 16 kHz mono
audio, sr = librosa.load("audio.wav", sr=16000)
# Prepare inputs
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=1)[0]
pred_id = torch.argmax(probs).item()
# Show results
emotions = ["Angry/Fearful", "Happy/Laugh", "Neutral/Calm", "Sad/Cry", "Surprised/Amazed"]
print(f"Emotion: {emotions[pred_id]}")
print(f"Confidence: {probs[pred_id]:.3f}")
```
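The hard-coded `emotions` list only works if its order matches the class ids the model was trained with. One possible safeguard, sketched below, is to prefer the names stored in the model config and fall back to the list only when the config holds generic placeholders. The helper `resolve_label` is hypothetical, and whether this checkpoint's config carries real label names is an assumption to verify.

```python
EMOTIONS = ["Angry/Fearful", "Happy/Laugh", "Neutral/Calm", "Sad/Cry", "Surprised/Amazed"]


def resolve_label(pred_id, id2label=None, fallback=EMOTIONS):
    """Prefer the name stored in the model config; fall back to the README list."""
    name = (id2label or {}).get(pred_id)
    # Transformers assigns generic names such as "LABEL_0" when none were saved.
    if name is None or name.startswith("LABEL_"):
        return fallback[pred_id]
    return name


print(resolve_label(2))                  # Neutral/Calm
print(resolve_label(2, {2: "calm"}))     # calm
print(resolve_label(2, {2: "LABEL_2"}))  # Neutral/Calm
```

With a loaded model, the call would be `resolve_label(pred_id, model.config.id2label)`.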
## Model Details
- **Base Model**: HuBERT
- **Task**: Audio Classification
- **Sample Rate**: 16kHz
- **Max Duration**: 3 seconds
- **Framework**: PyTorch + Transformers
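Given the 16 kHz sample rate and 3-second maximum duration, longer recordings could be split into consecutive windows before classification. The sketch below only computes the sample index ranges; the helper name `window_bounds` and the choice of non-overlapping windows are assumptions, not something this model card prescribes.

```python
def window_bounds(num_samples, sample_rate=16000, max_seconds=3.0):
    """Return (start, end) sample indices for consecutive non-overlapping windows."""
    window = int(sample_rate * max_seconds)
    if window <= 0:
        raise ValueError("window length must be positive")
    if num_samples <= 0:
        return []
    return [
        (start, min(start + window, num_samples))
        for start in range(0, num_samples, window)
    ]


# A 7-second recording at 16 kHz has 112000 samples.
print(window_bounds(112000))
# [(0, 48000), (48000, 96000), (96000, 112000)]
```

Each pair can slice a loaded waveform as `audio[start:end]`. The last window may be shorter than 3 seconds; whether to pad it or drop it depends on how the model was trained, which is not stated here.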
## Training Data
[Describe your training dataset here - name, size, speakers, etc.]
## Performance
[Add your evaluation metrics here]
Example format (placeholder values, not measured results):
- Accuracy: 87.3%
- F1 Score: 85.1%
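When filling in this section, accuracy and F1 can be computed from integer class ids roughly as follows. This is a dependency-free sketch: the macro-averaged F1 variant is an assumption (the card does not say which averaging is intended), and the toy labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, num_classes=5):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that never appears in either list is scored as 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


# Toy labels for illustration only; not results from this model.
y_true = [0, 1, 2, 3, 4, 2]
y_pred = [0, 1, 2, 3, 2, 2]
print(f"Accuracy: {accuracy(y_true, y_pred):.3f}")
print(f"Macro F1: {macro_f1(y_true, y_pred):.3f}")
```

Macro averaging weights all five emotions equally, which matters if some classes are rare in the evaluation set.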
## Limitations
- Optimized for English speech
- Works best with clear audio clips of about 3 seconds, matching the model's maximum duration
- Performance may vary with background noise
- Emotion expression varies across cultures
## Intended Uses
- ✅ Call center analytics
- ✅ Mental health monitoring
- ✅ Voice assistants
- ✅ Media analysis
- ✅ Research in affective computing
## License
Apache 2.0
## Citation
```bibtex
@misc{hubert_emotion_2024,
author = {YOUR_NAME},
title = {HuBERT Emotion Recognition},
year = {2024},
publisher = {Hugging Face},
url = {https://huggingface.co/YOUR_USERNAME/hubert-emotion-recognition}
}
```