---
language: en
license: apache-2.0
tags:
- audio-classification
- emotion-recognition
- hubert
- speech
library_name: transformers
pipeline_tag: audio-classification
---
# HuBERT Emotion Recognition Model
Fine-tuned HuBERT model for emotion recognition in speech audio.
## Model Description
This model classifies speech audio into 5 emotion categories:
1. **Angry/Fearful** - Expressions of anger or fear
2. **Happy/Laugh** - Joyful or laughing expressions
3. **Neutral/Calm** - Neutral or calm speech
4. **Sad/Cry** - Expressions of sadness or crying
5. **Surprised/Amazed** - Surprised or amazed reactions
## Quick Start
```python
from transformers import pipeline
# Load the model
classifier = pipeline("audio-classification", model="YOUR_USERNAME/hubert-emotion-recognition")
# Predict emotion
result = classifier("audio.wav")
print(result)
```
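The `audio-classification` pipeline returns a list of dictionaries, each holding a `label` and a `score`. A minimal sketch of picking the top prediction from that list is shown here. The helper name `top_emotion`, the `min_score` threshold, and the sample scores are illustrative assumptions, not part of the model or the library.

```python
def top_emotion(result, min_score=0.5):
    """Return (label, score) for the highest-scoring entry, or None.

    `result` is expected in the pipeline's output shape:
    a list of {"label": str, "score": float} dictionaries.
    None is returned for an empty list or when the best score
    falls below `min_score` (a hypothetical cutoff).
    """
    if not result:
        return None
    best = max(result, key=lambda item: item["score"])
    if best["score"] < min_score:
        return None
    return best["label"], best["score"]


# Hypothetical pipeline output, for illustration only
sample = [
    {"label": "Happy/Laugh", "score": 0.81},
    {"label": "Neutral/Calm", "score": 0.12},
    {"label": "Sad/Cry", "score": 0.07},
]
print(top_emotion(sample))
```

With a real model, the list returned by `classifier("audio.wav")` can be passed in directly. The label strings it contains depend on the model's configuration.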
## Detailed Usage
```python
from transformers import AutoModelForAudioClassification, Wav2Vec2FeatureExtractor
import torch
import librosa
# Load model and processor
model = AutoModelForAudioClassification.from_pretrained("YOUR_USERNAME/hubert-emotion-recognition")
processor = Wav2Vec2FeatureExtractor.from_pretrained("YOUR_USERNAME/hubert-emotion-recognition")
# Load audio (16kHz)
audio, sr = librosa.load("audio.wav", sr=16000)
# Prepare inputs
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=1)[0]
    pred_id = torch.argmax(probs).item()
# Show results
emotions = ["Angry/Fearful", "Happy/Laugh", "Neutral/Calm", "Sad/Cry", "Surprised/Amazed"]
print(f"Emotion: {emotions[pred_id]}")
print(f"Confidence: {probs[pred_id]:.3f}")
```
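To see more than the single best class, the logits can be turned into a ranked list of all five emotions. The following is a dependency-free sketch of that step. The function name `rank_emotions` and the example logits are hypothetical, and the label order is assumed to match the model's class indices as in the usage example above.

```python
import math

# Label order copied from the usage example; assumed to match the model's class indices
EMOTIONS = ["Angry/Fearful", "Happy/Laugh", "Neutral/Calm", "Sad/Cry", "Surprised/Amazed"]


def rank_emotions(logits, labels=EMOTIONS):
    """Convert raw logits to softmax probabilities and sort labels by probability."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Highest probability first
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


# Hypothetical logits, for illustration only
for label, prob in rank_emotions([0.2, 2.1, 0.5, -1.0, 0.9]):
    print(f"{label}: {prob:.3f}")
```

With the model loaded, `outputs.logits[0].tolist()` would supply the `logits` argument.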
## Model Details
- **Base Model**: HuBERT
- **Task**: Audio Classification
- **Sample Rate**: 16kHz
- **Max Duration**: 3 seconds
- **Framework**: PyTorch + Transformers
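At 16 kHz, a 3-second clip is 48,000 samples. The sketch below brings a mono waveform to exactly that length by truncating or zero-padding. The model card does not state how clips were prepared during training, so zero-padding at the end is an assumption, and the helper name `fit_to_max_duration` is hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16_000   # sample rate listed in the model details
MAX_SECONDS = 3        # max duration listed in the model details
MAX_SAMPLES = SAMPLE_RATE * MAX_SECONDS  # 48,000 samples


def fit_to_max_duration(audio, max_samples=MAX_SAMPLES):
    """Truncate or zero-pad a 1-D waveform to exactly `max_samples` samples."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim != 1:
        raise ValueError("expected a mono (1-D) waveform")
    if len(audio) >= max_samples:
        return audio[:max_samples]  # keep only the first 3 seconds
    # Assumption: pad the end with zeros (silence)
    return np.pad(audio, (0, max_samples - len(audio)))


short = fit_to_max_duration(np.ones(16_000))  # 1 s clip, padded
long = fit_to_max_duration(np.ones(80_000))   # 5 s clip, truncated
print(short.shape, long.shape)
```

The result can be passed to the feature extractor in place of the raw `audio` array.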
## Training Data
[Describe your training dataset here - name, size, speakers, etc.]
## Performance
[Add your evaluation metrics here]
Example:
- Accuracy: 87.3%
- F1 Score: 85.1%
## Limitations
- Optimized for English speech
- Works best with clear audio clips of about 3 seconds, the model's maximum input duration
- Performance may vary with background noise
- Emotion expression varies across cultures
## Intended Uses
✅ Call center analytics
✅ Mental health monitoring
✅ Voice assistants
✅ Media analysis
✅ Research in affective computing
## License
Apache 2.0
## Citation
```bibtex
@misc{hubert_emotion_2024,
author = {YOUR_NAME},
title = {HuBERT Emotion Recognition},
year = {2024},
publisher = {Hugging Face},
url = {https://huggingface.co/YOUR_USERNAME/hubert-emotion-recognition}
}
```