---
language: en
license: apache-2.0
tags:
- audio-classification
- emotion-recognition
- hubert
- speech
library_name: transformers
pipeline_tag: audio-classification
---

# HuBERT Emotion Recognition Model

Fine-tuned HuBERT model for emotion recognition in speech audio.

## Model Description

This model classifies speech audio into 5 emotion categories:

1. **Angry/Fearful** - Expressions of anger or fear
2. **Happy/Laugh** - Joyful or laughing expressions
3. **Neutral/Calm** - Neutral or calm speech
4. **Sad/Cry** - Expressions of sadness or crying
5. **Surprised/Amazed** - Surprised or amazed reactions

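The categories above correspond to class indices 0-4. A label lookup might look like the following sketch; the ordering here is an assumption for illustration, and the authoritative mapping lives in the model's `config.id2label`:

```python
# Hypothetical index-to-label mapping; verify against model.config.id2label.
ID2LABEL = {
    0: "Angry/Fearful",
    1: "Happy/Laugh",
    2: "Neutral/Calm",
    3: "Sad/Cry",
    4: "Surprised/Amazed",
}

def label_for(pred_id: int) -> str:
    # Translate a predicted class index into its human-readable label.
    return ID2LABEL[pred_id]
```
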
## Quick Start

```python
from transformers import pipeline

# Load the model
classifier = pipeline("audio-classification", model="YOUR_USERNAME/hubert-emotion-recognition")

# Predict emotion
result = classifier("audio.wav")
print(result)
```

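The pipeline returns a list of label/score dictionaries. A small sketch of picking the top prediction from such a result (the values below are illustrative, not real model output):

```python
# Illustrative pipeline output: a list of {"label", "score"} dicts.
result = [
    {"label": "Happy/Laugh", "score": 0.81},
    {"label": "Neutral/Calm", "score": 0.10},
    {"label": "Surprised/Amazed", "score": 0.05},
]

# Select the entry with the highest score.
top = max(result, key=lambda r: r["score"])
print(top["label"])  # Happy/Laugh
```
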
## Detailed Usage

```python
from transformers import AutoModelForAudioClassification, Wav2Vec2FeatureExtractor
import torch
import librosa

# Load model and feature extractor
model = AutoModelForAudioClassification.from_pretrained("YOUR_USERNAME/hubert-emotion-recognition")
processor = Wav2Vec2FeatureExtractor.from_pretrained("YOUR_USERNAME/hubert-emotion-recognition")

# Load audio, resampled to the model's 16 kHz sample rate
audio, sr = librosa.load("audio.wav", sr=16000)

# Prepare inputs
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)[0]
    pred_id = torch.argmax(probs).item()

# Show results
emotions = ["Angry/Fearful", "Happy/Laugh", "Neutral/Calm", "Sad/Cry", "Surprised/Amazed"]
print(f"Emotion: {emotions[pred_id]}")
print(f"Confidence: {probs[pred_id].item():.3f}")
```

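To inspect the full distribution rather than just the arg-max, the softmax probabilities can be sorted by confidence. A plain-Python sketch (the logits here are made-up values, not real model output):

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

emotions = ["Angry/Fearful", "Happy/Laugh", "Neutral/Calm", "Sad/Cry", "Surprised/Amazed"]
logits = [0.2, 2.1, 1.3, -0.5, 0.0]  # illustrative values only

probs = softmax(logits)
ranked = sorted(zip(emotions, probs), key=lambda pair: pair[1], reverse=True)
for label, p in ranked:
    print(f"{label}: {p:.3f}")
```
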
## Model Details

- **Base Model**: HuBERT
- **Task**: Audio Classification
- **Sample Rate**: 16 kHz
- **Max Duration**: 3 seconds
- **Framework**: PyTorch + Transformers

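Given the 16 kHz sample rate and 3-second maximum, a clip longer than 48,000 samples is typically truncated and a shorter one zero-padded before inference. A minimal sketch, assuming this truncate/pad convention:

```python
SAMPLE_RATE = 16_000
MAX_SECONDS = 3
MAX_SAMPLES = SAMPLE_RATE * MAX_SECONDS  # 48,000 samples

def fit_to_window(samples):
    # Truncate long clips and zero-pad short ones to exactly MAX_SAMPLES.
    if len(samples) >= MAX_SAMPLES:
        return samples[:MAX_SAMPLES]
    return list(samples) + [0.0] * (MAX_SAMPLES - len(samples))
```
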
## Training Data

[Describe your training dataset here - name, size, speakers, etc.]

## Performance

[Add your evaluation metrics here]

Example:
- Accuracy: 87.3%
- F1 Score: 85.1%

## Limitations

- Optimized for English speech
- Works best with clear audio clips of up to 3 seconds
- Performance may vary with background noise
- Emotion expression varies across cultures, which can affect accuracy

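Because the model is limited to 3-second inputs, longer recordings are usually split into overlapping windows and classified chunk by chunk. A sketch of such a splitter (the 1.5-second hop is an illustrative choice, not a requirement of the model):

```python
def chunk_audio(samples, sample_rate=16_000, window_s=3.0, hop_s=1.5):
    # Split a long recording into overlapping windows that each fit
    # the model's 3-second maximum duration.
    window = int(sample_rate * window_s)
    hop = int(sample_rate * hop_s)
    chunks = []
    for start in range(0, max(len(samples) - window, 0) + 1, hop):
        chunks.append(samples[start:start + window])
    return chunks or [samples]
```

Each chunk can then be fed to the classifier individually, and the per-chunk predictions aggregated (e.g. by majority vote or averaged probabilities).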
## Intended Uses

- Call center analytics
- Mental health monitoring
- Voice assistants
- Media analysis
- Research in affective computing

## License

Apache 2.0

## Citation

```bibtex
@misc{hubert_emotion_2024,
  author = {YOUR_NAME},
  title = {HuBERT Emotion Recognition},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/YOUR_USERNAME/hubert-emotion-recognition}
}
```