---
language: en
license: apache-2.0
tags:
- audio-classification
- emotion-recognition
- hubert
- speech
library_name: transformers
pipeline_tag: audio-classification
---

# HuBERT Emotion Recognition Model

Fine-tuned HuBERT model for emotion recognition in speech audio.

## Model Description

This model classifies speech audio into 5 emotion categories:

1. **Angry/Fearful** - Expressions of anger or fear
2. **Happy/Laugh** - Joyful or laughing expressions
3. **Neutral/Calm** - Neutral or calm speech
4. **Sad/Cry** - Expressions of sadness or crying
5. **Surprised/Amazed** - Surprised or amazed reactions

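The categories above correspond to class indices 0-4. A label lookup might look like the following sketch; the ordering here is an assumption for illustration, and the authoritative mapping lives in the model's `config.id2label`:

```python
# Hypothetical index-to-label mapping; verify against model.config.id2label.
ID2LABEL = {
    0: "Angry/Fearful",
    1: "Happy/Laugh",
    2: "Neutral/Calm",
    3: "Sad/Cry",
    4: "Surprised/Amazed",
}

def label_for(pred_id: int) -> str:
    # Translate a predicted class index into its human-readable label.
    return ID2LABEL[pred_id]
```
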
## Quick Start

```python
from transformers import pipeline

# Load the model
classifier = pipeline("audio-classification", model="YOUR_USERNAME/hubert-emotion-recognition")

# Predict emotion
result = classifier("audio.wav")
print(result)
```

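The pipeline returns a list of label/score dictionaries. A small sketch of picking the top prediction from such a result (the values below are illustrative, not real model output):

```python
# Illustrative pipeline output: a list of {"label", "score"} dicts.
result = [
    {"label": "Happy/Laugh", "score": 0.81},
    {"label": "Neutral/Calm", "score": 0.10},
    {"label": "Surprised/Amazed", "score": 0.05},
]

# Select the entry with the highest score.
top = max(result, key=lambda r: r["score"])
print(top["label"])  # Happy/Laugh
```
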
## Detailed Usage

```python
from transformers import AutoModelForAudioClassification, Wav2Vec2FeatureExtractor
import torch
import librosa

# Load model and feature extractor
model = AutoModelForAudioClassification.from_pretrained("YOUR_USERNAME/hubert-emotion-recognition")
processor = Wav2Vec2FeatureExtractor.from_pretrained("YOUR_USERNAME/hubert-emotion-recognition")

# Load audio, resampled to the model's 16 kHz sample rate
audio, sr = librosa.load("audio.wav", sr=16000)

# Prepare inputs
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)[0]
    pred_id = torch.argmax(probs).item()

# Show results
emotions = ["Angry/Fearful", "Happy/Laugh", "Neutral/Calm", "Sad/Cry", "Surprised/Amazed"]
print(f"Emotion: {emotions[pred_id]}")
print(f"Confidence: {probs[pred_id].item():.3f}")
```

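To inspect the full distribution rather than just the arg-max, the softmax probabilities can be sorted by confidence. A plain-Python sketch (the logits here are made-up values, not real model output):

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

emotions = ["Angry/Fearful", "Happy/Laugh", "Neutral/Calm", "Sad/Cry", "Surprised/Amazed"]
logits = [0.2, 2.1, 1.3, -0.5, 0.0]  # illustrative values only

probs = softmax(logits)
ranked = sorted(zip(emotions, probs), key=lambda pair: pair[1], reverse=True)
for label, p in ranked:
    print(f"{label}: {p:.3f}")
```
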
## Model Details

- **Base Model**: HuBERT
- **Task**: Audio Classification
- **Sample Rate**: 16 kHz
- **Max Duration**: 3 seconds
- **Framework**: PyTorch + Transformers

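Given the 16 kHz sample rate and 3-second maximum, a clip longer than 48,000 samples is typically truncated and a shorter one zero-padded before inference. A minimal sketch, assuming this truncate/pad convention:

```python
SAMPLE_RATE = 16_000
MAX_SECONDS = 3
MAX_SAMPLES = SAMPLE_RATE * MAX_SECONDS  # 48,000 samples

def fit_to_window(samples):
    # Truncate long clips and zero-pad short ones to exactly MAX_SAMPLES.
    if len(samples) >= MAX_SAMPLES:
        return samples[:MAX_SAMPLES]
    return list(samples) + [0.0] * (MAX_SAMPLES - len(samples))
```
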
## Training Data

[Describe your training dataset here - name, size, speakers, etc.]

## Performance

[Add your evaluation metrics here]

Example:
- Accuracy: 87.3%
- F1 Score: 85.1%

## Limitations

- Optimized for English speech
- Works best with clear audio clips of up to 3 seconds
- Performance may vary with background noise
- Emotion expression varies across cultures, which can affect accuracy

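Because the model is limited to 3-second inputs, longer recordings are usually split into overlapping windows and classified chunk by chunk. A sketch of such a splitter (the 1.5-second hop is an illustrative choice, not a requirement of the model):

```python
def chunk_audio(samples, sample_rate=16_000, window_s=3.0, hop_s=1.5):
    # Split a long recording into overlapping windows that each fit
    # the model's 3-second maximum duration.
    window = int(sample_rate * window_s)
    hop = int(sample_rate * hop_s)
    chunks = []
    for start in range(0, max(len(samples) - window, 0) + 1, hop):
        chunks.append(samples[start:start + window])
    return chunks or [samples]
```

Each chunk can then be fed to the classifier individually, and the per-chunk predictions aggregated (e.g. by majority vote or averaged probabilities).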
## Intended Uses

- Call center analytics
- Mental health monitoring
- Voice assistants
- Media analysis
- Research in affective computing

## License

Apache 2.0

## Citation

```bibtex
@misc{hubert_emotion_2024,
  author = {YOUR_NAME},
  title = {HuBERT Emotion Recognition},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/YOUR_USERNAME/hubert-emotion-recognition}
}
```