# Fine-tuned Wav2Vec2 for Speech Emotion Recognition

Base model: facebook/wav2vec2-base

Fine-tuned on the Toronto Emotional Speech Set (TESS) from Kaggle: https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess
## Model Usage
This model was trained with specific versions of the `transformers`, `tokenizers`, and `huggingface_hub` libraries, which also meant pinning some other dependencies:
```shell
!pip uninstall -y peft gradio
!pip install -q --no-deps transformers==4.44.2 tokenizers==0.19.1 huggingface_hub==0.24.6
!pip install -q peft==0.7.1
```
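After a notebook restart it is easy to end up with different versions than the ones pinned above. A quick sanity check can catch that early — a minimal sketch (the `mismatches` helper and the `EXPECTED` table are mine, not part of the original setup):

```python
from importlib.metadata import PackageNotFoundError, version

# Versions the install commands above pin; adjust if you change them.
EXPECTED = {
    "transformers": "4.44.2",
    "tokenizers": "0.19.1",
    "huggingface_hub": "0.24.6",
}

def installed_version(pkg: str) -> str:
    """Return the installed version of pkg, or 'not installed'."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return "not installed"

def mismatches(expected: dict) -> dict:
    """Map each package whose installed version differs from the pin
    to the version (or absence) actually found."""
    return {
        pkg: installed_version(pkg)
        for pkg, want in expected.items()
        if installed_version(pkg) != want
    }

if __name__ == "__main__":
    for pkg, got in mismatches(EXPECTED).items():
        print(f"{pkg}: expected {EXPECTED[pkg]}, found {got}")
```

An empty result means the environment matches the pins.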
Then, to run inference:
```python
import torch
import librosa
from transformers import Wav2Vec2ForSequenceClassification, Wav2Vec2Processor

# TESS label mapping ("ps" = pleasant surprise)
id2label = {0: "fear", 1: "angry", 2: "disgust", 3: "neutral", 4: "sad", 5: "ps", 6: "happy"}

model = Wav2Vec2ForSequenceClassification.from_pretrained("pollitoconpapass/wav2vec2-finetuned-emotions")
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base")

audio_path = "/kaggle/input/toronto-emotional-speech-set-tess/TESS Toronto emotional speech set data/YAF_sad/YAF_bath_sad.wav"

# Wav2Vec2 expects 16 kHz audio; librosa resamples on load
speech, sr = librosa.load(audio_path, sr=16000)

inputs = processor(
    speech,
    sampling_rate=16000,
    return_tensors="pt",
    padding=True,
)

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class_id = torch.argmax(logits, dim=-1).item()
print(id2label[predicted_class_id])
```
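If you want a confidence score alongside the predicted label, a softmax over the logits gives per-class probabilities. A minimal sketch — the `top_prediction` helper is mine; `id2label` matches the mapping above, and in practice you would pass `model(**inputs).logits` instead of the made-up tensor:

```python
import torch

id2label = {0: "fear", 1: "angry", 2: "disgust", 3: "neutral", 4: "sad", 5: "ps", 6: "happy"}

def top_prediction(logits: torch.Tensor, id2label: dict) -> tuple:
    """Return (label, probability) for the highest-scoring class.

    Expects logits of shape (1, num_labels), as returned by
    model(**inputs).logits for a single clip.
    """
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    class_id = int(torch.argmax(probs))
    return id2label[class_id], float(probs[class_id])

# Illustrative logits only; replace with model(**inputs).logits
fake_logits = torch.tensor([[0.1, 0.2, 0.0, 0.3, 2.5, 0.1, 0.2]])
label, prob = top_prediction(fake_logits, id2label)
print(f"{label}: {prob:.2f}")
```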
Fine-tuning procedure adapted from: https://www.youtube.com/watch?v=s5-8yeYJV7Y&t=64s
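If you want to reproduce the fine-tuning, note that TESS encodes the emotion in each filename (e.g. `YAF_bath_sad.wav`, as in the audio path above), so labels can be derived directly from the files. A small sketch — the `emotion_from_filename` helper is hypothetical, not from the referenced video:

```python
from pathlib import Path

def emotion_from_filename(path: str) -> str:
    """Extract the emotion from a TESS filename like 'YAF_bath_sad.wav'.

    TESS names follow <speaker>_<word>_<emotion>.wav, so the emotion
    is the last underscore-separated token of the stem.
    """
    return Path(path).stem.split("_")[-1]

print(emotion_from_filename("YAF_bath_sad.wav"))  # → sad
```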