Fine-tuned Wav2Vec2 for Speech Emotion Recognition

Base model: facebook/wav2vec2-base

Fine-tuned on the Toronto Emotional Speech Set (TESS) from Kaggle: https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess
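In TESS, each file holds a single emotion, and the emotion is the last token of the file name, as in the path `YAF_sad/YAF_bath_sad.wav` from the usage example. The `id2label` mapping in that example uses the same short names, including `ps` for "pleasant surprise". Below is a minimal sketch of recovering the class id from a TESS file name. It assumes every file follows the `<speaker>_<word>_<emotion>.wav` pattern, and the helper names are hypothetical, not part of this repository:

```python
from pathlib import PurePosixPath

# Same index-to-label mapping as in the usage example; "ps" = pleasant surprise.
ID2LABEL = {0: "fear", 1: "angry", 2: "disgust", 3: "neutral", 4: "sad", 5: "ps", 6: "happy"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def label_from_tess_path(path: str) -> str:
    """Return the emotion encoded as the last '_'-separated token of a TESS file name.

    Assumes names like "YAF_bath_sad.wav" (hypothetical helper, not from the repo).
    """
    stem = PurePosixPath(path).stem          # e.g. "YAF_bath_sad"
    return stem.rsplit("_", 1)[-1].lower()   # e.g. "sad"


def class_id_from_tess_path(path: str) -> int:
    """Map a TESS file path to the integer class id used by id2label."""
    return LABEL2ID[label_from_tess_path(path)]


example = "TESS Toronto emotional speech set data/YAF_sad/YAF_bath_sad.wav"
print(label_from_tess_path(example), class_id_from_tess_path(example))  # sad 4
```

A file whose name does not end in one of the seven labels raises a `KeyError`, which makes mislabeled files easy to spot.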

Model Usage

I used specific versions of the transformers, tokenizers and huggingface_hub libraries, which in turn required changing other dependencies (reinstalling peft at a compatible version and uninstalling gradio). In a notebook (e.g. Kaggle), run:

!pip uninstall -y peft gradio
!pip install -q --no-deps transformers==4.44.2 tokenizers==0.19.1 huggingface_hub==0.24.6
!pip install -q peft==0.7.1
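Because `--no-deps` skips pip's dependency resolution, it can be worth confirming that the pinned versions are the ones actually installed. A minimal sketch using the standard library's `importlib.metadata`; the helper name `find_version_mismatches` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Versions pinned by the pip commands above.
PINNED = {
    "transformers": "4.44.2",
    "tokenizers": "0.19.1",
    "huggingface_hub": "0.24.6",
    "peft": "0.7.1",
}


def find_version_mismatches(pinned: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed version, or None if missing} for every package not at its pin."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pinned.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


print(find_version_mismatches(PINNED) or "All pinned versions are installed.")
```

If a package was already imported before reinstalling, restart the notebook kernel so Python loads the new version.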

Then run inference on an audio file:

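import torch
import librosa
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

# Mapping from the model's output index to the emotion name ("ps" = pleasant surprise)
id2label = {0: "fear", 1: "angry", 2: "disgust", 3: "neutral", 4: "sad", 5: "ps", 6: "happy"}

# Fine-tuned classification weights from the Hub
model = Wav2Vec2ForSequenceClassification.from_pretrained("pollitoconpapass/wav2vec2-finetuned-emotions")
# Audio classification only needs the base model's feature extractor, not a tokenizer
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")

# wav2vec2-base expects 16 kHz audio; librosa resamples to that rate while loading
audio_path = "/kaggle/input/toronto-emotional-speech-set-tess/TESS Toronto emotional speech set data/YAF_sad/YAF_bath_sad.wav"
speech, sr = librosa.load(audio_path, sr=16000)

inputs = feature_extractor(
    speech,
    sampling_rate=16000,
    return_tensors="pt",
    padding=True,
)

# Inference only: no gradients needed
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, number of emotions)
predicted_class_id = torch.argmax(logits, dim=-1).item()

print(id2label[predicted_class_id])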
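`argmax` returns only the single most likely emotion. To see how confident the model is, the logits can be turned into probabilities with a softmax. A minimal pure-Python sketch; the helper `top_k_emotions` is hypothetical and takes one row of logits as a plain list:

```python
import math
from typing import Dict, List, Tuple


def top_k_emotions(logits: List[float], id2label: Dict[int, str], k: int = 3) -> List[Tuple[str, float]]:
    """Softmax one row of logits and return the k most likely (label, probability) pairs."""
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices sorted from most to least probable.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]
```

After running the example above, `top_k_emotions(logits[0].tolist(), id2label)` lists the three most likely emotions with their probabilities. With torch already imported, `torch.softmax(logits, dim=-1)` computes the same probabilities directly on the tensor.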

The fine-tuning procedure follows this video tutorial: https://www.youtube.com/watch?v=s5-8yeYJV7Y&t=64s

Model size: 94.6M params
Tensor type: F32 (Safetensors)