Whisper-base for Speech Emotion Recognition in Russian on Dusha dataset
This model is the Whisper-base encoder with a classification head, fine-tuned for speech emotion recognition (SER).
Dusha dataset: https://github.com/salute-developers/golos/tree/master/dusha
The model performs multiclass classification into 5 classes (class index in parentheses):
- angry (0)
- sad (1)
- neutral (2)
- positive (3)
- other (4)
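The class list above can be written down as a plain lookup table. This is only a minimal sketch that mirrors the list; the names `ID2LABEL`, `LABEL2ID` and `decode` are hypothetical, and the `id2label` mapping stored in the model config remains the authoritative source.

```python
# Class indices as listed in the model card (illustrative copy, not read from the model config).
ID2LABEL = {0: "angry", 1: "sad", 2: "neutral", 3: "positive", 4: "other"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode(class_index: int) -> str:
    """Return the emotion name for a predicted class index."""
    return ID2LABEL[class_index]


print(decode(2))  # neutral
```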
The model was fine-tuned on the full Dusha-crowd data with:
- Time Shift, Time Masking, and Colored Noise augmentations;
- a WeightedRandomSampler.
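The card does not give the augmentation parameters or the weighting scheme, so the following is only a rough sketch of what two of these steps might look like on a raw waveform. Everything here is an assumption: the function names, the circular shift, the single zeroed span, and the inverse-class-frequency weights. Colored Noise is left out.

```python
import random
from collections import Counter


def time_shift(samples, max_shift, rng):
    """Circularly shift the waveform by a random offset in [-max_shift, max_shift]."""
    if not samples:
        return []
    offset = rng.randint(-max_shift, max_shift) % len(samples)
    return samples[-offset:] + samples[:-offset]


def time_mask(samples, max_width, rng):
    """Zero out one random contiguous span of at most max_width samples."""
    width = rng.randint(0, min(max_width, len(samples)))
    start = rng.randint(0, len(samples) - width)
    return samples[:start] + [0.0] * width + samples[start + width:]


def inverse_frequency_weights(labels):
    """Per-sample weight 1 / (number of samples in that sample's class) -- an assumed scheme."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


rng = random.Random(0)
wave = [0.1, 0.2, 0.3, 0.4, 0.5]          # toy waveform, not real audio
shifted = time_shift(wave, max_shift=2, rng=rng)
masked = time_mask(wave, max_width=2, rng=rng)
weights = inverse_frequency_weights([2, 2, 2, 0, 1])  # toy labels
print(shifted, masked, weights)
```

With weights like these, every class contributes the same total sampling mass. In PyTorch they could be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)`, so that rare classes are drawn about as often as frequent ones.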
Usage
import torch
import torchaudio
from transformers import WhisperForAudioClassification, WhisperFeatureExtractor

# load model and feature extractor
model = WhisperForAudioClassification.from_pretrained("waveletdeboshir/whisper-base-ser-dusha")
model.eval()
feature_extractor = WhisperFeatureExtractor.from_pretrained("waveletdeboshir/whisper-base-ser-dusha")

# load audio and resample to 16 kHz if necessary
wav, sr = torchaudio.load("audio.wav")
if sr != 16000:
    wav = torchaudio.functional.resample(wav, sr, 16000)

# compute predictions (wav[0] is the first channel of the recording)
features = feature_extractor(wav[0], sampling_rate=16000, return_tensors="pt").input_features
with torch.no_grad():
    preds = model(features)

# get emotion and its probability
probs = torch.nn.functional.softmax(preds.logits, dim=-1)
print(f"Predicted emotion: {model.config.id2label[probs.argmax().item()]} with probability {probs.max().item():.4f}")
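The last step of the usage code turns logits into probabilities with a softmax and reports only the top class. As a dependency-free illustration of that step, the sketch below ranks all five classes. The logits are made-up numbers, not model output, and `ID2LABEL`, `softmax` and `rank_emotions` are hypothetical names.

```python
import math

# Illustrative copy of the class list from this card.
ID2LABEL = {0: "angry", 1: "sad", 2: "neutral", 3: "positive", 4: "other"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_emotions(logits, id2label):
    """Return (label, probability) pairs sorted from most to least likely."""
    probs = softmax(logits)
    pairs = [(id2label[i], p) for i, p in enumerate(probs)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


logits = [0.5, -1.0, 2.0, 0.1, -0.3]  # made-up values for illustration
ranked = rank_emotions(logits, ID2LABEL)
for label, prob in ranked:
    print(f"{label}: {prob:.4f}")
```

Looking at the full ranking rather than only the argmax can help when the top two probabilities are close, for example between neutral and other.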
Base model: openai/whisper-base
Evaluation results
- Test Weighted Accuracy on Sberdevices Dusha (crowd), self-reported: 0.836
- Test F1 macro on Sberdevices Dusha (crowd), self-reported: 0.843
- Test Recall macro on Sberdevices Dusha (crowd), self-reported: 0.830
- Test Precision macro on Sberdevices Dusha (crowd), self-reported: 0.850
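The card does not say how these numbers were computed. For readers unfamiliar with macro averaging, here is a minimal sketch of the common definition: precision, recall and F1 are computed per class and then averaged with equal weight. The function name `macro_scores` and the toy labels are hypothetical. Weighted accuracy is omitted because its exact definition is not given here.

```python
def macro_scores(y_true, y_pred, num_classes):
    """Macro-averaged precision, recall and F1 (unweighted mean over classes)."""
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return {
        "precision_macro": sum(precisions) / num_classes,
        "recall_macro": sum(recalls) / num_classes,
        "f1_macro": sum(f1s) / num_classes,
    }


# Toy labels for illustration only, not Dusha data.
scores = macro_scores([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)
print({name: round(value, 4) for name, value in scores.items()})
```

Because each class counts equally, macro scores are not dominated by the most frequent class, which matters on imbalanced emotion data.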