Wav2Vec2 Emotion Speech Recognition
This model is a fine-tuned version of facebook/wav2vec2-base for emotion recognition from speech.
Model Description
- Base Model: facebook/wav2vec2-base
- Task: Emotion Classification
- Fine-tuning: A custom classification head (average pooling over time followed by a two-layer MLP) is added on top of the Wav2Vec2 encoder.
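To get a sense of the head's size, you can count its parameters by hand. The sketch below assumes hidden_size = 768, which is the facebook/wav2vec2-base value, along with the 256-unit bottleneck from the class definition and the 7 classes in the label list. The helper name head_param_count is hypothetical.

```python
def head_param_count(hidden_size: int, bottleneck: int, num_labels: int) -> int:
    """Count weights + biases of Linear(hidden, bottleneck) -> ReLU -> Dropout -> Linear(bottleneck, labels).

    ReLU and Dropout have no parameters, so only the two Linear layers contribute.
    """
    first = hidden_size * bottleneck + bottleneck   # first Linear: weight matrix + bias
    second = bottleneck * num_labels + num_labels   # second Linear: weight matrix + bias
    return first + second


# Assumed values: hidden_size=768 (wav2vec2-base), 256-unit bottleneck, 7 labels.
print(head_param_count(768, 256, 7))  # 198663
```

Changing num_labels only affects the second term.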
How to use
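This model is a plain PyTorch nn.Module rather than a transformers model class, so it cannot be loaded with from_pretrained. To use it, you need the custom Wav2Vec2Emotion class definition: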
```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model, Wav2Vec2Processor
from transformers.modeling_outputs import SequenceClassifierOutput


class Wav2Vec2Emotion(nn.Module):
    """Wav2Vec2 encoder followed by average pooling and a small MLP classification head."""

    def __init__(self, num_labels: int, pretrained: str = "facebook/wav2vec2-base"):
        super().__init__()
        self.wav2vec = Wav2Vec2Model.from_pretrained(pretrained)
        self.classifier = nn.Sequential(
            nn.Linear(self.wav2vec.config.hidden_size, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_labels),
        )

    def forward(self, input_values, attention_mask=None, labels=None):
        outputs = self.wav2vec(input_values, attention_mask=attention_mask)
        hidden = outputs.last_hidden_state  # (batch, frames, hidden_size)
        pooled = torch.mean(hidden, dim=1)  # simple average pooling over all frames
        logits = self.classifier(pooled)  # (batch, num_labels)
        loss = None
        if labels is not None:
            loss = nn.CrossEntropyLoss()(logits, labels)
        return SequenceClassifierOutput(loss=loss, logits=logits)


# Load model and processor
# model = Wav2Vec2Emotion(num_labels=...)  # num_labels must match the label list
# model.load_state_dict(torch.load("model.pt", map_location="cpu"))
# model.eval()  # disable dropout for inference
# processor = Wav2Vec2Processor.from_pretrained("...")
```
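The forward pass collapses the frame sequence with torch.mean(hidden, dim=1), so every frame gets equal weight. That includes frames computed from padding when clips of different lengths are batched together. The pure-Python sketch below needs no torch and contrasts that plain mean with a masked mean that skips padded frames. The function names, the frame values, and the frame-level mask are all made up for illustration. The masked variant is only a point of comparison. It is not what this class computes, and switching to it would change the features the trained head receives.

```python
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average over the time axis, like torch.mean(hidden, dim=1) for one example."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def masked_mean_pool(frames: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average only frames whose mask value is 1 (an alternative, not used by this model)."""
    kept = [f for f, m in zip(frames, mask) if m]
    return [sum(col) / len(kept) for col in zip(*kept)]


# Two real frames followed by one padded frame (hidden size 2 for illustration).
frames = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
print(mean_pool(frames))                    # [1.3333333333333333, 2.0]
print(masked_mean_pool(frames, [1, 1, 0]))  # [2.0, 3.0]
```

For single-clip inference (batch size 1, no padding), the two give the same result.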
Labels
The labels used in this model are:
- 0: angry
- 1: disgust
- 2: fear
- 3: happy
- 4: neutral
- 5: ps (pleasant surprised)
- 6: sad
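To turn the logits returned by forward into a label name, apply a softmax and take the arg-max. The sketch below is plain Python. ID2LABEL mirrors the label list above. The function name decode_logits and the example logits are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Index-to-label mapping taken from the label list above.
ID2LABEL: Dict[int, str] = {
    0: "angry",
    1: "disgust",
    2: "fear",
    3: "happy",
    4: "neutral",
    5: "ps",  # pleasant surprised
    6: "sad",
}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_logits(logits: List[float]) -> Tuple[str, float]:
    """Return (label, probability) for the highest-scoring class of one example."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for a single clip; index 3 scores highest.
label, prob = decode_logits([0.1, -1.2, 0.3, 2.5, 0.8, -0.4, 0.0])
print(label)  # happy
```

With the Wav2Vec2Emotion class, the logits for the first clip in a batch could be obtained as model(input_values).logits[0].tolist() and passed to decode_logits.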