SER Wav2Vec2 Finetuned on GEMEP (French)

This model is a fine-tuned version of audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim for Speech Emotion Recognition (SER) in French.

It specifically targets the prediction of emotional dimensions: Valence, Arousal, and Dominance (VAD).

Model Description

The model was fine-tuned using the GEMEP corpus (Geneva Multimodal Expression Portrayal), which contains pseudo-sentences uttered by professional actors. For the purpose of the associated research paper, we collected new annotations for Valence, Arousal, and Dominance through a dedicated user study. The target scores are the mean values of these human annotations.
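The aggregation step described above (one target score per dimension, taken as the mean over annotators) can be sketched as follows. This is a minimal illustration only: the function name, the rating scale, the number of annotators, and the example values are all assumptions and do not come from the user study.

```python
from statistics import mean

DIMENSIONS = ("valence", "arousal", "dominance")


def aggregate_annotations(ratings):
    """Average per-annotator ratings into one target score per dimension.

    `ratings` is a list of dicts, one per annotator, each mapping a
    dimension name to that annotator's numeric rating.
    """
    if not ratings:
        raise ValueError("at least one annotation is required")
    return {dim: mean(r[dim] for r in ratings) for dim in DIMENSIONS}


# Hypothetical ratings from three annotators for a single utterance
# (made-up values on an assumed 0-1 scale, for illustration only).
example_ratings = [
    {"valence": 0.25, "arousal": 0.75, "dominance": 0.5},
    {"valence": 0.5, "arousal": 0.75, "dominance": 0.25},
    {"valence": 0.75, "arousal": 0.75, "dominance": 0.75},
]
targets = aggregate_annotations(example_ratings)
```

Here `targets` holds one regression target per dimension for the utterance, which is the form a VAD regression model would be trained against.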

Base Model: Wav2Vec2-Large-Robust-12
Pre-trained weights by: audeering
Language: French (Pseudo-speech)
Task: Dimensional Emotion Regression (VAD)

Usage

To use this model, you need the transformers library and PyTorch. The base audeering model card defines a custom regression head on top of the Wav2Vec2 encoder, and the same processing logic applies to this model.

import numpy as np
import torch
from transformers import AutoFeatureExtractor, AutoModel

# Load the model and its feature extractor
model_name = "rosalied/ser-w2v2-finetuned"
processor = AutoFeatureExtractor.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()  # disable dropout for inference

# Example: inference on a 16 kHz mono audio array
# (placeholder input: one second of silence; replace with your own waveform)
audio_array = np.zeros(16000, dtype=np.float32)
input_values = processor(audio_array, sampling_rate=16000, return_tensors="pt").input_values
with torch.no_grad():
    outputs = model(input_values)
    # The outputs represent [Arousal, Dominance, Valence]
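Because the stated output order, [Arousal, Dominance, Valence], differs from the usual "VAD" reading order, attaching names to the scores helps avoid mix-ups. The helper below is a small sketch of that idea. The function name `label_vad` and the example scores are hypothetical and not part of the model's API.

```python
# Output order as stated in the usage snippet: [Arousal, Dominance, Valence]
OUTPUT_ORDER = ("arousal", "dominance", "valence")


def label_vad(scores):
    """Map a 3-element score sequence to named emotional dimensions."""
    values = [float(s) for s in scores]
    if len(values) != len(OUTPUT_ORDER):
        raise ValueError(
            f"expected {len(OUTPUT_ORDER)} scores, got {len(values)}"
        )
    return dict(zip(OUTPUT_ORDER, values))


# Hypothetical scores for one utterance, in the model's output order.
labeled = label_vad([0.75, 0.5, 0.25])
```

If the predictions arrive as a tensor of shape (1, 3), passing the first row converted with `.tolist()` should work. How the three scores are exposed in the model output is an assumption here and depends on how the regression head is loaded.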

