---
license: apache-2.0
datasets:
- stapesai/ssi-speech-emotion-recognition
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: audio-classification
base_model:
- facebook/wav2vec2-base
---

# Multimodal Emotion Speech Recognition

## Model Description

This model recognizes emotions in speech. The approach is described as multimodal; the component documented in this card is:

- **Audio Model**: Wav2Vec2 Base (`facebook/wav2vec2-base`)

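A minimal inference sketch, assuming the checkpoint loads with the standard `transformers` audio-classification pipeline (the `pipeline_tag` suggests this, but the card does not confirm it). `MODEL_ID` is a hypothetical placeholder because the repository id is not stated here, and the emotion names in `LABEL_NAMES` are assumed expansions of the label codes used in the classification report. Wav2Vec2 Base was pretrained on 16 kHz audio; when given a file path, the pipeline decodes and resamples the audio to the rate the feature extractor expects.

```python
from typing import Dict, List, Tuple

# Hypothetical placeholder: this card does not state the repository id.
MODEL_ID = "your-namespace/your-model-id"

# Assumed expansions of the label codes in the classification report.
LABEL_NAMES = {
    "ANG": "angry",
    "CAL": "calm",
    "DIS": "disgust",
    "FEA": "fear",
    "HAP": "happy",
    "NEU": "neutral",
    "SAD": "sad",
    "SUR": "surprise",
}


def best_prediction(predictions: List[Dict]) -> Tuple[str, float]:
    """Return the highest-scoring label (expanded if known) and its score."""
    top = max(predictions, key=lambda p: p["score"])
    return LABEL_NAMES.get(top["label"], top["label"]), top["score"]


def classify_file(audio_path: str, model_id: str = MODEL_ID) -> Tuple[str, float]:
    """Classify one audio file; requires transformers, torch, and ffmpeg."""
    # Imported lazily so best_prediction can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_id)
    # The pipeline returns a list of {"label": ..., "score": ...} dicts.
    return best_prediction(classifier(audio_path))
```

Calling `classify_file("sample.wav")` with a real repository id in place of `MODEL_ID` would return a `(label, score)` pair for the most likely emotion.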
## Dataset

- **Dataset Name**: [stapesai/ssi-speech-emotion-recognition](https://huggingface.co/datasets/stapesai/ssi-speech-emotion-recognition)

## Evaluation Results

### Classification Report

```
              precision    recall  f1-score   support

         ANG       0.97      0.93      0.95        30
         CAL       0.00      0.00      0.00         0
         DIS       0.95      0.90      0.92        20
         FEA       0.76      0.70      0.73        27
         HAP       0.87      0.82      0.84        33
         NEU       0.96      0.96      0.96        25
         SAD       0.73      1.00      0.84        19
         SUR       0.88      0.78      0.82         9

    accuracy                           0.87       163
   macro avg       0.76      0.76      0.76       163
weighted avg       0.88      0.87      0.87       163
```
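
**Overall Accuracy**: 87% (163 evaluation samples)

Note that `CAL` has no samples in the evaluation set (support 0), so its scores are reported as 0.00. That zero row is included in the macro average, which is why the macro average (0.76) sits well below the weighted average (0.87 to 0.88).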
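
To see how much the zero-support `CAL` row affects the summary metrics, the per-class rows can be re-aggregated by hand. The sketch below copies the rounded values from the classification report, so its results are approximate; the helper names are illustrative and are not part of any library.

```python
# Per-class rows copied from the classification report:
# label -> (precision, recall, f1, support)
REPORT = {
    "ANG": (0.97, 0.93, 0.95, 30),
    "CAL": (0.00, 0.00, 0.00, 0),
    "DIS": (0.95, 0.90, 0.92, 20),
    "FEA": (0.76, 0.70, 0.73, 27),
    "HAP": (0.87, 0.82, 0.84, 33),
    "NEU": (0.96, 0.96, 0.96, 25),
    "SAD": (0.73, 1.00, 0.84, 19),
    "SUR": (0.88, 0.78, 0.82, 9),
}


def macro_avg(rows, skip_empty=False):
    """Unweighted mean of precision, recall, and F1 over classes.

    With skip_empty=True, classes with zero support are left out.
    """
    kept = [r for r in rows.values() if not (skip_empty and r[3] == 0)]
    return tuple(sum(r[i] for r in kept) / len(kept) for i in range(3))


def weighted_avg(rows):
    """Support-weighted mean of precision, recall, and F1."""
    total = sum(r[3] for r in rows.values())
    return tuple(sum(r[i] * r[3] for r in rows.values()) / total for i in range(3))


if __name__ == "__main__":
    for name, values in [
        ("macro avg (all 8 classes)", macro_avg(REPORT)),
        ("macro avg (7 classes with support)", macro_avg(REPORT, skip_empty=True)),
        ("weighted avg", weighted_avg(REPORT)),
    ]:
        print(name, " ".join(f"{v:.3f}" for v in values))
```

With these rounded inputs, the macro averages over the seven classes that have evaluation samples come out near 0.87, in line with the weighted average and the overall accuracy.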