---
license: apache-2.0
datasets:
- stapesai/ssi-speech-emotion-recognition
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: audio-classification
base_model:
- facebook/wav2vec2-base
---

# Multimodal Emotion Speech Recognition

## Model Description

This model performs emotion recognition from speech using a multimodal approach. It uses:

- **Audio Model**: Wav2Vec2 Base (`facebook/wav2vec2-base`)

## Dataset

- **Dataset Name**: [stapesai/ssi-speech-emotion-recognition](https://huggingface.co/datasets/stapesai/ssi-speech-emotion-recognition)

## Evaluation Results

### Classification Report

```
              precision    recall  f1-score   support

         ANG       0.97      0.93      0.95        30
         CAL       0.00      0.00      0.00         0
         DIS       0.95      0.90      0.92        20
         FEA       0.76      0.70      0.73        27
         HAP       0.87      0.82      0.84        33
         NEU       0.96      0.96      0.96        25
         SAD       0.73      1.00      0.84        19
         SUR       0.88      0.78      0.82         9

    accuracy                           0.87       163
   macro avg       0.76      0.76      0.76       163
weighted avg       0.88      0.87      0.87       163
```

The `CAL` class has no samples in the evaluation set (support 0), so its scores are reported as 0.00. This lowers the macro average but does not affect the weighted average.

**Overall Accuracy**: 87% (163 evaluation samples)
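
### Recomputing the Averages

The summary rows of the classification report can be checked from the per-class rows. The sketch below is an illustration, not the evaluation script used for this model: the per-class numbers are copied by hand from the report, and the helper names `macro_f1` and `weighted_f1` are hypothetical. Because the inputs are already rounded to two decimals, the results are approximate.

```python
# Per-class (precision, recall, f1, support), copied from the classification report.
REPORT = {
    "ANG": (0.97, 0.93, 0.95, 30),
    "CAL": (0.00, 0.00, 0.00, 0),
    "DIS": (0.95, 0.90, 0.92, 20),
    "FEA": (0.76, 0.70, 0.73, 27),
    "HAP": (0.87, 0.82, 0.84, 33),
    "NEU": (0.96, 0.96, 0.96, 25),
    "SAD": (0.73, 1.00, 0.84, 19),
    "SUR": (0.88, 0.78, 0.82, 9),
}


def macro_f1(report, skip_empty=False):
    """Unweighted mean of per-class F1; optionally drop classes with support 0."""
    rows = [r for r in report.values() if not (skip_empty and r[3] == 0)]
    return sum(r[2] for r in rows) / len(rows)


def weighted_f1(report):
    """Mean of per-class F1 weighted by each class's support."""
    total = sum(r[3] for r in report.values())
    return sum(r[2] * r[3] for r in report.values()) / total


if __name__ == "__main__":
    print(f"macro F1 (all 8 classes):    {macro_f1(REPORT):.4f}")
    print(f"macro F1 (CAL excluded):     {macro_f1(REPORT, skip_empty=True):.4f}")
    print(f"weighted F1:                 {weighted_f1(REPORT):.4f}")
```

With `CAL` included, the macro F1 matches the reported 0.76. Excluding the empty class raises it to roughly 0.87, in line with the weighted average and the overall accuracy.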