# transformer-cnn-emotion-recognition

## Input

Audio file

```
03-01-01-01-01-01-01.wav
```

From the RAVDESS Dataset: https://smartlaboratory.org/ravdess/

## Output

Emotion label

```
Emotion: neutral
Confidence: 0.99993193
```

## Labels

```
"surprised", "neutral", "calm", "happy", "sad", "angry", "fearful", "disgust"
```

## Requirements

This model requires an additional module.

```
pip3 install librosa
```

## Usage

```bash
$ python3 transformer-cnn-emotion-recognition.py -i input.wav
```

## Reference

[Combining Spatial and Temporal Feature Representations of Speech Emotion by Parallelizing CNNs and Transformer-Encoders](https://github.com/IliaZenkov/transformer-cnn-emotion-recognition)

## Framework

PyTorch 1.6.0

## Model Format

ONNX opset = 11

## Netron

[parallel_is_all_you_want_ep428.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/parallel_is_all_you_want/parallel_is_all_you_want_ep428.onnx.prototxt)
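The label and confidence shown in the Output section are obtained by taking a softmax over the model's eight class scores and picking the largest probability. A minimal sketch of that post-processing, assuming the ONNX model emits one raw logit per label in the order given in the Labels section (the logit values below are made up for illustration):

```python
import numpy as np

# Emotion labels in the order listed in the Labels section.
LABELS = ["surprised", "neutral", "calm", "happy", "sad",
          "angry", "fearful", "disgust"]

def decode_emotion(logits):
    """Softmax the raw class logits and return (label, confidence)."""
    logits = np.asarray(logits, dtype=np.float64)
    exp = np.exp(logits - logits.max())   # subtract max for numerical stability
    probs = exp / exp.sum()
    idx = int(probs.argmax())
    return LABELS[idx], float(probs[idx])

# Hypothetical logits strongly favoring class index 1 ("neutral").
label, conf = decode_emotion([-2.0, 9.0, 0.5, -1.0, 0.0, -3.0, 1.0, -0.5])
print(f"Emotion: {label}")
print(f"Confidence: {conf:.8f}")
```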
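The sample input name `03-01-01-01-01-01-01.wav` follows the RAVDESS file-naming convention, in which the third hyphen-separated field encodes the ground-truth emotion (`01` = neutral, which matches the output above). A small sketch of how one could recover that label for evaluation; the helper function here is illustrative and is not part of the repo's script:

```python
import os

# Emotion codes from the RAVDESS file-naming convention
# (third field of the hyphen-separated filename stem).
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def emotion_from_filename(path):
    """Return the ground-truth emotion encoded in a RAVDESS filename."""
    stem = os.path.basename(path).split(".")[0]
    fields = stem.split("-")   # modality-channel-emotion-intensity-...
    return RAVDESS_EMOTIONS[fields[2]]

print(emotion_from_filename("03-01-01-01-01-01-01.wav"))  # prints "neutral"
```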