# transformer-cnn-emotion-recognition
## Input
An audio file, for example a recording from the RAVDESS dataset:
```
03-01-01-01-01-01-01.wav
RAVDESS Dataset
https://smartlaboratory.org/ravdess/
```
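RAVDESS file names encode metadata in seven hyphen-separated fields, and the third field is the emotion code. The sketch below shows one way to read the ground-truth emotion from a file name so it can be compared with the prediction. The helper name `emotion_from_filename` is hypothetical and is not part of this repository's script.

```python
import os

# RAVDESS emotion codes (third field of the file name).
RAVDESS_EMOTIONS = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprised",
}


def emotion_from_filename(path):
    """Return the emotion label encoded in a RAVDESS-style file name.

    Hypothetical helper for illustration only.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    fields = stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS-style file name: {path}")
    return RAVDESS_EMOTIONS[fields[2]]


print(emotion_from_filename("03-01-01-01-01-01-01.wav"))
```

For the sample file above, the third field is `01`, which corresponds to `neutral`.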
## Output
The predicted emotion label and its confidence score:
```
Emotion: neutral
Confidence: 0.99993193
```
## Labels
```
"surprised", "neutral", "calm", "happy",
"sad", "angry", "fearful", "disgust"
```
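As a rough illustration of how a label and confidence like the ones shown in Output could be derived, the sketch below picks the highest-scoring class from a vector of eight scores. It assumes that the model's output indices follow the label order listed above and that the raw outputs are unnormalized scores; neither is confirmed by this README. The function names and the example scores are made up for illustration.

```python
import math

# Assumption: output index i corresponds to LABELS[i], in the order listed above.
LABELS = [
    "surprised", "neutral", "calm", "happy",
    "sad", "angry", "fearful", "disgust",
]


def softmax(scores):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode(scores):
    """Return (label, confidence) for the highest-probability class."""
    probs = softmax(scores)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]


# Made-up scores for illustration; a real run would use the model output.
label, conf = decode([0.1, 9.8, 0.3, 0.0, -0.2, 0.1, 0.0, 0.2])
print(f"Emotion: {label}")
print(f"Confidence: {conf:.8f}")
```

If the model already emits probabilities, the `softmax` step would be skipped and only the arg-max lookup is needed.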
## Requirements
This model requires an additional module, librosa.
```
pip3 install librosa
```
## Usage
```bash
$ python3 transformer-cnn-emotion-recognition.py -i input.wav
```
## Reference
[Combining Spatial and Temporal Feature Representions of Speech Emotion by Parallelizing CNNs and Transformer-Encoders](https://github.com/IliaZenkov/transformer-cnn-emotion-recognition)
## Framework
PyTorch 1.6.0
## Model Format
ONNX opset = 11
## Netron
[parallel_is_all_you_want_ep428.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/parallel_is_all_you_want/parallel_is_all_you_want_ep428.onnx.prototxt)