Transformer CNN Emotion Recognition (code, models)

076305e verified 3 months ago

954 Bytes

	# transformer-cnn-emotion-recognition

	## Input

	audio file

	```
	03-01-01-01-01-01-01.wav
	RAVDESS Dataset
	https://smartlaboratory.org/ravdess/
	```

	## Output

	emotion label

	```
	Emotion: neutral
	Confidence: 0.99993193
	```

	## Labels

	```
	"surprised", "neutral", "calm", "happy",
	"sad", "angry", "fearful", "disgust"
	```

	## Requirements
	This model requires additional module.

	```
	pip3 install librosa
	```

	## Usage

	```bash
	$ python3 transformer-cnn-emotion-recognition.py -i input.wav
	```

	## Reference

	[Combining Spatial and Temporal Feature Representions of Speech Emotion by Parallelizing CNNs and Transformer-Encoders](https://github.com/IliaZenkov/transformer-cnn-emotion-recognition)

	## Framework

	PyTorch 1.6.0

	## Model Format

	ONNX opset = 11

	## Netron

	[parallel_is_all_you_want_ep428.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/parallel_is_all_you_want/parallel_is_all_you_want_ep428.onnx.prototxt)