---
license: other
license_name: nvidia-audio2emotion-license
license_link: https://huggingface.co/nvidia/Audio2Emotion-v2.2/blob/main/LICENSE
extra_gated_prompt: >-
  Use of this model is permitted solely under the terms of the Audio2Emotion
  License. In particular, it may only be used within the NVIDIA Audio2Face
  project and is strictly prohibited for any other purpose.
---
# Audio2Emotion

## Description
Audio2Emotion leverages deep learning to extract continuous emotional cues from speech audio, which are then used to drive Audio2Face-3D for more natural and expressive facial animations. By detecting emotions such as joy, sadness, anger, fear, disgust, and neutrality in real time, the system automatically conditions facial expressions without manual intervention. This seamless integration enables digital avatars to convey subtle emotional dynamics during speech, making interactions in gaming, virtual assistants, and immersive environments more lifelike and engaging.
**Model Developer:** NVIDIA
## Model Versions

The Audio2Emotion release includes two checkpoints:
- Audio2Emotion-v2.2 — stable version
- Audio2Emotion-v3.0 — research preview version (uses a double sliding window and provides better probability calibration)
Both networks expose an identical interface, so switching between versions requires no code changes.
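As an illustration of the sliding-window inference pattern described above, here is a minimal sketch of how a windowed emotion classifier is typically driven. The window length, hop size, emotion-label order, and the `run_model` stub are assumptions for illustration, not the actual Audio2Emotion API.

```python
import numpy as np

EMOTIONS = ["joy", "sadness", "anger", "fear", "disgust", "neutral"]

def sliding_windows(audio: np.ndarray, win: int, hop: int) -> np.ndarray:
    """Split a mono audio buffer into overlapping analysis windows."""
    n = 1 + max(0, (len(audio) - win) // hop)
    return np.stack([audio[i * hop : i * hop + win] for i in range(n)])

def softmax(x: np.ndarray) -> np.ndarray:
    """Convert raw logits into a probability distribution over emotions."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def run_model(window: np.ndarray) -> np.ndarray:
    # Placeholder for the real network: returns one logit per emotion class.
    rng = np.random.default_rng(0)
    return rng.normal(size=len(EMOTIONS))

# 2 seconds of silence at 16 kHz, analyzed in 1 s windows with 0.5 s hop
# (all three values are illustrative, not the model's actual parameters).
audio = np.zeros(16000 * 2, dtype=np.float32)
for window in sliding_windows(audio, win=16000, hop=8000):
    probs = softmax(run_model(window))  # per-window emotion probabilities
```

Because both versions share this interface, only the checkpoint path changes when moving from v2.2 to v3.0; the windowing loop stays the same.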
## Correspondence
Ilya Fedorov (ilyaf@nvidia.com), Dmitry Korobchenko (dkorobchenko@nvidia.com)
## License
Your use of this model is governed by the NVIDIA Audio2Emotion License.
## Citation

```bibtex
@article{chung2025audio2face,
  title={Audio2Face-3D: Audio-driven Realistic Facial Animation For Digital Avatars},
  author={Chung, Chaeyeon and Fedorov, Ilya and Huang, Michael and Karmanov, Aleksey and Korobchenko, Dmitry and Ribera, Roger and Seol, Yeongho},
  journal={arXiv preprint arXiv:00000000},
  year={2025}
}
```