---
license: cc-by-4.0
---
# Empathic-Insight-Voice-Small

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1WR-B6j--Y5RdhIyRGF_tJ3YdFF8BkUA2)

**Empathic-Insight-Voice-Small** is a suite of 40+ emotion- and attribute-regression models trained on the EMONET-VOICE benchmark dataset, which is derived from LAION'S GOT TALENT, a large-scale, multilingual synthetic voice-acting dataset. Each model predicts the intensity of one specific fine-grained emotion or attribute from speech audio: embeddings from a fine-tuned Whisper model (mkrausio/EmoWhisper-AnS-Small-v0.1) are fed into a dedicated MLP regression head for that dimension.

This work is based on the research paper:

**"EMONET-VOICE: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection"**

## Example Video Analyses (Top 3 Emotions)
<!-- This section will be populated by the HTML from Cell 0 -->
<div style='display: flex; flex-wrap: wrap; justify-content: flex-start; gap: 15px;'>
</div>
</div>

## Model Description

The Empathic-Insight-Voice-Small suite consists of over 54 individual MLP models: 40 for the primary emotions, plus further heads for attributes such as valence, arousal, and gender. Each model takes a Whisper audio embedding as input and outputs a continuous score for one of the emotion or attribute categories defined in the EMONET-VOICE taxonomy and its extended attribute set.
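A minimal sketch of what one such per-dimension head could look like follows. The layer sizes, activation, and the emotion names in the demo are illustrative assumptions; the released checkpoints define the actual architecture.

```python
import torch
import torch.nn as nn

class EmotionHead(nn.Module):
    """One MLP regression head per dimension; sizes are an illustrative guess."""
    def __init__(self, embed_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, 1),  # one continuous intensity score
        )

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        return self.net(embedding).squeeze(-1)

# One independently trained head per taxonomy dimension (names illustrative).
heads = {name: EmotionHead().eval() for name in ("Joy", "Sadness", "Anger")}
embedding = torch.randn(1, 768)  # stand-in for a pooled Whisper embedding
with torch.no_grad():
    scores = {name: head(embedding).item() for name, head in heads.items()}
```

Because each dimension has its own regression head reading the same embedding, scores are independent continuous values rather than probabilities competing under a softmax.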