ChristophSchuhmann committed (verified) · Commit 1446270 · Parent(s): d3949db

Update README.md

Files changed (1): README.md (+6 −6)
README.md CHANGED
@@ -5,6 +5,12 @@ license: cc-by-4.0
 # Empathic-Insight-Voice-Small
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1WR-B6j--Y5RdhIyRGF_tJ3YdFF8BkUA2)
 
+**Empathic-Insight-Voice-Small** is a suite of 40+ emotion and attribute regression models trained on the EMONET-VOICE benchmark dataset, which is derived from the large-scale, multilingual synthetic voice-acting dataset LAION'S GOT TALENT. Each model is designed to predict the intensity of a specific fine-grained emotion or attribute from speech audio. These models leverage embeddings from a fine-tuned Whisper model (mkrausio/EmoWhisper-AnS-Small-v0.1) followed by dedicated MLP regression heads for each dimension.
+
+This work is based on the research paper:
+**"EMONET-VOICE: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection"**
+
+
 ## Example Video Analyses (Top 3 Emotions)
 <!-- This section will be populated by the HTML from Cell 0 -->
 <div style='display: flex; flex-wrap: wrap; justify-content: flex-start; gap: 15px;'>
@@ -34,12 +40,6 @@ license: cc-by-4.0
 </div>
 </div>
 
-**Empathic-Insight-Voice-Small** is a suite of 40+ emotion and attribute regression models trained on the EMONET-VOICE benchmark dataset, which is derived from the large-scale, multilingual synthetic voice-acting dataset LAION'S GOT TALENT. Each model is designed to predict the intensity of a specific fine-grained emotion or attribute from speech audio. These models leverage embeddings from a fine-tuned Whisper model (mkrausio/EmoWhisper-AnS-Small-v0.1) followed by dedicated MLP regression heads for each dimension.
-
-This work is based on the research paper:
-**"EMONET-VOICE: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection"**
-
-
 ## Model Description
 
 The Empathic-Insight-Voice-Small suite consists of over 54 individual MLP models (40 for primary emotions, plus others for attributes like valence, arousal, gender, etc.). Each model takes a Whisper audio embedding as input and outputs a continuous score for one of the emotion/attribute categories defined in the EMONET-VOICE taxonomy and extended attribute set.
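The README describes a pipeline of one Whisper audio embedding feeding many per-dimension MLP regression heads. A minimal sketch of that structure, assuming a Whisper-small hidden size of 768, an illustrative two-layer head, and placeholder emotion names (the real checkpoints define their own shapes and the full EMONET-VOICE taxonomy):

```python
# Hedged sketch of the model card's described architecture:
# one MLP regression head per emotion/attribute, each mapping a pooled
# Whisper audio embedding to a scalar intensity score.
# EMBED_DIM, the head sizes, and the emotion names below are assumptions
# for illustration, not values taken from the released checkpoints.
import torch
import torch.nn as nn

EMBED_DIM = 768  # Whisper-small hidden size (assumption)


class EmotionHead(nn.Module):
    """One MLP regression head: embedding -> scalar intensity score."""

    def __init__(self, embed_dim: int = EMBED_DIM, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # Squeeze the trailing dimension so each head returns a scalar per item.
        return self.net(emb).squeeze(-1)


# One independent head per dimension (three illustrative names, not the full set).
heads = {name: EmotionHead() for name in ["joy", "sadness", "valence"]}

# Stand-in for a pooled Whisper embedding of one utterance.
embedding = torch.randn(1, EMBED_DIM)

with torch.no_grad():
    scores = {name: head(embedding).item() for name, head in heads.items()}
print(scores)
```

In the real suite each head would be loaded from its own trained checkpoint and the embedding would come from the fine-tuned EmoWhisper encoder rather than random noise; the per-head independence is what lets the 54 dimensions be scored (or extended) separately.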