ChristophSchuhmann committed (verified) · Commit 1446270 · Parent(s): d3949db

Update README.md

Files changed (1): README.md (+6 −6)
README.md CHANGED
@@ -5,6 +5,12 @@ license: cc-by-4.0
 # Empathic-Insight-Voice-Small
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1WR-B6j--Y5RdhIyRGF_tJ3YdFF8BkUA2)
 
+**Empathic-Insight-Voice-Small** is a suite of 40+ emotion and attribute regression models trained on the EMONET-VOICE benchmark dataset, which is derived from the large-scale, multilingual synthetic voice-acting dataset LAION'S GOT TALENT. Each model is designed to predict the intensity of a specific fine-grained emotion or attribute from speech audio. These models leverage embeddings from a fine-tuned Whisper model (mkrausio/EmoWhisper-AnS-Small-v0.1) followed by dedicated MLP regression heads for each dimension.
+
+This work is based on the research paper:
+**"EMONET-VOICE: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection"**
+
+
 ## Example Video Analyses (Top 3 Emotions)
 <!-- This section will be populated by the HTML from Cell 0 -->
 <div style='display: flex; flex-wrap: wrap; justify-content: flex-start; gap: 15px;'>
@@ -34,12 +40,6 @@ license: cc-by-4.0
 </div>
 </div>
 
-**Empathic-Insight-Voice-Small** is a suite of 40+ emotion and attribute regression models trained on the EMONET-VOICE benchmark dataset, which is derived from the large-scale, multilingual synthetic voice-acting dataset LAION'S GOT TALENT. Each model is designed to predict the intensity of a specific fine-grained emotion or attribute from speech audio. These models leverage embeddings from a fine-tuned Whisper model (mkrausio/EmoWhisper-AnS-Small-v0.1) followed by dedicated MLP regression heads for each dimension.
-
-This work is based on the research paper:
-**"EMONET-VOICE: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection"**
-
-
 ## Model Description
 
 The Empathic-Insight-Voice-Small suite consists of over 54 individual MLP models (40 for primary emotions, plus others for attributes like valence, arousal, gender, etc.). Each model takes a Whisper audio embedding as input and outputs a continuous score for one of the emotion/attribute categories defined in the EMONET-VOICE taxonomy and extended attribute set.
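The README describes a pipeline of one Whisper audio embedding feeding many per-dimension MLP regression heads. A minimal sketch of that structure, assuming a Whisper-small hidden size of 768, an illustrative two-layer head, and placeholder emotion names (the real checkpoints define their own shapes and the full EMONET-VOICE taxonomy):

```python
# Hedged sketch of the model card's described architecture:
# one MLP regression head per emotion/attribute, each mapping a pooled
# Whisper audio embedding to a scalar intensity score.
# EMBED_DIM, the head sizes, and the emotion names below are assumptions
# for illustration, not values taken from the released checkpoints.
import torch
import torch.nn as nn

EMBED_DIM = 768  # Whisper-small hidden size (assumption)


class EmotionHead(nn.Module):
    """One MLP regression head: embedding -> scalar intensity score."""

    def __init__(self, embed_dim: int = EMBED_DIM, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # Squeeze the trailing dimension so each head returns a scalar per item.
        return self.net(emb).squeeze(-1)


# One independent head per dimension (three illustrative names, not the full set).
heads = {name: EmotionHead() for name in ["joy", "sadness", "valence"]}

# Stand-in for a pooled Whisper embedding of one utterance.
embedding = torch.randn(1, EMBED_DIM)

with torch.no_grad():
    scores = {name: head(embedding).item() for name, head in heads.items()}
print(scores)
```

In the real suite each head would be loaded from its own trained checkpoint and the embedding would come from the fine-tuned EmoWhisper encoder rather than random noise; the per-head independence is what lets the 54 dimensions be scored (or extended) separately.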