Update README.md #2
by felfri · opened
README.md CHANGED

--- a/README.md
+++ b/README.md
@@ -1,7 +1,13 @@
 ---
 license: cc-by-4.0
+datasets:
+- t1a5anu-anon/emonet-voice-foundation
+base_model:
+- mkrausio/EmoWhisper-AnS-Small-v0.1
+pipeline_tag: audio-classification
 ---
 
+
 # Empathic-Insight-Voice-Small
 [](https://colab.research.google.com/drive/1WR-B6j--Y5RdhIyRGF_tJ3YdFF8BkUA2)
 
@@ -44,7 +50,7 @@ This work is based on the research paper:
 
 The Empathic-Insight-Voice-Small suite consists of over 54 individual MLP models (40 for primary emotions, plus others for attributes like valence, arousal, gender, etc.). Each model takes a Whisper audio embedding as input and outputs a continuous score for one of the emotion/attribute categories defined in the EMONET-VOICE taxonomy and extended attribute set.
 
-The models were trained on a large dataset of synthetic & "in the wild" speech (
+The models were trained on a large dataset of synthetic & "in the wild" speech (each ~ 5.000 hours).
 
 
 ## Intended Use
@@ -56,11 +62,11 @@ These models are intended for research purposes in affective computing, speech e
 * Explore multilingual and cross-cultural aspects of speech emotion (given the foundation dataset).
 
 **Out-of-Scope Use:**
-These models are trained on synthetic speech and their generalization to spontaneous real-world speech needs further evaluation. They should not be used
+These models are trained on synthetic speech, and their generalization to spontaneous real-world speech needs further evaluation. They should not be used to make critical decisions about individuals, for surveillance, or in any manner that could lead to discriminatory outcomes or infringe on privacy without due diligence and thorough ethical review.
 
 ## How to Use
 
-The primary way to use these models is through the provided [Google Colab Notebook](https://colab.research.google.com/drive/1WR-B6j--Y5RdhIyRGF_tJ3YdFF8BkUA2). The notebook handles dependencies, model loading, audio processing, and provides examples for:
+The primary way to use these models is through the provided [Google Colab Notebook](https://colab.research.google.com/drive/1WR-B6j--Y5RdhIyRGF_tJ3YdFF8BkUA2). The notebook handles dependencies, model loading, and audio processing, and provides examples for:
 * Batch processing a folder of audio files.
 * Generating a comprehensive HTML report with per-file emotion scores, waveforms, and audio players.
 * Generating individual JSON files with all predicted scores for each audio file.
@@ -343,4 +349,4 @@ The EMONET-VOICE suite was developed with ethical considerations as a priority:
 
 Privacy Preservation: The use of synthetic voice generation fundamentally circumvents privacy concerns associated with collecting real human emotional expressions, especially for sensitive states.
 
-Responsible Use: These models are
+Responsible Use: These models are intended for research purposes. Users are urged to consider the ethical implications of their applications and avoid misuse, such as for emotional manipulation, surveillance, or in ways that could lead to unfair, biased, or harmful outcomes. The broader societal implications and mitigation of potential misuse of SER technology remain important ongoing considerations.