Update README.md

#2
by felfri - opened
Files changed (1): README.md (+10 -4)
```diff
@@ -1,7 +1,13 @@
 ---
 license: cc-by-4.0
+datasets:
+- t1a5anu-anon/emonet-voice-foundation
+base_model:
+- mkrausio/EmoWhisper-AnS-Small-v0.1
+pipeline_tag: audio-classification
 ---
 
+
 # Empathic-Insight-Voice-Small
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1WR-B6j--Y5RdhIyRGF_tJ3YdFF8BkUA2)
 
@@ -44,7 +50,7 @@ This work is based on the research paper:
 
 The Empathic-Insight-Voice-Small suite consists of over 54 individual MLP models (40 for primary emotions, plus others for attributes like valence, arousal, gender, etc.). Each model takes a Whisper audio embedding as input and outputs a continuous score for one of the emotion/attribute categories defined in the EMONET-VOICE taxonomy and extended attribute set.
 
-The models were trained on a large dataset of synthetic & "in the wild" speech (both each ~ 5.000 hours).
+The models were trained on a large dataset of synthetic and "in the wild" speech (each ~5,000 hours).
 
 
 ## Intended Use
@@ -56,11 +62,11 @@ These models are intended for research purposes in affective computing, speech e
 * Explore multilingual and cross-cultural aspects of speech emotion (given the foundation dataset).
 
 **Out-of-Scope Use:**
-These models are trained on synthetic speech and their generalization to spontaneous real-world speech needs further evaluation. They should not be used for making critical decisions about individuals, for surveillance, or in any manner that could lead to discriminatory outcomes or infringe on privacy without due diligence and ethical review.
+These models are trained on synthetic speech, and their generalization to spontaneous real-world speech needs further evaluation. They should not be used to make critical decisions about individuals, for surveillance, or in any manner that could lead to discriminatory outcomes or infringe on privacy without due diligence and thorough ethical review.
 
 ## How to Use
 
-The primary way to use these models is through the provided [Google Colab Notebook](https://colab.research.google.com/drive/1WR-B6j--Y5RdhIyRGF_tJ3YdFF8BkUA2). The notebook handles dependencies, model loading, audio processing, and provides examples for:
+The primary way to use these models is through the provided [Google Colab Notebook](https://colab.research.google.com/drive/1WR-B6j--Y5RdhIyRGF_tJ3YdFF8BkUA2). The notebook handles dependencies, model loading, and audio processing, and provides examples for:
 * Batch processing a folder of audio files.
 * Generating a comprehensive HTML report with per-file emotion scores, waveforms, and audio players.
 * Generating individual JSON files with all predicted scores for each audio file.
@@ -343,4 +349,4 @@ The EMONET-VOICE suite was developed with ethical considerations as a priority:
 
 Privacy Preservation: The use of synthetic voice generation fundamentally circumvents privacy concerns associated with collecting real human emotional expressions, especially for sensitive states.
 
-Responsible Use: These models are released for research. Users are urged to consider the ethical implications of their applications and avoid misuse, such as for emotional manipulation, surveillance, or in ways that could lead to unfair, biased, or harmful outcomes. The broader societal implications and mitigation of potential misuse of SER technology remain important ongoing considerations.
+Responsible Use: These models are intended for research purposes. Users are urged to consider the ethical implications of their applications and avoid misuse, such as for emotional manipulation, surveillance, or in ways that could lead to unfair, biased, or harmful outcomes. The broader societal implications and mitigation of potential misuse of SER technology remain important ongoing considerations.
```
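
For readers skimming the diff, the pipeline the README describes (one small MLP per emotion/attribute category, mapping a pooled Whisper audio embedding to a single continuous score, with per-file results written as JSON) can be sketched roughly as below. Everything here is an illustrative placeholder, not the repository's actual code or checkpoints: the embedding dimension, hidden size, category names, and random weights are all assumptions.

```python
import json
import numpy as np

EMBED_DIM = 768  # assumed pooled Whisper-embedding size; illustrative only

def make_head(embed_dim=EMBED_DIM, hidden_dim=256, seed=0):
    """Random stand-in for one trained MLP head (embedding -> one score)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.standard_normal((embed_dim, hidden_dim)) * 0.02,
        "b1": np.zeros(hidden_dim),
        "W2": rng.standard_normal((hidden_dim, 1)) * 0.02,
        "b2": np.zeros(1),
    }

def score(head, embedding):
    """Forward pass of a two-layer MLP with a ReLU hidden layer."""
    hidden = np.maximum(embedding @ head["W1"] + head["b1"], 0.0)
    return float((hidden @ head["W2"] + head["b2"])[0])

# One head per category (an illustrative subset of the 54+ models).
categories = ["joy", "sadness", "valence", "arousal"]
heads = {name: make_head(seed=i) for i, name in enumerate(categories)}

# Stand-in for the pooled Whisper embedding of one audio clip.
embedding = np.random.default_rng(42).standard_normal(EMBED_DIM)
scores = {name: score(head, embedding) for name, head in heads.items()}

# Per-file JSON output, as the notebook's batch mode is described as producing.
print(json.dumps(scores, indent=2))
```

In actual use, the heads would be loaded from the released checkpoints and the embedding computed with the encoder listed under `base_model`; this sketch only mirrors the shape of the pipeline, which the linked Colab notebook implements end to end.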