khadijafaisal committed on
Commit deec0c4 · verified · 1 Parent(s): 5fe4e51

Mirror from Khubaib01/ECAPA-TDNN-VHE

Files changed (4)
  1. .gitattributes +1 -0
  2. ECAPA_TDNN_VHE.pth +3 -0
  3. README.md +178 -0
  4. radar_chart.png +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ radar_chart.png filter=lfs diff=lfs merge=lfs -text
ECAPA_TDNN_VHE.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:145325909e3e53c13bbb351537117727f4caf34828aea9c2e55b1d0f7262bfc6
+ size 9208363
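The pointer above stores only the SHA-256 `oid` and byte `size` of the real checkpoint. The following is a minimal standard-library sketch for checking a downloaded file against such a pointer. It assumes you have saved the weights locally under a path of your choosing, and the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer_text: str) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first, before hashing the whole file
    if path.stat().st_size != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Stream in 1 MiB chunks so large checkpoints need not fit in memory
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


# Pointer contents copied from the diff above (without the leading "+ ")
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:145325909e3e53c13bbb351537117727f4caf34828aea9c2e55b1d0f7262bfc6
size 9208363"""
```

For example, `matches_pointer(Path("ECAPA_TDNN_VHE.pth"), POINTER)` should return `True` for an intact download. The size is compared first because it is far cheaper than hashing roughly 9 MB.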
README.md ADDED
@@ -0,0 +1,178 @@
---
license: apache-2.0
language:
- en
base_model:
- speechbrain/spkrec-ecapa-voxceleb
tags:
- speaker-embedding
- vocal-fatigue
- voice-health
- ecapa-tdnn
- vhe
- pytorch
- auralis-vfs
- audio-processing
- voice-analysis
- research
---

# ECAPA-TDNN-VHE: Vocal Health Encoder

## Model Details

- **Model name:** ECAPA-TDNN-VHE
- **Author:** Muhammad Khubaib Ahmad et al.
- **License:** Apache 2.0
- **Framework:** PyTorch, SpeechBrain
- **Embedding dimensionality:** 192
- **Sampling rate:** 16 kHz (mono)
- **Task:** Health-centric vocal fatigue representation learning
- **Paper / Citation:**
  Ahmad, M. K. (2026). *Modeling Vocal Fatigue as Embedding-Space Deviation Using Contrastively Trained ECAPA-TDNNs*. Zenodo. https://doi.org/10.5281/zenodo.18366305

---

## Model Description

ECAPA-TDNN-VHE (Vocal Health Encoder) is a research-grade neural speech encoder developed by Muhammad Khubaib Ahmad to generate health-centric, speaker-invariant vocal embeddings. Unlike conventional speaker embedding models, which are optimized for identity discrimination, ECAPA-TDNN-VHE is trained from scratch with supervised contrastive learning, explicitly promoting separation between vocal health states while minimizing speaker-specific information.

Empirical evaluation shows that ECAPA-TDNN-VHE **outperforms** the baseline ECAPA-TDNN on the vocal health benchmark, raising accuracy from 0.36 to 0.78 (about **2.2×**) and macro F1 from 0.31 to 0.77 (about **2.5×**). This makes it the strongest ECAPA-TDNN-based model in this evaluation for health-oriented speech representation learning.

The encoder forms the core of the **Auralis** MLOps framework and is accessible through the open-source Python library **auralis_vfs**, enabling reproducible, real-time vocal fatigue scoring in research and applied settings.

Key capabilities include:

- **192-dimensional embeddings** capturing health-relevant characteristics (strain, stress, fatigue).
- Continuous **vocal fatigue scoring** relative to a centroid of healthy embeddings (*fatigue axis*).
- Integration into **Auralis**, a robust MLOps system for real-time vocal fatigue monitoring.
- Accessible via the Python library [`auralis_vfs`](https://pypi.org/project/auralis-vfs/), enabling researchers to compute fatigue scores from audio files (`.wav`, `.mp3`, `.m4a`).
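The card describes fatigue scoring relative to a centroid of healthy embeddings, but it does not give the exact formula used by `auralis_vfs`. As a minimal sketch, assume the score is the cosine distance between an utterance embedding and the normalized mean of healthy embeddings. The function names below are hypothetical, and random vectors stand in for real 192-dimensional embeddings.

```python
import numpy as np

EMBED_DIM = 192  # embedding size stated in the model card


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def healthy_centroid(healthy_embeddings: np.ndarray) -> np.ndarray:
    """Unit-length mean direction of healthy embeddings, (n, 192) -> (192,)."""
    return l2_normalize(l2_normalize(healthy_embeddings).mean(axis=0))


def fatigue_score(embedding: np.ndarray, centroid: np.ndarray) -> float:
    """Hypothetical score: cosine distance from the healthy centroid.

    0 means aligned with healthy speech; larger values mean greater
    deviation (the range is [0, 2]).
    """
    return float(1.0 - l2_normalize(embedding) @ centroid)


# Synthetic stand-ins: a tight "healthy" cluster, one nearby and one unrelated vector
rng = np.random.default_rng(0)
base = rng.normal(size=EMBED_DIM)
healthy = base + 0.1 * rng.normal(size=(20, EMBED_DIM))
near = base + 0.1 * rng.normal(size=EMBED_DIM)
far = rng.normal(size=EMBED_DIM)

c = healthy_centroid(healthy)
print(f"near: {fatigue_score(near, c):.3f}  far: {fatigue_score(far, c):.3f}")
```

Normalizing before averaging keeps loud or long utterances from dominating the centroid. Projecting onto an explicit healthy-to-fatigued direction would be an equally plausible reading of "fatigue axis".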

This model represents a **state-of-the-art (SOTA) approach for ECAPA-based health embeddings**, outperforming the conventional ECAPA-TDNN trained for speaker recognition.

---

## Intended Use

### Primary Use Cases
- Vocal fatigue monitoring for occupational voice users
- Health-centric speech embedding extraction
- Longitudinal voice health tracking
- Feature extraction for downstream clinical models
- Computational paralinguistics research

### Out-of-Scope
- Speaker identification or verification
- Emotion recognition without retraining
- Medical diagnosis without professional oversight

---

## Training Data

- Real-world dataset: **~1.5 hours of speech from 70+ speakers**
- Labels: Healthy, Strained, Stressed
- Diverse microphones, devices, and acoustic environments
- Gender-balanced, language-independent
- Preprocessed audio: **16 kHz, mono**, clip duration 5–10 seconds
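Training clips are described as 16 kHz mono audio of 5–10 seconds. The sketch below shows one way to cut a longer recording into clips of that shape. It assumes the audio has already been resampled to 16 kHz with a proper resampler, downmixes channels by averaging, and cuts non-overlapping windows. The helper `to_mono_windows` and its default window length are illustrative and are not taken from the actual training code.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz, per the model card


def to_mono_windows(audio: np.ndarray, window_s: float = 8.0,
                    min_s: float = 5.0) -> list[np.ndarray]:
    """Split 16 kHz audio into mono, non-overlapping windows of `window_s` seconds.

    `audio` has shape (samples,) or (samples, channels). A trailing remainder
    is kept only if it is at least `min_s` seconds long.
    """
    if not 5.0 <= min_s <= window_s <= 10.0:
        raise ValueError("expected 5 <= min_s <= window_s <= 10 seconds")
    # Downmix multi-channel audio by averaging channels
    mono = audio.mean(axis=1) if audio.ndim == 2 else audio
    mono = mono.astype(np.float32)
    win = int(window_s * SAMPLE_RATE)
    min_len = int(min_s * SAMPLE_RATE)
    windows = []
    for start in range(0, len(mono), win):
        chunk = mono[start:start + win]
        if len(chunk) >= min_len:  # drop remainders shorter than 5 s
            windows.append(chunk)
    return windows
```

A 20-second stereo recording yields two 8-second mono clips. The final 4 seconds are dropped because they fall below the 5-second minimum.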

---

## Training Procedure

- **Architecture:** ECAPA-TDNN
- **Training objective:** Supervised contrastive loss for health-state separability while minimizing speaker identity leakage
- **Embedding dimension:** 192
- **Optimizer:** Adam
- **Initialization:** Trained from scratch
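The card names supervised contrastive loss as the objective but gives no hyperparameters. The NumPy sketch below implements the standard formulation of Khosla et al. (2020) over L2-normalized embeddings. The temperature of 0.1 is chosen purely for illustration. This loss uses only the health labels. The card does not specify how speaker identity leakage is minimized, so that part is not modeled here.

```python
import numpy as np


def supcon_loss(embeddings: np.ndarray, labels, temperature: float = 0.1) -> float:
    """Supervised contrastive loss (Khosla et al., 2020), averaged over anchors.

    For each anchor i with at least one same-label sample:
        L_i = -1/|P(i)| * sum_{p in P(i)} log(exp(s_ip) / sum_{a != i} exp(s_ia))
    where s_ij = cos(z_i, z_j) / temperature.
    """
    labels = np.asarray(labels)
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    # Positives: same health label, excluding the anchor itself
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0
    if not valid.any():
        raise ValueError("need at least two samples sharing a label")

    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature

    # Exclude each anchor from its own denominator, then a stable log-sum-exp
    logits = np.where(self_mask, -np.inf, sim)
    row_max = logits.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom

    # Mean log-probability over each anchor's positives, then average over anchors
    pos_sum = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    mean_log_prob_pos = pos_sum[valid] / pos_counts[valid]
    return float(-mean_log_prob_pos.mean())
```

Embeddings whose clusters match the health labels give a loss near zero. Shuffled labels give a much larger loss. In training, this loss would be computed on mini-batches of encoder outputs and minimized with Adam.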

---

## Evaluation

### Benchmarking Against Baseline ECAPA-TDNN

The model was evaluated on a three-class vocal health classification task (Healthy, Strained, Stressed). Results highlight **ECAPA-TDNN-VHE's superiority over the baseline ECAPA-TDNN**:

| Model | Accuracy | Macro F1 | Healthy F1 | Strained F1 | Stressed F1 |
|-------|----------|----------|------------|-------------|-------------|
| ECAPA-TDNN (SpeechBrain baseline) | 0.36 | 0.31 | 0.50 | 0.22 | 0.22 |
| **ECAPA-TDNN-VHE (Ahmad et al., 2026)** | **0.78** | **0.77** | **0.85** | **0.78** | **0.70** |

On this benchmark, ECAPA-TDNN-VHE achieves the strongest **health-centric embedding performance** among the evaluated ECAPA-based models.
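Macro F1 is the unweighted mean of the per-class F1 scores, so each health state counts equally regardless of how many clips it has. Averaging the per-class values in the table gives (0.50 + 0.22 + 0.22) / 3 ≈ 0.313 for the baseline and (0.85 + 0.78 + 0.70) / 3 ≈ 0.777 for ECAPA-TDNN-VHE. Both are consistent with the reported macro F1 once rounding of the per-class scores is taken into account. A minimal NumPy sketch of the metric follows; the helper name is illustrative, and scikit-learn's `f1_score(..., average="macro")` computes the same quantity.

```python
import numpy as np


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for cls in np.union1d(y_true, y_pred):
        tp = np.sum((y_pred == cls) & (y_true == cls))
        fp = np.sum((y_pred == cls) & (y_true != cls))
        fn = np.sum((y_pred != cls) & (y_true == cls))
        # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 for any class in the union
        scores.append(2 * tp / (2 * tp + fp + fn))
    return float(np.mean(scores))


y_true = ["healthy", "healthy", "strained", "strained", "stressed", "stressed"]
y_pred = ["healthy", "healthy", "strained", "stressed", "stressed", "stressed"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.822
```

Here the per-class F1 scores are 1.0 (healthy), 0.667 (strained) and 0.8 (stressed). One strained clip misclassified as stressed therefore lowers two classes at once.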

---

## 📊 Radar Chart: Embedding Quality Comparison

![Radar chart comparing ECAPA-TDNN and ECAPA-TDNN-VHE](radar_chart.png)

Metrics shown:
- Precision
- Recall
- F1-score
- Inter-class separation
- Intra-class compactness

> **Figure 1:** Radar chart comparing baseline ECAPA-TDNN and ECAPA-TDNN-VHE across classification and embedding quality metrics.
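The card does not define how inter-class separation and intra-class compactness were computed for the radar chart. The sketch below assumes one common centroid-based reading, purely for illustration. Separation is the mean cosine distance between class centroids, where higher is better. Spread is the mean cosine distance from each embedding to its own class centroid, where lower means more compact classes. The function name `centroid_metrics` is hypothetical.

```python
import itertools

import numpy as np


def centroid_metrics(embeddings: np.ndarray, labels) -> tuple[float, float]:
    """Return (inter_class_separation, intra_class_spread) in cosine distance.

    Assumed definitions: separation = mean cosine distance between class
    centroids; spread = mean cosine distance from each embedding to its
    own class centroid (smaller spread means more compact classes).
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Unit-length centroid per class
    centroids = {}
    for cls in classes:
        c = z[labels == cls].mean(axis=0)
        centroids[cls] = c / np.linalg.norm(c)
    separation = np.mean([1.0 - centroids[a] @ centroids[b]
                          for a, b in itertools.combinations(classes, 2)])
    spread = np.mean([1.0 - z[i] @ centroids[labels[i]] for i in range(len(z))])
    return float(separation), float(spread)
```

With health-state labels that match the embedding clusters, separation approaches 1 and spread approaches 0. With labels unrelated to the clusters, the centroids collapse together and the spread grows.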

---

## 🏆 Leaderboard (Evaluated Models)

| Rank | Model | Accuracy | Macro F1 |
|------|-------|----------|----------|
| **1** | **ECAPA-TDNN-VHE (Ahmad et al., 2026)** | **0.78** | **0.77** |
| 2 | ECAPA-TDNN (SpeechBrain baseline) | 0.36 | 0.31 |

> The leaderboard reflects performance on the vocal health dataset and serves as a **research benchmark**, not a universal ranking.

---

## Inference

The model can be used through the Python library `auralis_vfs`:

```bash
pip install auralis_vfs
```

Example usage (the `librosa` loading step is an assumption; any loader that returns a 16 kHz mono array should work):
```python
import librosa  # assumption: not required by auralis_vfs, used here only to load audio
from auralis.scorer import score_audio, score_waveform

# Score directly from an audio file (.wav, .mp3, .m4a)
score = score_audio("sample.wav")
print(f"Vocal fatigue score: {score:.2f}")

# Score from a waveform array; the model expects 16 kHz mono audio
audio_array, _ = librosa.load("sample.wav", sr=16000, mono=True)
score = score_waveform(audio_array)
print(f"Vocal fatigue score: {score:.2f}")
```

The model is also deployed in the Auralis MLOps system, which provides real-time fatigue monitoring and embedding-based analyses.

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{muhammad_khubaib_ahmad_2026,
    author       = { Muhammad Khubaib Ahmad },
    title        = { ECAPA-TDNN-VHE (Revision 871292d) },
    year         = 2026,
    url          = { https://huggingface.co/Khubaib01/ECAPA-TDNN-VHE },
    doi          = { 10.57967/hf/7648 },
    publisher    = { Hugging Face }
}
```

## Future Work

- Integration of prosody features to enhance fatigue detection
- Automatic generation of clinical-style reports
- Expansion to larger, multilingual datasets
- Longitudinal tracking of speaker fatigue trends

## Acknowledgments

The author gratefully acknowledges the participants for allowing their voices to be used in this research, and thanks the Data Manager (Faiez Ahmad) and the Data Collector (Muhammad Anas Tariq) for their invaluable work and cooperation.
radar_chart.png ADDED

Git LFS Details

  • SHA256: 491faf9976045808a69719a801670e6e3daada6a22e7bf83d455f331a81ef538
  • Pointer size: 131 Bytes
  • Size of remote file: 165 kB