# ECAPA Acoustic Domain Classifier

### Speech, Music, and Noise Classification Using ECAPA-TDNN Embeddings

---   

## 🧠 Overview
This model classifies short audio clips into **Speech**, **Music**, or **Noise** domains.  
It uses **ECAPA-TDNN embeddings**, taken from a neural architecture originally developed for speaker verification, as its acoustic feature representation.

Although it was trained on a **small, human-curated dataset (5 samples per class)**, the model reached **100% accuracy on its small internal validation set**.  
This project is a **proof-of-concept** showing that ECAPA embeddings can remain useful in limited-data scenarios.

---

## 📦 Model Information

- **Architecture:** ECAPA-TDNN
- **Framework:** PyTorch (SpeechBrain-based)
- **Input:** Mono audio waveform (16 kHz sampling rate)
- **Output Classes:** Speech | Music | Noise
- **Training Data:** 15 samples (5 per class), normalized and balanced
- **Accuracy:** 100% on internal validation (small-scale)
- **Author:** Khubaib Ahmad (AI/ML Engineer, Data Scientist)
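
The input format listed above (mono, 16 kHz) can be checked before inference. The following is a minimal, dependency-free sketch that assumes 16-bit PCM WAV input; the helper name `read_mono_16k` is hypothetical and not part of this repository, and the sketch rejects other sampling rates instead of resampling them.

```python
import array
import sys
import wave

TARGET_RATE = 16000  # sampling rate listed under "Input"


def read_mono_16k(source):
    """Read a 16-bit PCM WAV and return mono samples as floats in [-1, 1)."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        if wav.getframerate() != TARGET_RATE:
            raise ValueError(f"expected {TARGET_RATE} Hz, got {wav.getframerate()} Hz")
        channels = wav.getnchannels()
        pcm = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    # Average the interleaved channels of each frame to downmix to mono.
    mono = [sum(pcm[i:i + channels]) / channels for i in range(0, len(pcm), channels)]
    return [sample / 32768.0 for sample in mono]
```

The returned list can be wrapped as a tensor of shape `(1, time)`, for example with `torch.tensor(samples).unsqueeze(0)`, before it is passed to the model.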

---

## ⚙️ Methodology

1. Extract ECAPA-TDNN embeddings for all samples using SpeechBrain.  
2. Train a simple classifier (e.g., linear or small dense network) on embeddings.  
3. Validate predictions using held-out data.  
4. Export the trained model weights as a `.pkl` file.  
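
The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the training code used for this model: the encoder checkpoint `speechbrain/spkrec-ecapa-voxceleb` is an assumed choice, all function names are hypothetical, and a nearest-centroid classifier with cosine similarity stands in for the linear or small dense classifier mentioned in step 2 so that the classifier part runs with the standard library alone.

```python
import pickle
from math import sqrt


def load_encoder():
    """Step 1a: load a pretrained ECAPA-TDNN encoder (assumed checkpoint)."""
    # Imported lazily so the classifier code below runs without SpeechBrain.
    from speechbrain.pretrained import EncoderClassifier

    return EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")


def extract_embedding(encoder, path):
    """Step 1b: embed one mono 16 kHz file as a flat list of floats."""
    import torchaudio

    signal, _sample_rate = torchaudio.load(path)  # shape: (channels, time)
    return encoder.encode_batch(signal).squeeze().tolist()


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit_centroids(embeddings, labels):
    """Step 2: average the embeddings of each class into one centroid."""
    grouped = {}
    for vector, label in zip(embeddings, labels):
        grouped.setdefault(label, []).append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, vector):
    """Return the label whose centroid is most similar to the vector."""
    return max(centroids, key=lambda label: cosine(centroids[label], vector))


def accuracy(centroids, embeddings, labels):
    """Step 3: fraction of held-out embeddings classified correctly."""
    hits = sum(predict(centroids, v) == y for v, y in zip(embeddings, labels))
    return hits / len(labels)


def export(centroids, path):
    """Step 4: save the trained centroids as a .pkl file."""
    with open(path, "wb") as handle:
        pickle.dump(centroids, handle)
```

In practice, `extract_embedding` would be called once per training and validation file, and the resulting lists passed to `fit_centroids` and `accuracy`. A pickle written this way holds a plain dictionary, which is not necessarily the format of the `.pkl` file shipped in this repository.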

---

## 🚀 Usage Example

```python
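import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier

# Load the pickled model on CPU. Assumption: the file stores a full Python object,
# so PyTorch >= 2.6 needs weights_only=False; only unpickle files you trust.
model = torch.load(
    "ECAPA_acoustic_domain_classifier.pkl", map_location="cpu", weights_only=False
)

# Load the clip, downmix to mono, and resample to 16 kHz -> shape (1, time)
waveform, sample_rate = torchaudio.load("sample.wav")
waveform = waveform.mean(dim=0, keepdim=True)
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)

# Example inference; encode_batch() and classify() are assumed to exist on the loaded object
embedding = model.encode_batch(waveform)
prediction = model.classify(embedding)
print(prediction)  # -> "speech", "music", or "noise"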
```

---

## 📂 File Information

| File | Description |
|------|--------------|
| `ECAPA_acoustic_domain_classifier.pkl` | Trained model weights |
| `requirements.txt` | Dependencies for inference |
| `README.md` | Model documentation |
| `example_audio.mp3` | Sample audio file |

---

## 📊 Applications

- Acoustic scene classification  
- Pre-filtering for speech recognition pipelines  
- Smart audio event detection  
- Sound domain separation tasks
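
As an illustration of the pre-filtering use case, the sketch below keeps only the clips predicted as speech before they would be sent to a speech recognizer. The function `filter_by_domain` and the stand-in classifier are hypothetical; in a real pipeline, `classify` would wrap the model's prediction for a clip.

```python
def filter_by_domain(clips, classify, wanted="speech"):
    """Return the clips whose predicted domain equals `wanted`."""
    return [clip for clip in clips if classify(clip) == wanted]


# Stand-in classifier for illustration only: looks up a label by clip name.
fake_labels = {"a.wav": "speech", "b.wav": "music", "c.wav": "noise", "d.wav": "speech"}
speech_only = filter_by_domain(fake_labels, fake_labels.get)
print(speech_only)  # ['a.wav', 'd.wav']
```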

---

## 🔖 Suggested Citation

```
Muhammad Khubaib Ahmad (2025). ECAPA Acoustic Domain Classifier: Differentiating Speech, Music, and Noise using ECAPA-TDNN Embeddings. Hugging Face.
```

---

## 🧾 License
MIT License, free for research and educational use.