# ECAPA Acoustic Domain Classifier
### Subtitle
**Speech, Music, and Noise Classification Using ECAPA-TDNN Embeddings**
---
## Overview
This model classifies short audio clips into **Speech**, **Music**, or **Noise** domains.
It uses **ECAPA-TDNN embeddings**, produced by a neural architecture originally developed for speaker verification and used here as a general acoustic feature representation.
Despite being trained on a **small, human-curated dataset (5 samples per class)**, the model reaches **near-perfect classification on its small internal validation set**.
This project serves as a **proof-of-concept** highlighting how ECAPA embeddings can generalize even in limited-data scenarios.
---
## Model Information
- **Architecture:** ECAPA-TDNN
- **Framework:** PyTorch (SpeechBrain-based)
- **Input:** Mono audio waveform (16 kHz sampling rate)
- **Output Classes:** Speech | Music | Noise
- **Training Data:** 15 samples (5 per class), normalized and balanced
- **Accuracy:** 100% on internal validation (small-scale)
- **Author:** Khubaib Ahmad (AI/ML Engineer, Data Scientist)
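
The input format listed above (mono, 16 kHz) can be checked before inference. The following is a minimal sketch using only Python's standard `wave` module. The helper names `check_wav_format` and `write_silence` and the demo file name are hypothetical and not part of this repository, and the check only handles uncompressed WAV files.

```python
import struct
import wave

EXPECTED_RATE = 16000   # Hz, from the model's stated input format
EXPECTED_CHANNELS = 1   # mono

def check_wav_format(path):
    """Return (ok, details) for an uncompressed WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    ok = channels == EXPECTED_CHANNELS and rate == EXPECTED_RATE
    return ok, {"channels": channels, "sample_rate": rate}

def write_silence(path, rate, channels, seconds=0.1):
    """Write a short silent 16-bit PCM file, used here only to demo the check."""
    n_frames = int(rate * seconds)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<h", 0) * n_frames * channels)

if __name__ == "__main__":
    write_silence("demo_16k_mono.wav", 16000, 1)
    print(check_wav_format("demo_16k_mono.wav"))
```

Files that fail the check would need to be downmixed and resampled before being passed to the model.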
---
## Methodology
1. Extract ECAPA-TDNN embeddings for all samples using SpeechBrain.
2. Train a simple classifier (e.g., linear or small dense network) on embeddings.
3. Validate predictions using held-out data.
4. Export the trained model weights as a `.pkl` file.
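
Steps 2 to 4 can be sketched as follows. This is a minimal illustration, not the author's training code: it swaps in a nearest-centroid classifier with cosine similarity as the "simple classifier", and it uses synthetic 192-dimensional vectors as stand-ins for ECAPA-TDNN embeddings so that it runs without SpeechBrain or audio files. The function names, the embedding dimension, and the output file name are assumptions made for illustration.

```python
import math
import pickle
import random

LABELS = ["speech", "music", "noise"]
DIM = 192  # assumed embedding size; stand-in for real ECAPA-TDNN embeddings

def fake_embedding(label, rng):
    """Synthetic stand-in: each class clusters around its own block of the vector."""
    idx = LABELS.index(label)
    block = DIM // len(LABELS)
    return [(1.0 if idx * block <= i < (idx + 1) * block else 0.0) + rng.gauss(0, 0.3)
            for i in range(DIM)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def fit_centroids(embeddings, labels):
    """Step 2: the 'classifier' is just the mean embedding of each class."""
    centroids = {}
    for label in set(labels):
        rows = [e for e, lab in zip(embeddings, labels) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, embedding):
    """Pick the class whose centroid is most similar to the embedding."""
    return max(centroids, key=lambda label: cosine(centroids[label], embedding))

rng = random.Random(0)
train_y = [label for label in LABELS for _ in range(5)]  # 5 samples per class
train_x = [fake_embedding(label, rng) for label in train_y]
centroids = fit_centroids(train_x, train_y)

# Step 3: validate on held-out samples that were not used for fitting
val_y = [label for label in LABELS for _ in range(2)]
val_x = [fake_embedding(label, rng) for label in val_y]
accuracy = sum(predict(centroids, x) == y for x, y in zip(val_x, val_y)) / len(val_y)
print(f"held-out accuracy: {accuracy:.2f}")

# Step 4: export the fitted classifier as a .pkl file
with open("centroid_classifier_demo.pkl", "wb") as f:
    pickle.dump(centroids, f)
```

With real data, `fake_embedding` would be replaced by embeddings extracted from audio in step 1, and the held-out clips must not overlap with the training clips.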
---
## Usage Example
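```python
from speechbrain.pretrained import EncoderClassifier  # SpeechBrain must be installed
import torch
import torchaudio

# Load the pickled model object. On recent PyTorch releases, weights_only=False
# is needed to unpickle full objects; only do this for files you trust.
model = torch.load("ECAPA_acoustic_domain_classifier.pkl", map_location="cpu", weights_only=False)

# Load the audio and convert it to the expected input: mono, 16 kHz
waveform, sample_rate = torchaudio.load("sample.wav")  # shape: (channels, samples)
waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono -> (1, samples)
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)

# Example inference (assumes the loaded object exposes encode_batch and classify)
embedding = model.encode_batch(waveform)
prediction = model.classify(embedding)
print(prediction)  # -> "speech", "music", or "noise"
```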
---
## File Information
| File | Description |
|------|--------------|
| `ECAPA_acoustic_domain_classifier.pkl` | Trained model weights |
| `requirements.txt` | Dependencies for inference |
| `README.md` | Model documentation |
| `example_audio.mp3` | Sample audio file |
---
## Applications
- Acoustic scene classification
- Pre-filtering for speech recognition pipelines
- Smart audio event detection
- Sound domain separation tasks
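
As an illustration of the pre-filtering use case, the sketch below keeps only the clips predicted as speech before they would be passed to a recognizer. The `classify_clip` argument is a hypothetical stand-in for whatever function wraps this model's inference; the dummy classifier and file names exist only to make the example runnable.

```python
from typing import Callable, Iterable, List

def keep_speech(clips: Iterable[str], classify_clip: Callable[[str], str]) -> List[str]:
    """Return only the clips whose predicted domain is 'speech'."""
    return [clip for clip in clips if classify_clip(clip).lower() == "speech"]

# Dummy classifier for demonstration only: looks up a fixed label per file name.
# In practice this would call the acoustic domain classifier on the audio.
_DEMO_LABELS = {"call.wav": "speech", "song.wav": "music", "fan.wav": "noise"}

def dummy_classifier(clip: str) -> str:
    return _DEMO_LABELS[clip]

speech_only = keep_speech(["call.wav", "song.wav", "fan.wav"], dummy_classifier)
print(speech_only)  # ['call.wav']
```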
---
## Suggested Citation
```
Muhammad Khubaib Ahmad (2025). ECAPA Acoustic Domain Classifier: Differentiating Speech, Music, and Noise using ECAPA-TDNN Embeddings. Hugging Face.
```
---
## License
MIT License: free for research and educational use.