ECAPA_Acoustic_Domain_Classifier / ECAPA_Acoustic_Domain_Classifier_README.md
Khubaib01's picture
initial commit of model, readme.md, sample_audio, requirements
ef8dd76 verified
# ECAPA Acoustic Domain Classifier
### Subtitle
**Speech, Music, and Noise Classification Using ECAPA-TDNN Embeddings**
---
## 🧠 Overview
This model classifies short audio clips into **Speech**, **Music**, or **Noise** domains.
It uses **ECAPA-TDNN embeddings**, a neural architecture optimized for speaker and acoustic feature representation.
Despite being trained on a **small, human-curated dataset (5 samples per class)**, the model demonstrates **high robustness and near-perfect classification**.
This project serves as a **proof-of-concept** highlighting how ECAPA embeddings can generalize even in limited-data scenarios.
---
## πŸ“¦ Model Information
- **Architecture:** ECAPA-TDNN
- **Framework:** PyTorch (SpeechBrain-based)
- **Input:** Mono audio waveform (16 kHz sampling rate)
- **Output Classes:** Speech | Music | Noise
- **Training Data:** 15 samples (5 per class), normalized and balanced
- **Accuracy:** 100% on internal validation (small-scale)
- **Author:** Khubaib Ahmad β€” AI/ML Engineer, Data Scientist
---
## βš™οΈ Methodology
1. Extract ECAPA-TDNN embeddings for all samples using SpeechBrain.
2. Train a simple classifier (e.g., linear or small dense network) on embeddings.
3. Validate predictions using held-out data.
4. Export trained model weights as `.pkl` file.
---
## πŸš€ Usage Example
```python
from speechbrain.pretrained import EncoderClassifier
import torch
# Load model
model = torch.load("ECAPA_acoustic_domain_classifier.pkl", map_location="cpu")
# Example inference (pseudo code)
audio_tensor = load_audio("sample.wav") # your function to load audio as torch tensor
embedding = model.encode_batch(audio_tensor)
prediction = model.classify(embedding)
print(prediction) # -> "speech", "music", or "noise"
```
---
## πŸ“‚ File Information
| File | Description |
|------|--------------|
| `ECAPA_acoustic_domain_classifier.pkl` | Trained model weights |
| `requirements.txt` | Dependencies for inference |
| `README.md` | Model documentation |
| `example_audio.mp3` | Sample audio file |
---
## πŸ“Š Applications
- Acoustic scene classification
- Pre-filtering for speech recognition pipelines
- Smart audio event detection
- Sound domain separation tasks
---
## πŸ”– Suggested Citation
```
Muhammad Khubaib Ahmad (2025). ECAPA Acoustic Domain Classifier: Differentiating Speech, Music, and Noise using ECAPA-TDNN Embeddings. Hugging Face.
```
---
## 🧾 License
MIT License β€” free for research and educational use.