# ECAPA Acoustic Domain Classifier

**Speech, Music, and Noise Classification Using ECAPA-TDNN Embeddings**

---
## Overview

This model classifies short audio clips into one of three domains: **Speech**, **Music**, or **Noise**.
It uses **ECAPA-TDNN embeddings**, produced by a neural architecture originally designed for speaker verification that also captures general acoustic characteristics.
Despite being trained on a **small, human-curated dataset (5 samples per class)**, the model reaches **100% accuracy on a small internal validation set**.
This project serves as a **proof of concept** showing how ECAPA embeddings can generalize even in limited-data scenarios.

---
## Model Information

- **Architecture:** ECAPA-TDNN
- **Framework:** PyTorch (SpeechBrain-based)
- **Input:** Mono audio waveform (16 kHz sampling rate)
- **Output Classes:** Speech | Music | Noise
- **Training Data:** 15 samples (5 per class), normalized and balanced
- **Accuracy:** 100% on internal validation (small-scale)
- **Author:** Khubaib Ahmad (AI/ML Engineer, Data Scientist)
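
Because the model expects mono audio at 16 kHz, it can help to check a file before inference. The sketch below uses Python's standard `wave` module, so it only handles uncompressed WAV files. The helper name `is_model_ready_wav` and the two constants are hypothetical and not part of this repository.

```python
import wave

EXPECTED_SAMPLE_RATE = 16000  # Hz, taken from the stated input format
EXPECTED_CHANNELS = 1         # mono

def is_model_ready_wav(path):
    """Return True if the WAV file at `path` is mono and sampled at 16 kHz."""
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getnchannels() == EXPECTED_CHANNELS
            and wav_file.getframerate() == EXPECTED_SAMPLE_RATE
        )
```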

---

## Methodology

1. Extract ECAPA-TDNN embeddings for all samples using SpeechBrain.
2. Train a simple classifier (e.g., a linear model or a small dense network) on the embeddings.
3. Validate predictions on held-out data.
4. Export the trained model weights as a `.pkl` file.
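
Steps 2 and 3 above can be sketched in a few lines of dependency-free Python. This is an illustration under stated assumptions, not the author's training code: it swaps in a nearest-centroid classifier with cosine similarity for the linear or dense classifier, and it uses made-up 3-dimensional vectors in place of real ECAPA-TDNN embeddings. All function and variable names are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fit_centroids(embeddings, labels):
    """Step 2 (simplified): average the unit-length embeddings of each class."""
    grouped = {}
    for emb, label in zip(embeddings, labels):
        grouped.setdefault(label, []).append(l2_normalize(emb))
    return {
        label: l2_normalize([sum(dim) / len(vecs) for dim in zip(*vecs)])
        for label, vecs in grouped.items()
    }

def predict(centroids, embedding):
    """Return the label whose centroid has the highest cosine similarity."""
    emb = l2_normalize(embedding)
    return max(
        centroids,
        key=lambda label: sum(c * e for c, e in zip(centroids[label], emb)),
    )

def accuracy(centroids, embeddings, labels):
    """Step 3: fraction of held-out embeddings classified correctly."""
    hits = sum(predict(centroids, e) == y for e, y in zip(embeddings, labels))
    return hits / len(labels)

# Toy stand-ins for step 1; real ECAPA-TDNN embeddings have far more dimensions.
train_x = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1],
           [0.1, 0.9, 0.1], [0.0, 0.8, 0.2],
           [0.1, 0.1, 0.9], [0.2, 0.0, 0.8]]
train_y = ["speech", "speech", "music", "music", "noise", "noise"]
held_out_x = [[0.7, 0.2, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
held_out_y = ["speech", "music", "noise"]

centroids = fit_centroids(train_x, train_y)
print(accuracy(centroids, held_out_x, held_out_y))
```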
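
---

## Usage Example

```python
from speechbrain.pretrained import EncoderClassifier
import torch
import torchaudio

def load_audio(path, target_sr=16000):
    # Load a file as a mono tensor of shape (1, samples) at the model's 16 kHz input rate
    waveform, sr = torchaudio.load(path)
    waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
    if sr != target_sr:
        waveform = torchaudio.functional.resample(waveform, sr, target_sr)
    return waveform

# Load model (assumes the .pkl holds a full pickled model object; only unpickle trusted files)
model = torch.load("ECAPA_acoustic_domain_classifier.pkl", map_location="cpu", weights_only=False)
# Example inference (pseudo code: encode_batch/classify depend on how the model was exported)
audio_tensor = load_audio("sample.wav")
embedding = model.encode_batch(audio_tensor)
prediction = model.classify(embedding)
print(prediction)  # -> "speech", "music", or "noise"
```

---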
## File Information

| File | Description |
|------|-------------|
| `ECAPA_acoustic_domain_classifier.pkl` | Trained model weights |
| `requirements.txt` | Dependencies for inference |
| `README.md` | Model documentation |
| `example_audio.mp3` | Sample audio file |

---
## Applications

- Acoustic scene classification
- Pre-filtering for speech recognition pipelines
- Smart audio event detection
- Sound domain separation tasks
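
As an illustration of the pre-filtering use case, the sketch below keeps only the clips labeled as speech before they are handed to a speech recognizer. Both `filter_speech_clips` and the `classify` callable are hypothetical: `classify` stands in for whatever function wraps this model and returns `"speech"`, `"music"`, or `"noise"`. A dictionary lookup is used as a stub here.

```python
def filter_speech_clips(clips, classify, keep_label="speech"):
    """Return only the clips whose predicted domain equals `keep_label`."""
    return [clip for clip in clips if classify(clip) == keep_label]

# Stub classifier for demonstration only: looks up a precomputed label by clip name.
stub_labels = {"intro.wav": "music", "interview.wav": "speech", "street.wav": "noise"}
speech_only = filter_speech_clips(list(stub_labels), stub_labels.get)
print(speech_only)  # ['interview.wav']
```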

---

## Suggested Citation

```
Muhammad Khubaib Ahmad (2025). ECAPA Acoustic Domain Classifier: Differentiating Speech, Music, and Noise using ECAPA-TDNN Embeddings. Hugging Face.
```

---
## License

MIT License. Free for research and educational use.