Spaces:
Sleeping
Sleeping
metadata
title: ESC50 Audio Classifier
emoji: π
colorFrom: gray
colorTo: purple
sdk: docker
pinned: false
Environmental Sound Classification (ESC50) with Deep CNNs
A PyTorch reimplementation of the deep convolutional neural network approach from Salamon & Bello (2017) for environmental sound classification, extended to handle 50 classes instead of the original 10.
Overview
This project implements a deep CNN architecture for environmental sound classification using log-mel spectrograms as input features. The implementation follows the methodology described in the paper "Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification" but is scaled to work with a more challenging 50-class classification task.
Key Features
- Deep CNN with 3 convolutional layers + 2 fully connected layers
- Log-mel spectrogram feature extraction using Essentia
- Data augmentation (time stretching, pitch shifting, dynamic range compression)
- Overlapping patch prediction for validation (1-frame hop)
Results
| Dataset | Classes | Accuracy (with augmentation) |
|---|---|---|
| UrbanSound8K (paper) | 10 | 79% |
| This project | 50 | 74% |
Architecture
Model Structure
Input: Log-mel Spectrogram (128 Γ 128)
β
Conv2D(1β24, 5Γ5) + ReLU + MaxPool(4Γ2)
β
Conv2D(24β48, 5Γ5) + ReLU + MaxPool(4Γ2)
β
Conv2D(48β48, 5Γ5) + ReLU
β
Flatten β Dense(2400β64) + ReLU + Dropout(0.5)
β
Dense(64β50) + Softmax
β
Output: 50 classes
Training Configuration
- Optimizer: SGD with momentum (0.9)
- Learning Rate: 0.01
- Batch Size: 100 TF-patches
- L2 Regularization: 0.001 (on classifier layers only)
- Dropout: 0.5 (on classifier layers)
- Epochs: 100