Bird Sound Identifier (EfficientNet-B0, Fold 3)
This repository contains a PyTorch model for bird sound classification, trained on the BirdCLEF dataset using an EfficientNet-B0 backbone. The model was trained for 30 epochs on fold 3 with strong data augmentation and class balancing.
Model Architecture
- Backbone: EfficientNet-B0 (from the timm library)
- Input: 1-channel Mel spectrograms (shape: [1, 128, T])
- First Conv Layer: Modified to accept 1 channel instead of 3
- Classifier: Dropout(0.5) + Linear layer (output = number of classes)
- Loss: CrossEntropyLoss with class weights
- Optimizer: Adam (lr=1e-3, weight_decay=1e-3)
- Scheduler: ReduceLROnPlateau (patience=3, factor=0.5)
- Augmentation: SpecAugment, Gaussian noise, mixup
- Early Stopping: patience=5 epochs
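The interaction between the plateau scheduler (patience=3, factor=0.5) and early stopping (patience=5) can be sketched without any framework. This is a simplified illustration, not the training code from this repository: the bookkeeping loosely mirrors PyTorch's ReduceLROnPlateau in "min" mode but omits its threshold and cooldown options, the exact early-stopping rule (stop after 5 consecutive epochs without improvement) is an assumption, and the validation losses are invented for the example.

```python
# Framework-free sketch of LR-on-plateau plus early stopping.
# Assumptions: "improvement" means a strictly lower validation loss, and
# training stops after `stop_patience` consecutive epochs without one.

def simulate_training_control(val_losses, lr=1e-3, lr_patience=3,
                              lr_factor=0.5, stop_patience=5):
    """Return (learning rate used after each epoch, epoch at which training stops)."""
    best = float("inf")
    bad_epochs_lr = 0     # epochs without improvement since the last LR change
    bad_epochs_stop = 0   # epochs without improvement since the best epoch
    lr_history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            bad_epochs_lr = 0
            bad_epochs_stop = 0
        else:
            bad_epochs_lr += 1
            bad_epochs_stop += 1
        # Halve the LR once the bad-epoch count exceeds the scheduler patience.
        if bad_epochs_lr > lr_patience:
            lr *= lr_factor
            bad_epochs_lr = 0
        lr_history.append(lr)
        # Stop once the early-stopping patience is exhausted.
        if bad_epochs_stop >= stop_patience:
            return lr_history, epoch
    return lr_history, len(val_losses)


# Hypothetical validation losses, for illustration only.
hypothetical_val_losses = [3.0, 2.5, 2.4, 2.45, 2.5, 2.6, 2.41, 2.42, 2.43, 2.5]
lr_history, stopped_at = simulate_training_control(hypothetical_val_losses)
print(lr_history, stopped_at)
```

With these invented losses, the best epoch is 3, the learning rate is halved once at epoch 7, and training stops at epoch 8. Because the scheduler patience is shorter than the early-stopping patience, the model gets at least one reduced-LR attempt before training is cut off.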
Training Results (Fold 3, 30 Epochs)
| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|---|---|---|---|---|
| 1 | 5.51 | 0.014 | 32.70 | 0.033 |
| 2 | 5.26 | 0.077 | 19.76 | 0.133 |
| 3 | 4.51 | 0.202 | 4.20 | 0.211 |
| 4 | 3.70 | 0.281 | 3.72 | 0.294 |
| 5 | 3.25 | 0.332 | 3.53 | 0.299 |
| ... | ... | ... | ... | ... |
| 30 | 0.643 | 0.764 | 2.20 | 0.629 |
Final Metrics:
- Final Train Loss: 0.643
- Final Train Accuracy: 0.764
- Final Validation Loss: 2.20
- Final Validation Accuracy: 0.629
Analysis
- The model shows rapid improvement in both training and validation accuracy in the first 10 epochs, with validation accuracy rising from 3% to over 40%.
- After epoch 10, both losses decrease steadily and accuracies improve, with validation accuracy plateauing around 62-63%.
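- The final gap between train and validation accuracy is about 13.5 percentage points (0.764 vs. 0.629), while validation loss (2.20) remains well above train loss (0.643); this points to moderate rather than severe overfitting.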
- The lowest validation loss and highest validation accuracy are achieved in the final epochs, suggesting stable convergence.
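The size of the train/validation gap can be checked directly from the final metrics reported above. The snippet below is plain arithmetic on the epoch-30 numbers; the variable names are illustrative only.

```python
# Final metrics from epoch 30 of the results table.
train_loss, train_acc = 0.643, 0.764
val_loss, val_acc = 2.20, 0.629

acc_gap = train_acc - val_acc        # absolute accuracy gap
loss_ratio = val_loss / train_loss   # validation loss relative to train loss

print(f"accuracy gap: {acc_gap * 100:.1f} percentage points")
print(f"val/train loss ratio: {loss_ratio:.2f}")
```

When reading these numbers, keep in mind that training metrics computed under dropout, mixup, and SpecAugment may not be directly comparable to validation metrics computed on unaugmented data.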
Training & Validation Curves
Usage
To load the model weights in PyTorch:
```python
import torch
import timm

# Build the architecture: EfficientNet-B0 with a 1-channel stem and a custom head
def get_efficientnet_b0(num_classes):
    model = timm.create_model('efficientnet_b0', pretrained=False)
    in_ch = model.conv_stem.in_channels
    if in_ch != 1:
        # Replace the 3-channel stem so the network accepts 1-channel Mel spectrograms
        model.conv_stem = torch.nn.Conv2d(1, model.conv_stem.out_channels, kernel_size=model.conv_stem.kernel_size,
                                          stride=model.conv_stem.stride, padding=model.conv_stem.padding, bias=False)
    in_features = model.classifier.in_features
    model.classifier = torch.nn.Sequential(
        torch.nn.Dropout(0.5),
        torch.nn.Linear(in_features, num_classes)
    )
    return model

# num_classes must match the number of classes used during training
model = get_efficientnet_b0(num_classes=...)
model.load_state_dict(torch.load('best_efficientnetb0_fold3.pth', map_location='cpu'))
model.eval()
```
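Once the model is loaded, its output for a spectrogram is a vector of raw logits, one per class. A minimal, framework-free sketch of turning logits into ranked predictions is shown below; the label names and logit values are hypothetical, and in PyTorch `torch.softmax` and `torch.topk` play the same role on tensors.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical labels and logits, for illustration only.
hypothetical_labels = ["species_a", "species_b", "species_c", "species_d"]
hypothetical_logits = [2.0, 0.5, -1.0, 3.0]

probs = softmax(hypothetical_logits)
predictions = top_k(probs, hypothetical_labels, k=2)
for label, p in predictions:
    print(f"{label}: {p:.3f}")
```

The label order must match the class index order used during training, which this model card does not specify.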
Citation
If you use this model, please cite both the BirdCLEF dataset and this repository:
BirdCLEF Dataset:
Kahl, S., Stöter, F.-R., Thakur, A., Klinck, H., Goëau, H., Glotin, H., Vellinga, W.-P., Planqué, R., Joly, A., & Lostanlen, V. (2023). BirdCLEF 2023: Bird sound recognition in complex acoustic environments. Proceedings of the Working Notes of CLEF 2023.
This Model:
Das, R. (2026). Bird Sound Identifier (EfficientNet-B0, Fold 3) [Model]. Hugging Face. https://huggingface.co/Rudraneel93/bird-sound-identifier
Author
For questions or issues, please open an issue on the Hugging Face Hub.
