Bird Sound Identifier (EfficientNet-B0, Fold 3)

This repository contains a PyTorch model for bird sound classification, trained on the BirdCLEF dataset with an EfficientNet-B0 backbone. The model was trained for 30 epochs on fold 3 with data augmentation (SpecAugment, Gaussian noise, mixup) and a class-weighted loss to counter class imbalance.

Model Architecture

Backbone: EfficientNet-B0 (from the timm library)
Input: 1-channel Mel spectrograms (shape: [1, 128, T], i.e. 128 Mel bins by T time frames)
First Conv Layer: Modified to accept 1 channel instead of 3
Classifier: Dropout(0.5) + Linear layer (output = number of classes)
Loss: CrossEntropyLoss with class weights
Optimizer: Adam (lr=1e-3, weight_decay=1e-3)
Scheduler: ReduceLROnPlateau (patience=3, factor=0.5)
Augmentation: SpecAugment, Gaussian noise, mixup
Early Stopping: patience=5 epochs
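The class-weighted loss and the three augmentations listed above can be sketched in PyTorch as follows. This is a minimal illustration, not the training code used for this checkpoint: the inverse-frequency weighting scheme, the mask widths (`max_freq`, `max_time`), the noise level `std`, and the mixup `alpha` are assumptions, and all helper names are hypothetical.

```python
import torch


def class_weights_from_counts(counts: torch.Tensor) -> torch.Tensor:
    """Inverse-frequency class weights rescaled to average 1 (an assumed scheme)."""
    counts = counts.float().clamp(min=1.0)  # avoid division by zero for empty classes
    weights = 1.0 / counts
    return weights * (counts.numel() / weights.sum())


def spec_augment(spec: torch.Tensor, max_freq: int = 16, max_time: int = 32) -> torch.Tensor:
    """Zero one random Mel band and one random time span of a [B, 1, F, T] batch.

    The same masks are applied to every item in the batch; the widths are assumed values.
    """
    spec = spec.clone()  # leave the caller's tensor untouched
    _, _, n_freq, n_time = spec.shape
    f = int(torch.randint(0, max_freq + 1, (1,)))
    f0 = int(torch.randint(0, max(1, n_freq - f + 1), (1,)))
    spec[:, :, f0:f0 + f, :] = 0.0  # frequency mask
    t = int(torch.randint(0, max_time + 1, (1,)))
    t0 = int(torch.randint(0, max(1, n_time - t + 1), (1,)))
    spec[:, :, :, t0:t0 + t] = 0.0  # time mask
    return spec


def add_gaussian_noise(spec: torch.Tensor, std: float = 0.01) -> torch.Tensor:
    """Add zero-mean Gaussian noise; `std` is an assumed value."""
    return spec + std * torch.randn_like(spec)


def mixup(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.4):
    """Blend each example with a randomly chosen partner from the same batch."""
    lam = float(torch.distributions.Beta(alpha, alpha).sample())
    perm = torch.randperm(x.size(0))
    return lam * x + (1.0 - lam) * x[perm], y, y[perm], lam


def mixup_loss(criterion, logits, y_a, y_b, lam: float) -> torch.Tensor:
    """Mix the two targets' losses with the same coefficient used for the inputs."""
    return lam * criterion(logits, y_a) + (1.0 - lam) * criterion(logits, y_b)


# Example with made-up class counts and a random stand-in for the model's outputs
counts = torch.tensor([120, 30, 5])
criterion = torch.nn.CrossEntropyLoss(weight=class_weights_from_counts(counts))
x = torch.randn(8, 1, 128, 256)
y = torch.randint(0, 3, (8,))
x_mix, y_a, y_b, lam = mixup(add_gaussian_noise(spec_augment(x)), y)
logits = torch.randn(8, 3, requires_grad=True)  # stand-in for model(x_mix)
loss = mixup_loss(criterion, logits, y_a, y_b, lam)
```

Mixing the two losses with the same `lam` as the inputs is equivalent to training on soft targets, which is why mixup pairs naturally with the class-weighted `CrossEntropyLoss`.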

Training Results (Fold 3, 30 Epochs)

Epoch	Train Loss	Train Acc	Val Loss	Val Acc
1	5.51	0.014	32.70	0.033
2	5.26	0.077	19.76	0.133
3	4.51	0.202	4.20	0.211
4	3.70	0.281	3.72	0.294
5	3.25	0.332	3.53	0.299
...	...	...	...	...
30	0.643	0.764	2.20	0.629

Final Metrics:

Final Train Loss: 0.643
Final Train Accuracy: 0.764
Final Validation Loss: 2.20
Final Validation Accuracy: 0.629
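The optimizer, scheduler, and early-stopping settings listed under Model Architecture interact through the per-epoch validation loss. The following is a minimal sketch, assuming that both the scheduler and early stopping monitor validation loss; `fit`, `EarlyStopping`, and the `train_one_epoch`/`evaluate` callables are hypothetical names, not part of this repository.

```python
import torch


class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without a lower validation loss."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def fit(model, train_one_epoch, evaluate, max_epochs: int = 30):
    """Hypothetical epoch loop; `train_one_epoch` and `evaluate` return scalar losses."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-3)
    # Halve the LR once validation loss has failed to improve for more than 3 epochs
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.5, patience=3
    )
    stopper = EarlyStopping(patience=5)
    history = []  # (epoch, train_loss, val_loss, learning rate after scheduling)
    for epoch in range(1, max_epochs + 1):
        train_loss = train_one_epoch(model, optimizer)
        val_loss = evaluate(model)
        scheduler.step(val_loss)
        history.append((epoch, train_loss, val_loss, optimizer.param_groups[0]["lr"]))
        if stopper.step(val_loss):
            break
    return history
```

PyTorch's `ReduceLROnPlateau` reduces the learning rate only when the count of non-improving epochs exceeds `patience`, so with `patience=3` the first halving happens on the fourth consecutive epoch without improvement. With early-stopping patience 5, the learning rate therefore gets one reduction before training can stop.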

Analysis

The model shows rapid improvement in both training and validation accuracy in the first 10 epochs, with validation accuracy rising from 3% to over 40%.
After epoch 10, both losses decrease steadily and accuracies improve, with validation accuracy plateauing around 62-63%.
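The gap between final train and validation accuracy (0.764 vs. 0.629, about 13.5 points) is moderate, indicating no severe overfitting, although the validation loss (2.20) remains well above the training loss (0.643).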
The lowest validation loss and highest validation accuracy are achieved in the final epochs, suggesting stable convergence.
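Observations like these can be checked mechanically against the logged metrics. Below is a small sketch that assumes the per-epoch history is available as rows matching the table columns; `EpochLog` and `summarize` are hypothetical helpers, and only the rows reproduced in the table are used.

```python
from typing import List, NamedTuple


class EpochLog(NamedTuple):
    epoch: int
    train_loss: float
    train_acc: float
    val_loss: float
    val_acc: float


def summarize(history: List[EpochLog]) -> dict:
    """Locate the best epochs and measure the final train/validation gap."""
    best_loss = min(history, key=lambda r: r.val_loss)
    best_acc = max(history, key=lambda r: r.val_acc)
    last = history[-1]
    return {
        "best_val_loss_epoch": best_loss.epoch,
        "best_val_acc_epoch": best_acc.epoch,
        "final_acc_gap": last.train_acc - last.val_acc,       # train minus validation accuracy
        "final_loss_ratio": last.val_loss / last.train_loss,  # validation loss / training loss
    }


# Only the rows reproduced in the table (epochs 6-29 are elided there)
rows = [
    EpochLog(1, 5.51, 0.014, 32.70, 0.033),
    EpochLog(2, 5.26, 0.077, 19.76, 0.133),
    EpochLog(3, 4.51, 0.202, 4.20, 0.211),
    EpochLog(4, 3.70, 0.281, 3.72, 0.294),
    EpochLog(5, 3.25, 0.332, 3.53, 0.299),
    EpochLog(30, 0.643, 0.764, 2.20, 0.629),
]
summary = summarize(rows)
```

On these six rows, both best-epoch entries come out as 30; verifying the epoch-10 observations would require the full per-epoch log.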

[Figure: Training & Validation Curves]

Usage

To load the model weights in PyTorch:

import torch
import timm

# Rebuild the training architecture: EfficientNet-B0 with a 1-channel stem and a Dropout + Linear head
def get_efficientnet_b0(num_classes):
    model = timm.create_model('efficientnet_b0', pretrained=False)
    stem = model.conv_stem
    if stem.in_channels != 1:
        # Replace the 3-channel stem so the network accepts 1-channel Mel spectrograms
        model.conv_stem = torch.nn.Conv2d(
            1, stem.out_channels,
            kernel_size=stem.kernel_size, stride=stem.stride,
            padding=stem.padding, bias=False,
        )
    in_features = model.classifier.in_features
    # Same head as in training, so the checkpoint keys (classifier.1.*) match
    model.classifier = torch.nn.Sequential(
        torch.nn.Dropout(0.5),
        torch.nn.Linear(in_features, num_classes),
    )
    return model

# Replace ... with the number of classes the checkpoint was trained on
model = get_efficientnet_b0(num_classes=...)
state_dict = torch.load('best_efficientnetb0_fold3.pth', map_location='cpu')
model.load_state_dict(state_dict)
model.eval()  # disables dropout for inference
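If the class count used in training is not known, it can be read from the checkpoint itself, because the head defined above stores the final Linear layer under `classifier.1`. The following is a minimal inference sketch, assuming the checkpoint is a plain state dict (as the loading code implies) and that inputs are Mel spectrograms shaped as described under Model Architecture. `num_classes_from_state_dict` and `predict_topk` are hypothetical helpers, and mapping indices back to species names requires the label list used in training, which this card does not describe.

```python
import torch


def num_classes_from_state_dict(state_dict: dict) -> int:
    """Class count = output rows of the final Linear layer (`classifier.1` in the head above)."""
    return state_dict["classifier.1.weight"].shape[0]


@torch.no_grad()
def predict_topk(model: torch.nn.Module, spec: torch.Tensor, k: int = 5):
    """Top-k (probabilities, class indices) for a [1, 128, T] or [B, 1, 128, T] spectrogram.

    `k` must not exceed the number of classes.
    """
    model.eval()
    if spec.dim() == 3:  # single spectrogram: add the batch dimension
        spec = spec.unsqueeze(0)
    probs = torch.softmax(model(spec), dim=-1)
    return probs.topk(k, dim=-1)


# Usage with get_efficientnet_b0 from the loading code above:
# state_dict = torch.load('best_efficientnetb0_fold3.pth', map_location='cpu')
# model = get_efficientnet_b0(num_classes=num_classes_from_state_dict(state_dict))
# model.load_state_dict(state_dict)
# probs, indices = predict_topk(model, mel_spec)  # mel_spec: hypothetical [1, 128, T] tensor
```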

Citation

If you use this model, please cite both the BirdCLEF dataset and this repository:

BirdCLEF Dataset:

Kahl, S., Stöter, F.-R., Thakur, A., Klinck, H., Goëau, H., Glotin, H., Vellinga, W.-P., Planqué, R., Joly, A., & Lostanlen, V. (2023). BirdCLEF 2023: Bird sound recognition in complex acoustic environments. Proceedings of the Working Notes of CLEF 2023.

This Model:

Das, R. (2026). Bird Sound Identifier (EfficientNet-B0, Fold 3) [Model]. Hugging Face. https://huggingface.co/Rudraneel93/bird-sound-identifier

Author

Rudraneel Das

For questions or issues, please open a discussion in the Community tab of the model page on the Hugging Face Hub.
