Bird Sound Identifier (EfficientNet-B0, Fold 3)

This repository contains a PyTorch model for bird sound classification, trained on the BirdCLEF dataset using an EfficientNet-B0 backbone. The model was trained for 30 epochs on fold 3 with strong data augmentation and class balancing.

Model Architecture

  • Backbone: EfficientNet-B0 (from the timm library)
  • Input: 1-channel Mel spectrograms (shape: [1, 128, T])
  • First Conv Layer: Modified to accept 1 channel instead of 3
  • Classifier: Dropout(0.5) + Linear layer (output = number of classes)
  • Loss: CrossEntropyLoss with class weights
  • Optimizer: Adam (lr=1e-3, weight_decay=1e-3)
  • Scheduler: ReduceLROnPlateau (patience=3, factor=0.5)
  • Augmentation: SpecAugment, Gaussian noise, mixup
  • Early Stopping: patience=5 epochs
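
The list above mentions a class-weighted CrossEntropyLoss but does not say how the weights were derived. A minimal sketch of one common choice, inverse-frequency ("balanced") weighting, is shown below. The formula, the helper name `inverse_frequency_weights`, and the toy labels are assumptions for illustration, not the author's confirmed method.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Return one weight per class, proportional to 1 / class frequency.

    Uses the "balanced" heuristic: weight_c = N / (num_classes * count_c).
    Classes with no samples get weight 0.0 (an arbitrary choice for this sketch).
    """
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


# Hypothetical toy labels: class 0 is common, class 2 is rare.
labels = [0, 0, 0, 0, 1, 1, 2]
weights = inverse_frequency_weights(labels, num_classes=3)
print(weights)  # rarer classes receive larger weights
```

The resulting list can be wrapped in `torch.tensor(weights)` and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`, so that errors on rare species contribute more to the loss.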

Training Results (Fold 3, 30 Epochs)

Epoch   Train Loss   Train Acc   Val Loss   Val Acc
1       5.51         0.014       32.70      0.033
2       5.26         0.077       19.76      0.133
3       4.51         0.202       4.20       0.211
4       3.70         0.281       3.72       0.294
5       3.25         0.332       3.53       0.299
...     ...          ...         ...        ...
30      0.643        0.764       2.20       0.629

Final Metrics:

  • Final Train Loss: 0.643
  • Final Train Accuracy: 0.764
  • Final Validation Loss: 2.20
  • Final Validation Accuracy: 0.629

Analysis

  • The model shows rapid improvement in both training and validation accuracy in the first 10 epochs, with validation accuracy rising from 3% to over 40%.
  • After epoch 10, both losses decrease steadily and accuracies improve, with validation accuracy plateauing around 62-63%.
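  • At epoch 30 the gap between train and validation accuracy is about 13.5 percentage points (0.764 vs. 0.629), which suggests moderate rather than severe overfitting, although validation loss (2.20) remains well above train loss (0.643).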
  • The lowest validation loss and highest validation accuracy are achieved in the final epochs, suggesting stable convergence.
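
The run reaching epoch 30 despite the early-stopping rule (patience=5) is consistent with the last point: the monitored metric kept improving often enough that the rule never fired. The sketch below shows how such a rule typically works. It assumes validation loss is the monitored quantity, which the card does not state, and the `EarlyStopping` class is illustrative rather than the author's code. The first five losses come from the results table; the remaining values are hypothetical and exist only to trigger the stop.

```python
class EarlyStopping:
    """Stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Epochs 1-5: validation losses from the results table.
# Epochs 6-10: hypothetical non-improving values, not from the actual run.
val_losses = [32.70, 19.76, 4.20, 3.72, 3.53, 3.60, 3.58, 3.61, 3.55, 3.54]

stopper = EarlyStopping(patience=5)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break

print(f"best val loss: {stopper.best}, stopped at epoch: {stopped_at}")
```

In a real training loop the checkpoint would usually be saved whenever `best` is updated, so the file on disk corresponds to the best epoch rather than the last one.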

Training & Validation Curves

[Figure 1: training and validation curves]

Usage

To load the model weights in PyTorch:

import torch
import timm

# Rebuild the architecture used in training: 1-channel stem, Dropout + Linear head
def get_efficientnet_b0(num_classes):
    model = timm.create_model('efficientnet_b0', pretrained=False)
    stem = model.conv_stem
    # Swap the default 3-channel stem for a 1-channel one (mono Mel spectrograms)
    if stem.in_channels != 1:
        model.conv_stem = torch.nn.Conv2d(1, stem.out_channels, kernel_size=stem.kernel_size,
                                          stride=stem.stride, padding=stem.padding, bias=False)
    in_features = model.classifier.in_features
    model.classifier = torch.nn.Sequential(
        torch.nn.Dropout(0.5),
        torch.nn.Linear(in_features, num_classes)
    )
    return model

# num_classes must match the number of classes the checkpoint was trained on
model = get_efficientnet_b0(num_classes=...)
# map_location='cpu' lets the checkpoint load on machines without a GPU
model.load_state_dict(torch.load('best_efficientnetb0_fold3.pth', map_location='cpu'))
model.eval()  # disable dropout and use running batch-norm statistics
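
Once the weights are loaded, a forward pass on a batch of Mel spectrograms (presumably shaped [batch, 1, 128, T]) returns one logit per class. The following sketch shows how those logits might be turned into ranked predictions. It is written in plain Python so it runs without the checkpoint; the logits and the `species_*` labels are hypothetical placeholders, since the card does not include the label list.

```python
import math


def top_k_predictions(logits, class_names, k=3):
    """Softmax the logits and return the k most probable (name, prob) pairs."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(class_names, probs), key=lambda p: p[1], reverse=True)
    return ranked[:k]


# Hypothetical logits and placeholder labels, not taken from the checkpoint.
logits = [2.0, 0.5, -1.0, 3.0]
class_names = ["species_a", "species_b", "species_c", "species_d"]

top = top_k_predictions(logits, class_names, k=2)
for name, prob in top:
    print(f"{name}: {prob:.3f}")
```

With real model output, the same result can be obtained on tensors with `torch.softmax(logits, dim=1)` followed by `.topk(k)`, ideally inside a `torch.no_grad()` block.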

Citation

If you use this model, please cite both the BirdCLEF dataset and this repository:

BirdCLEF Dataset:

Kahl, S., Stöter, F.-R., Thakur, A., Klinck, H., Goëau, H., Glotin, H., Vellinga, W.-P., Planqué, R., Joly, A., & Lostanlen, V. (2023). BirdCLEF 2023: Bird sound recognition in complex acoustic environments. Proceedings of the Working Notes of CLEF 2023.

This Model:

Das, R. (2026). Bird Sound Identifier (EfficientNet-B0, Fold 3) [Model]. Hugging Face. https://huggingface.co/Rudraneel93/bird-sound-identifier

Author

R. Das (Rudraneel93 on Hugging Face)

For questions or issues, please open an issue on the Hugging Face Hub.
