OpenWhistle CNN VGG16

OpenWhistleNeurIPS26/OpenWhistle-CNN-VGG16 is a supervised VGG16-based PyTorch classifier for bottlenose dolphin whistle detection.

The model is part of the OpenWhistle family and predicts whether a spectrogram window contains a whistle or noise.

Model Details

  • Model type: VGG16-based CNN classifier
  • Framework: PyTorch
  • Task: binary classification
  • Labels: whistle vs noise
  • Input: 224x224 RGB spectrogram
  • Checkpoint: model_vgg_final_best.pt
  • Best epoch: 4
  • Best validation loss: 0.1805

The model operates on spectrogram image windows rather than raw waveform audio.

Training and Evaluation Data

The model was trained and evaluated using a session-disjoint train/validation/test protocol.

Split summary:

  • Train: 53,828 windows across 195 sessions
  • Validation: 5,980 windows across 26 sessions
  • Test: 16,708 windows across 261 sessions

Each split is balanced between whistle and noise windows.

Test set composition:

  • 8,354 whistle windows
  • 8,354 matched noise windows
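The session-disjoint property of the splits above can be verified with a simple set check. A minimal sketch, where the session ID values are illustrative placeholders rather than the real split manifests:

```python
def splits_are_session_disjoint(*splits):
    """Return True if no session ID is shared between any pair of splits."""
    seen = set()
    for split in splits:
        if seen & split:  # overlap with an earlier split -> leakage
            return False
        seen |= split
    return True

# Placeholder session IDs, standing in for the real train/val/test manifests.
train_sessions = {"S001", "S002", "S003"}
val_sessions = {"S010", "S011"}
test_sessions = {"S020", "S021", "S022"}

assert splits_are_session_disjoint(train_sessions, val_sessions, test_sessions)
```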

The model is intended for use with the OpenWhistle CNN/detection workflow and related bottlenose dolphin whistle detection datasets.

Intended Use

This model is intended as a supervised whistle detector for bottlenose dolphin acoustic recordings.

Potential uses include:

  • detecting whistle-like spectrogram windows
  • filtering long recordings before manual review
  • generating candidate whistle detections for downstream analysis
  • benchmarking whistle detection workflows on OpenWhistle-style spectrogram windows

This is a binary detector, not a whistle category classifier. It predicts whistle presence versus noise.
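The pre-review filtering use case amounts to thresholding per-window whistle scores. A minimal sketch, where the probability values and the 0.5 threshold are illustrative (real scores come from the model):

```python
def select_candidate_windows(probs, threshold=0.5):
    """Return indices of spectrogram windows scored above the threshold."""
    return [i for i, p in enumerate(probs) if p > threshold]

# Illustrative per-window whistle probabilities for a long recording.
window_probs = [0.02, 0.91, 0.40, 0.77, 0.05]
candidates = select_candidate_windows(window_probs)
print(candidates)  # → [1, 3]
```

Only the flagged windows then need manual review or downstream analysis.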

Metrics

Validation metrics:

  • Loss: 0.1805
  • Accuracy: 0.9460
  • F1: 0.9443
  • Precision: 0.9747
  • Recall: 0.9157

Test metrics:

  • Loss: 0.1409
  • Accuracy: 0.9723
  • F1: 0.9725
  • Precision: 0.9652
  • Recall: 0.9799

Confusion matrix counts are available in run_summary.json.
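The reported figures are internally consistent: F1 is the harmonic mean of precision and recall, which reproduces the reported F1 in both splits:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported precision/recall reproduce the reported F1 to four decimals.
assert round(f1_score(0.9747, 0.9157), 4) == 0.9443  # validation
assert round(f1_score(0.9652, 0.9799), 4) == 0.9725  # test
```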

Input Format

The model expects:

  • spectrogram image input
  • RGB format
  • spatial size: 224x224
  • normalized tensor input matching the project inference pipeline

The checkpoint is designed to be loaded through the OpenWhistle/DolphinWhistleExtractor PyTorch codebase.
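As a preprocessing sketch, an image window can be scaled, normalized, and reordered to the CHW layout PyTorch expects. The ImageNet normalization statistics below are an assumption; the project pipeline's exact values may differ:

```python
import numpy as np

# Assumed ImageNet normalization statistics; the project pipeline may differ.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(spectrogram_rgb):
    """Convert a 224x224x3 uint8 spectrogram image to a normalized CHW array."""
    x = spectrogram_rgb.astype(np.float32) / 255.0  # scale to [0, 1]
    x = (x - MEAN) / STD                            # per-channel normalization
    return np.transpose(x, (2, 0, 1))               # HWC -> CHW for PyTorch

dummy = np.zeros((224, 224, 3), dtype=np.uint8)
print(preprocess(dummy).shape)  # → (3, 224, 224)
```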

Loading

import torch

checkpoint_path = "model_vgg_final_best.pt"
# Load on CPU; the checkpoint's exact contents depend on how it was saved
checkpoint = torch.load(checkpoint_path, map_location="cpu")

Exact model reconstruction should use the VGG16 model definition from the OpenWhistle/DolphinWhistleExtractor codebase.

Implementation Notes

The VGG16 spectrogram-classification workflow was originally prototyped in a Keras/TensorFlow training script using ImageNet-pretrained VGG16 features. The released checkpoint is the PyTorch version of this workflow.

Evaluation metrics and reporting use standard Python scientific tooling, including scikit-learn for ROC/AUC, F1, precision, and recall.

Files

This repository contains:

  • model_vgg_final_best.pt
  • run_summary.json
  • validation_confusion_matrix.csv
  • test_confusion_matrix.csv
  • validation_session_metrics.csv
  • test_session_metrics.csv
  • training and ROC plots in figures/

Limitations

  • The model is specialized for bottlenose dolphin whistle detection on spectrogram windows.
  • Performance may degrade on other species, hydrophone hardware, recording conditions, or spectrogram generation settings.
  • The model predicts whistle presence versus noise and does not classify whistle identity or whistle category.
  • Downstream ecological or behavioral interpretations should be validated independently.

License

The license for this model has not yet been specified. Please contact the model authors or maintainers before using it for redistribution or commercial purposes.
