OpenWhistle CNN VGG16
OpenWhistleNeurIPS26/OpenWhistle-CNN-VGG16 is a supervised VGG16-based PyTorch classifier for bottlenose dolphin whistle detection.
The model is part of the OpenWhistle family and predicts whether a spectrogram window contains a whistle or noise.
Model Details
- Model type: VGG16-based CNN classifier
- Framework: PyTorch
- Task: binary classification
- Labels: whistle vs noise
- Input: 224x224 RGB spectrogram
- Checkpoint: model_vgg_final_best.pt
- Best epoch: 4
- Best validation loss: 0.1805
The model operates on spectrogram image windows rather than raw waveform audio.
Training and Evaluation Data
The model was trained and evaluated using a session-disjoint train/validation/test protocol.
Split summary:
- Train: 53,828 windows across 195 sessions
- Validation: 5,980 windows across 26 sessions
- Test: 16,708 windows across 261 sessions
Each split is balanced between whistle and noise windows.
Test set composition:
- 8,354 whistle windows
- 8,354 matched noise windows
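The session-disjoint protocol can be sketched as follows; this is an illustrative stand-in (the project's actual splitting code is not shown here). The key point is that whole sessions, not individual windows, are assigned to a split, so no session contributes windows to more than one of train/validation/test.

```python
# Illustrative session-disjoint splitter (not the project's actual script).
# windows: list of (window_id, session_id) pairs.
import random

def session_disjoint_split(windows, train_frac=0.8, val_frac=0.1, seed=0):
    """Assign whole sessions to train/val/test so no session spans two splits."""
    sessions = sorted({s for _, s in windows})
    random.Random(seed).shuffle(sessions)
    n_train = int(train_frac * len(sessions))
    n_val = int(val_frac * len(sessions))
    train_sessions = set(sessions[:n_train])
    val_sessions = set(sessions[n_train:n_train + n_val])
    train = [w for w in windows if w[1] in train_sessions]
    val = [w for w in windows if w[1] in val_sessions]
    test = [w for w in windows
            if w[1] not in train_sessions and w[1] not in val_sessions]
    return train, val, test
```

Splitting by session rather than by window avoids leakage from acoustically similar windows of the same recording appearing in both train and test.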
The model is intended for use with the OpenWhistle CNN/detection workflow and related bottlenose dolphin whistle detection datasets.
Intended Use
This model is intended as a supervised whistle detector for bottlenose dolphin acoustic recordings.
Potential uses include:
- detecting whistle-like spectrogram windows
- filtering long recordings before manual review
- generating candidate whistle detections for downstream analysis
- benchmarking whistle detection workflows on OpenWhistle-style spectrogram windows
This is a binary detector, not a whistle category classifier. It predicts whistle presence versus noise.
Metrics
Validation metrics:
- Loss: 0.1805
- Accuracy: 0.9460
- F1: 0.9443
- Precision: 0.9747
- Recall: 0.9157
Test metrics:
- Loss: 0.1409
- Accuracy: 0.9723
- F1: 0.9725
- Precision: 0.9652
- Recall: 0.9799
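As a sanity check, the reported F1 values agree with the harmonic mean of the reported precision and recall, F1 = 2PR / (P + R):

```python
# Verify the reported F1 scores from the precision/recall figures above.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.9747, 0.9157), 4))  # validation -> 0.9443
print(round(f1(0.9652, 0.9799), 4))  # test -> 0.9725
```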
Confusion matrix counts are available in run_summary.json.
Input Format
The model expects:
- spectrogram image input
- RGB format
- spatial size: 224x224
- normalized tensor input matching the project inference pipeline
The checkpoint is designed to be loaded through the OpenWhistle/DolphinWhistleExtractor PyTorch codebase.
Loading
```python
import torch

checkpoint_path = "model_vgg_final_best.pt"
checkpoint = torch.load(checkpoint_path, map_location="cpu")
```
Exact model reconstruction should use the VGG16 model definition from the OpenWhistle/DolphinWhistleExtractor codebase.
Implementation Notes
The VGG16 spectrogram-classification workflow was originally prototyped in a Keras/TensorFlow training script using ImageNet-pretrained VGG16 features. The released checkpoint is the PyTorch version of this workflow.
Evaluation metrics and reporting use standard Python scientific tooling, including scikit-learn for ROC/AUC, F1, precision, and recall.
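For reference, a toy example of the scikit-learn calls involved, run on synthetic labels rather than the model's actual predictions:

```python
# Toy demonstration of the scikit-learn metrics used for reporting.
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # synthetic ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # synthetic thresholded predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]  # synthetic scores

print(precision_score(y_true, y_pred))  # 0.75
print(recall_score(y_true, y_pred))     # 0.75
print(f1_score(y_true, y_pred))         # 0.75
print(roc_auc_score(y_true, y_score))   # 0.9375
```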
Files
This repository contains:
- model_vgg_final_best.pt
- run_summary.json
- validation_confusion_matrix.csv
- test_confusion_matrix.csv
- validation_session_metrics.csv
- test_session_metrics.csv
- training and ROC plots in figures/
Limitations
- The model is specialized for bottlenose dolphin whistle detection on spectrogram windows.
- Performance may change on other species, hydrophones, recording conditions, or spectrogram generation settings.
- The model predicts whistle presence versus noise and does not classify whistle identity or whistle category.
- Downstream ecological or behavioral interpretations should be validated independently.
License
The license for this model has not yet been specified. Please contact the model authors or maintainers before using it for redistribution or commercial purposes.