OpenWhistle CNN VGG16
OpenWhistleNeurIPS26/OpenWhistle-CNN-VGG16 is a supervised VGG16-based PyTorch classifier for bottlenose dolphin whistle detection.
The model is part of the OpenWhistle family and predicts whether a spectrogram window contains a whistle or noise.
Model Details
- Model type: VGG16-based CNN classifier
- Framework: PyTorch
- Task: binary classification
- Labels: whistle vs noise
- Input: 224x224 RGB spectrogram
- Checkpoint: model_vgg_final_best.pt
- Best epoch: 4
- Best validation loss: 0.1805
The model operates on spectrogram image windows rather than raw waveform audio.
Training and Evaluation Data
The model was trained and evaluated using a session-disjoint train/validation/test protocol.
Split summary:
- Train: 53,828 windows across 195 sessions
- Validation: 5,980 windows across 26 sessions
- Test: 16,708 windows across 261 sessions
Each split is balanced between whistle and noise windows.
Test set composition:
- 8,354 whistle windows
- 8,354 matched noise windows
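The session-disjoint protocol can be sketched as follows; this is an illustrative stand-in (the project's actual splitting code is not shown here). The key point is that whole sessions, not individual windows, are assigned to a split, so no session contributes windows to more than one of train/validation/test.

```python
# Illustrative session-disjoint splitter (not the project's actual script).
# windows: list of (window_id, session_id) pairs.
import random

def session_disjoint_split(windows, train_frac=0.8, val_frac=0.1, seed=0):
    """Assign whole sessions to train/val/test so no session spans two splits."""
    sessions = sorted({s for _, s in windows})
    random.Random(seed).shuffle(sessions)
    n_train = int(train_frac * len(sessions))
    n_val = int(val_frac * len(sessions))
    train_sessions = set(sessions[:n_train])
    val_sessions = set(sessions[n_train:n_train + n_val])
    train = [w for w in windows if w[1] in train_sessions]
    val = [w for w in windows if w[1] in val_sessions]
    test = [w for w in windows
            if w[1] not in train_sessions and w[1] not in val_sessions]
    return train, val, test
```

Splitting by session rather than by window avoids leakage from acoustically similar windows of the same recording appearing in both train and test.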
The model is intended for use with the OpenWhistle CNN/detection workflow and related bottlenose dolphin whistle detection datasets.
Intended Use
This model is intended as a supervised whistle detector for bottlenose dolphin acoustic recordings.
Potential uses include:
- detecting whistle-like spectrogram windows
- filtering long recordings before manual review
- generating candidate whistle detections for downstream analysis
- benchmarking whistle detection workflows on OpenWhistle-style spectrogram windows
This is a binary detector, not a whistle category classifier. It predicts whistle presence versus noise.
Metrics
Validation metrics:
- Loss: 0.1805
- Accuracy: 0.9460
- F1: 0.9443
- Precision: 0.9747
- Recall: 0.9157
Test metrics:
- Loss: 0.1409
- Accuracy: 0.9723
- F1: 0.9725
- Precision: 0.9652
- Recall: 0.9799
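As a sanity check, the reported F1 values agree with the harmonic mean of the reported precision and recall, F1 = 2PR / (P + R):

```python
# Verify the reported F1 scores from the precision/recall figures above.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.9747, 0.9157), 4))  # validation -> 0.9443
print(round(f1(0.9652, 0.9799), 4))  # test -> 0.9725
```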
Confusion matrix counts are available in run_summary.json.
Input Format
The model expects:
- spectrogram image input
- RGB format
- spatial size: 224x224
- normalized tensor input matching the project inference pipeline
The checkpoint is designed to be loaded through the OpenWhistle/DolphinWhistleExtractor PyTorch codebase.
Loading
```python
import torch

checkpoint_path = "model_vgg_final_best.pt"
checkpoint = torch.load(checkpoint_path, map_location="cpu")
```
Exact model reconstruction should use the VGG16 model definition from the OpenWhistle/DolphinWhistleExtractor codebase.
Implementation Notes
The VGG16 spectrogram-classification workflow was originally prototyped in a Keras/TensorFlow training script using ImageNet-pretrained VGG16 features. The released checkpoint is the PyTorch version of this workflow.
Evaluation metrics and reporting use standard Python scientific tooling, including scikit-learn for ROC/AUC, F1, precision, and recall.
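For reference, a toy example of the scikit-learn calls involved, run on synthetic labels rather than the model's actual predictions:

```python
# Toy demonstration of the scikit-learn metrics used for reporting.
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # synthetic ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # synthetic thresholded predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]  # synthetic scores

print(precision_score(y_true, y_pred))  # 0.75
print(recall_score(y_true, y_pred))     # 0.75
print(f1_score(y_true, y_pred))         # 0.75
print(roc_auc_score(y_true, y_score))   # 0.9375
```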
Files
This repository contains:
- model_vgg_final_best.pt
- run_summary.json
- validation_confusion_matrix.csv
- test_confusion_matrix.csv
- validation_session_metrics.csv
- test_session_metrics.csv
- training and ROC plots in figures/
Limitations
- The model is specialized for bottlenose dolphin whistle detection on spectrogram windows.
- Performance may change on other species, hydrophones, recording conditions, or spectrogram generation settings.
- The model predicts whistle presence versus noise and does not classify whistle identity or whistle category.
- Downstream ecological or behavioral interpretations should be validated independently.
License
The license for this model has not yet been specified. Please contact the model authors or maintainers before using it for redistribution or commercial purposes.