OpenWhistle Wav2Vec2.0

OpenWhistleNeurIPS26/OpenWhistle-Wav2Vec2.0 is a Wav2Vec2-style self-supervised representation model pretrained on bottlenose dolphin whistle vocalizations.

The model is part of the OpenWhistle family and was pretrained on:

  • OpenWhistleNeurIPS26/OpenWhistle-Pretraining

It is intended as a pretrained audio backbone for downstream bottlenose dolphin bioacoustics tasks, including whistle classification and whistle detection.

Model Details

  • Model type: wav2vec2
  • Architecture: Wav2Vec2 pretraining model adapted to bottlenose dolphin acoustics
  • Hidden size: 768
  • Transformer layers: 12
  • Attention heads: 8
  • Feature extractor layers: 7
  • Sampling rate: 44,100 Hz
  • Parameters: ~95.1M (float32)
  • Weights: model.safetensors

Unlike standard speech Wav2Vec2 models trained at 16 kHz, this model uses frontend settings adapted for higher-frequency bottlenose dolphin whistles recorded at 44.1 kHz.
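To get a feel for what the 7-layer convolutional frontend does, the sketch below computes how many output frames such a stack produces for one second of 44.1 kHz audio. The kernel and stride values shown are the standard Wav2Vec2 defaults, used here only to illustrate the calculation; this model's adapted frontend may use different values, so check config.json for the actual settings.

```python
# Standard Wav2Vec2 frontend defaults (7 conv layers) -- illustrative only;
# this model's adapted config may differ.
kernels = [10, 3, 3, 3, 3, 2, 2]
strides = [5, 2, 2, 2, 2, 2, 2]

def num_output_frames(num_samples: int) -> int:
    """Apply each conv layer's output-length formula: floor((n - k) / s) + 1."""
    n = num_samples
    for k, s in zip(kernels, strides):
        n = (n - k) // s + 1
    return n

print(num_output_frames(44_100))  # frames per second of 44.1 kHz audio
```

With the default strides, one second of 44.1 kHz audio yields 137 frames, i.e. roughly one embedding every 7 ms; adapted stride settings would change this resolution.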

Training and Evaluation Data

The model was pretrained with self-supervised learning on:

  • OpenWhistleNeurIPS26/OpenWhistle-Pretraining

Downstream evaluation and fine-tuning are supported by the OpenWhistle supervised datasets:

  • OpenWhistleNeurIPS26/OpenWhistle-Classification-Finetuning
  • OpenWhistleNeurIPS26/OpenWhistle-Detection-Finetuning

The pretraining dataset contains unlabeled bottlenose dolphin whistle audio used to learn general acoustic representations. The classification and detection datasets provide supervised labels for downstream tasks, including whistle category classification and whistle detection.

Intended Use

This model is intended as a pretrained audio representation backbone for bottlenose dolphin bioacoustics research.

Potential downstream uses include:

  • fine-tuning on OpenWhistle-Classification-Finetuning for whistle classification
  • fine-tuning on OpenWhistle-Detection-Finetuning for whistle detection
  • extracting Wav2Vec2 embeddings from bottlenose dolphin whistles
  • exploratory analysis of learned acoustic representations

This is not a ready-to-use supervised classifier. For classification or detection, the model should be paired with a downstream head or fine-tuned on labeled data.
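To make "paired with a downstream head" concrete, here is a minimal sketch of a classification head over frame-level embeddings: mean-pool over time, then apply a linear layer. The embeddings below are random stand-ins with the model's hidden size of 768; in practice they would come from the model's `last_hidden_state`, and the number of classes (5 here) is a placeholder, not part of the released model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for Wav2Vec2 outputs: (frames, hidden), hidden size 768 as listed
# under Model Details. Random values here only illustrate the shapes.
num_frames, hidden_size, num_classes = 120, 768, 5  # num_classes is hypothetical
embeddings = rng.standard_normal((num_frames, hidden_size))

# Minimal classification head: mean-pool over the time axis, then linear.
pooled = embeddings.mean(axis=0)                       # (768,)
W = rng.standard_normal((num_classes, hidden_size)) * 0.01
b = np.zeros(num_classes)
logits = W @ pooled + b                                # (num_classes,)
predicted = int(np.argmax(logits))
print(logits.shape, predicted)
```

In a real fine-tuning setup this head would be trained jointly with (or on top of frozen) backbone weights, e.g. via `transformers` audio-classification classes, rather than implemented by hand.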

Input Audio

The preprocessor expects:

  • mono waveform input
  • sampling rate: 44,100 Hz
  • normalized audio
  • right padding
  • padding value: 0.0
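A minimal preprocessing sketch, assuming you start from a mono float32 waveform at 44.1 kHz. The synthetic frequency sweep below is only a stand-in for a recorded whistle; with `transformers` installed, the feature extractor shown in the commented lines handles normalization and right padding for you.

```python
import numpy as np

SAMPLING_RATE = 44_100  # the preprocessor expects 44.1 kHz mono audio

# Synthetic stand-in for a recorded whistle: a 1-second upward sweep.
t = np.linspace(0.0, 1.0, SAMPLING_RATE, endpoint=False)
waveform = np.sin(2 * np.pi * (5_000 + 10_000 * t) * t).astype(np.float32)

# With transformers installed, the feature extractor applies normalization
# and right padding (padding value 0.0) as described above:
#
# from transformers import AutoFeatureExtractor
# fe = AutoFeatureExtractor.from_pretrained("OpenWhistleNeurIPS26/OpenWhistle-Wav2Vec2.0")
# inputs = fe(waveform, sampling_rate=SAMPLING_RATE, return_tensors="pt", padding=True)

print(waveform.shape)  # (44100,)
```

Passing `sampling_rate=SAMPLING_RATE` explicitly lets the feature extractor raise an error on mismatched audio instead of silently accepting it.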

Loading

from transformers import AutoFeatureExtractor, AutoModel

repo_id = "OpenWhistleNeurIPS26/OpenWhistle-Wav2Vec2.0"

processor = AutoFeatureExtractor.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id)

For pretraining-head usage or exact reproduction of training behavior, project-specific model code may be required.

Files

This repository contains the model and preprocessing files needed for inference, plus a training log:

  • config.json
  • model.safetensors
  • preprocessor_config.json
  • trainer_state.json

Training-resume artifacts such as optimizer state, scheduler state, RNG state, and training_args.bin are not included in this repository.

Limitations

  • The model is specialized for bottlenose dolphin whistle acoustics and may not transfer directly to other species or recording domains.
  • It does not include behavioral, social, or environmental context.
  • Downstream performance depends on the labeled dataset, task definition, and fine-tuning protocol.
  • Learned representations may support exploratory scientific analysis, but biological interpretations should be validated independently.

License

The license for this model has not yet been specified. Please contact the model authors or maintainers before using it for redistribution or commercial purposes.
