# OpenWhistle Wav2Vec2.0

`OpenWhistleNeurIPS26/OpenWhistle-Wav2Vec2.0` is a Wav2Vec2-style self-supervised representation model pretrained on bottlenose dolphin whistle vocalizations.

The model is part of the OpenWhistle family and was pretrained on `OpenWhistleNeurIPS26/OpenWhistle-Pretraining`.

It is intended as a pretrained audio backbone for downstream bottlenose dolphin bioacoustics tasks, including whistle classification and whistle detection.
## Model Details

- Model type: `wav2vec2`
- Architecture: Wav2Vec2 pretraining model adapted to bottlenose dolphin acoustics
- Hidden size: 768
- Transformer layers: 12
- Attention heads: 8
- Feature extractor layers: 7
- Sampling rate: 44,100 Hz
- Weights: `model.safetensors`
Unlike standard speech Wav2Vec2 models trained at 16 kHz, this model uses frontend settings adapted for higher-frequency bottlenose dolphin whistles recorded at 44.1 kHz.
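To illustrate how the hyperparameters above map onto a `transformers` configuration, here is a minimal sketch using a randomly initialized stand-in model. The config fields shown are standard `Wav2Vec2Config` fields set to the values listed in this card; for the released frontend settings (conv strides, kernels, etc.), consult the repository's `config.json` rather than this sketch.

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2Model

# Randomly initialized stand-in with the hyperparameters listed above.
# The default Wav2Vec2 conv frontend already has 7 feature-extractor
# layers; the released 44.1 kHz frontend settings live in config.json.
config = Wav2Vec2Config(
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=8,
)
model = Wav2Vec2Model(config)

wave = torch.randn(1, 44100)  # 1 s of audio at 44.1 kHz
with torch.no_grad():
    out = model(wave)
print(out.last_hidden_state.shape)  # (1, num_frames, 768)
```

The final dimension of `last_hidden_state` matches the hidden size (768); the number of frames depends on the conv frontend's total stride.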
## Training and Evaluation Data

The model was pretrained with self-supervised learning on `OpenWhistleNeurIPS26/OpenWhistle-Pretraining`.

Downstream evaluation and fine-tuning are supported by the OpenWhistle supervised datasets:

- `OpenWhistleNeurIPS26/OpenWhistle-Classification-Finetuning`
- `OpenWhistleNeurIPS26/OpenWhistle-Detection-Finetuning`
The pretraining dataset contains unlabeled bottlenose dolphin whistle audio used to learn general acoustic representations. The classification and detection datasets provide supervised labels for downstream tasks, including whistle category classification and whistle detection.
## Intended Use

This model is intended as a pretrained audio representation backbone for bottlenose dolphin bioacoustics research.

Potential downstream uses include:

- fine-tuning on `OpenWhistle-Classification-Finetuning` for whistle classification
- fine-tuning on `OpenWhistle-Detection-Finetuning` for whistle detection
- extracting Wav2Vec2 embeddings from bottlenose dolphin whistles
- exploratory analysis of learned acoustic representations
This is not a ready-to-use supervised classifier. For classification or detection, the model should be paired with a downstream head or fine-tuned on labeled data.
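One way to pair the backbone with a downstream head is through a standard `transformers` classification wrapper. This is a hypothetical sketch, not the OpenWhistle fine-tuning recipe: in practice you would load the pretrained weights with `from_pretrained("OpenWhistleNeurIPS26/OpenWhistle-Wav2Vec2.0", num_labels=...)`, whereas here a locally built random config keeps the example self-contained, and `num_labels=5` is an arbitrary placeholder.

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2ForSequenceClassification

# Hypothetical downstream setup: Wav2Vec2 backbone + pooled
# classification head. num_labels=5 is a placeholder; the real label
# set comes from the chosen fine-tuning dataset.
config = Wav2Vec2Config(
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=8,
    num_labels=5,
)
model = Wav2Vec2ForSequenceClassification(config)

wave = torch.randn(2, 44100)  # batch of two 1 s clips at 44.1 kHz
with torch.no_grad():
    logits = model(wave).logits
print(logits.shape)  # (2, 5): one logit vector per clip
```

After loading real pretrained weights, this model would be fine-tuned end-to-end (or with a frozen feature extractor) on the labeled dataset.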
## Input Audio
The preprocessor expects:
- mono waveform input
- sampling rate: 44,100 Hz
- normalized audio
- right padding
- padding value: `0.0`
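The requirements above can be sketched with a locally constructed `Wav2Vec2FeatureExtractor`. The settings below mirror this list (mono, 44.1 kHz, normalized, right-padded with 0.0) but are a stand-in for, not a copy of, the repository's `preprocessor_config.json`; in practice, load the real one with `AutoFeatureExtractor.from_pretrained(...)`.

```python
import numpy as np
from transformers import Wav2Vec2FeatureExtractor

# Stand-in preprocessor mirroring the input requirements listed above.
extractor = Wav2Vec2FeatureExtractor(
    sampling_rate=44100,
    do_normalize=True,
    padding_side="right",
    padding_value=0.0,
)

# Two mono clips of different lengths: 1 s and 0.5 s at 44.1 kHz.
clips = [np.random.randn(44100), np.random.randn(22050)]
batch = extractor(clips, sampling_rate=44100, padding=True, return_tensors="np")
print(batch["input_values"].shape)  # (2, 44100): shorter clip right-padded
```

Each clip is normalized per-sample and the batch is right-padded to the longest clip, matching the preprocessor behavior described above.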
## Loading

```python
from transformers import AutoFeatureExtractor, AutoModel

repo_id = "OpenWhistleNeurIPS26/OpenWhistle-Wav2Vec2.0"
processor = AutoFeatureExtractor.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id)
```
For pretraining-head usage or exact reproduction of training behavior, project-specific model code may be required.
## Files

This repository contains the inference-oriented model files:

- `config.json`
- `model.safetensors`
- `preprocessor_config.json`
- `trainer_state.json`
Training-resume artifacts such as optimizer state, scheduler state, RNG state, and `training_args.bin` are not included in this repository.
## Limitations
- The model is specialized for bottlenose dolphin whistle acoustics and may not transfer directly to other species or recording domains.
- It does not include behavioral, social, or environmental context.
- Downstream performance depends on the labeled dataset, task definition, and fine-tuning protocol.
- Learned representations may support exploratory scientific analysis, but biological interpretations should be validated independently.
## License
The license for this model has not yet been specified. Please contact the model authors or maintainers before using it for redistribution or commercial purposes.