OpenWhistle Wav2Vec2.0

OpenWhistleNeurIPS26/OpenWhistle-Wav2Vec2.0 is a Wav2Vec2-style self-supervised representation model pretrained on bottlenose dolphin whistle vocalizations.

The model is part of the OpenWhistle family and was pretrained on:

  • OpenWhistleNeurIPS26/OpenWhistle-Pretraining

It is intended as a pretrained audio backbone for downstream bottlenose dolphin bioacoustics tasks, including whistle classification and whistle detection.

Model Details

  • Model type: wav2vec2
  • Architecture: Wav2Vec2 pretraining model adapted to bottlenose dolphin acoustics
  • Hidden size: 768
  • Transformer layers: 12
  • Attention heads: 8
  • Feature extractor layers: 7
  • Sampling rate: 44,100 Hz
  • Parameters: ~95.1M (float32)
  • Weights: model.safetensors

Unlike standard speech Wav2Vec2 models trained at 16 kHz, this model uses frontend settings adapted for higher-frequency bottlenose dolphin whistles recorded at 44.1 kHz.
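To get a feel for what the 7-layer convolutional frontend does, the sketch below computes how many output frames such a stack produces for one second of 44.1 kHz audio. The kernel and stride values shown are the standard Wav2Vec2 defaults, used here only to illustrate the calculation; this model's adapted frontend may use different values, so check config.json for the actual settings.

```python
# Standard Wav2Vec2 frontend defaults (7 conv layers) -- illustrative only;
# this model's adapted config may differ.
kernels = [10, 3, 3, 3, 3, 2, 2]
strides = [5, 2, 2, 2, 2, 2, 2]

def num_output_frames(num_samples: int) -> int:
    """Apply each conv layer's output-length formula: floor((n - k) / s) + 1."""
    n = num_samples
    for k, s in zip(kernels, strides):
        n = (n - k) // s + 1
    return n

print(num_output_frames(44_100))  # frames per second of 44.1 kHz audio
```

With the default strides, one second of 44.1 kHz audio yields 137 frames, i.e. roughly one embedding every 7 ms; adapted stride settings would change this resolution.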

Training and Evaluation Data

The model was pretrained with self-supervised learning on:

  • OpenWhistleNeurIPS26/OpenWhistle-Pretraining

Downstream evaluation and fine-tuning are supported by the OpenWhistle supervised datasets:

  • OpenWhistleNeurIPS26/OpenWhistle-Classification-Finetuning
  • OpenWhistleNeurIPS26/OpenWhistle-Detection-Finetuning

The pretraining dataset contains unlabeled bottlenose dolphin whistle audio used to learn general acoustic representations. The classification and detection datasets provide supervised labels for downstream tasks, including whistle category classification and whistle detection.

Intended Use

This model is intended as a pretrained audio representation backbone for bottlenose dolphin bioacoustics research.

Potential downstream uses include:

  • fine-tuning on OpenWhistle-Classification-Finetuning for whistle classification
  • fine-tuning on OpenWhistle-Detection-Finetuning for whistle detection
  • extracting Wav2Vec2 embeddings from bottlenose dolphin whistles
  • exploratory analysis of learned acoustic representations

This is not a ready-to-use supervised classifier. For classification or detection, the model should be paired with a downstream head or fine-tuned on labeled data.
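To make "paired with a downstream head" concrete, here is a minimal sketch of a classification head over frame-level embeddings: mean-pool over time, then apply a linear layer. The embeddings below are random stand-ins with the model's hidden size of 768; in practice they would come from the model's `last_hidden_state`, and the number of classes (5 here) is a placeholder, not part of the released model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for Wav2Vec2 outputs: (frames, hidden), hidden size 768 as listed
# under Model Details. Random values here only illustrate the shapes.
num_frames, hidden_size, num_classes = 120, 768, 5  # num_classes is hypothetical
embeddings = rng.standard_normal((num_frames, hidden_size))

# Minimal classification head: mean-pool over the time axis, then linear.
pooled = embeddings.mean(axis=0)                       # (768,)
W = rng.standard_normal((num_classes, hidden_size)) * 0.01
b = np.zeros(num_classes)
logits = W @ pooled + b                                # (num_classes,)
predicted = int(np.argmax(logits))
print(logits.shape, predicted)
```

In a real fine-tuning setup this head would be trained jointly with (or on top of frozen) backbone weights, e.g. via `transformers` audio-classification classes, rather than implemented by hand.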

Input Audio

The preprocessor expects:

  • mono waveform input
  • sampling rate: 44,100 Hz
  • normalized audio
  • right padding
  • padding value: 0.0
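A minimal preprocessing sketch, assuming you start from a mono float32 waveform at 44.1 kHz. The synthetic frequency sweep below is only a stand-in for a recorded whistle; with `transformers` installed, the feature extractor shown in the commented lines handles normalization and right padding for you.

```python
import numpy as np

SAMPLING_RATE = 44_100  # the preprocessor expects 44.1 kHz mono audio

# Synthetic stand-in for a recorded whistle: a 1-second upward sweep.
t = np.linspace(0.0, 1.0, SAMPLING_RATE, endpoint=False)
waveform = np.sin(2 * np.pi * (5_000 + 10_000 * t) * t).astype(np.float32)

# With transformers installed, the feature extractor applies normalization
# and right padding (padding value 0.0) as described above:
#
# from transformers import AutoFeatureExtractor
# fe = AutoFeatureExtractor.from_pretrained("OpenWhistleNeurIPS26/OpenWhistle-Wav2Vec2.0")
# inputs = fe(waveform, sampling_rate=SAMPLING_RATE, return_tensors="pt", padding=True)

print(waveform.shape)  # (44100,)
```

Passing `sampling_rate=SAMPLING_RATE` explicitly lets the feature extractor raise an error on mismatched audio instead of silently accepting it.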

Loading

from transformers import AutoFeatureExtractor, AutoModel

repo_id = "OpenWhistleNeurIPS26/OpenWhistle-Wav2Vec2.0"

processor = AutoFeatureExtractor.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id)

For pretraining-head usage or exact reproduction of training behavior, project-specific model code may be required.

Files

This repository contains the model and preprocessing files needed for inference, plus a training log:

  • config.json
  • model.safetensors
  • preprocessor_config.json
  • trainer_state.json

Training-resume artifacts such as optimizer state, scheduler state, RNG state, and training_args.bin are not included in this repository.

Limitations

  • The model is specialized for bottlenose dolphin whistle acoustics and may not transfer directly to other species or recording domains.
  • It does not include behavioral, social, or environmental context.
  • Downstream performance depends on the labeled dataset, task definition, and fine-tuning protocol.
  • Learned representations may support exploratory scientific analysis, but biological interpretations should be validated independently.

License

The license for this model has not yet been specified. Please contact the model authors or maintainers before using it for redistribution or commercial purposes.
