InsectNet for the Biodiversity Exploratories

Updated model that tags audio files as belonging to one or more of 29 prevalent Orthoptera species within the Biodiversity Exploratories.

Installation

To use the model, you have to install autrainer, e.g. via pip:

pip install autrainer

This model has been trained and tested with autrainer version 0.6.0. For more information about autrainer, please refer to: https://autrainer.github.io/autrainer/index.html

Usage

The model can be applied on all wav files present in a folder (<data-root>) and stored in another folder (<output-root>):

autrainer inference hf:HearTheSpecies/InsectNet-BE-AN -r <data-root> <output-root> -w 4 -s 4 -sr 96000

Here, -w is the window size in seconds, -s is the step size in seconds, and -sr is the sampling rate. For other inference settings and all available parameters, please refer to the autrainer documentation. However, the settings above are recommended.

Caution! Specifying the sampling rate as 96000 is necessary to ensure proper windowed inference, as our model was trained with this sampling rate.
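As a rough illustration of what the recommended -w 4 -s 4 settings imply, the sketch below computes the window start times for a recording (an assumption for intuition only; autrainer's actual segmentation, e.g. its handling of the final partial window, may differ):

```python
# Sketch of windowed inference segmentation (assumption: a fixed window is
# slid over the recording; only full windows are kept here).
def window_starts(duration_s: float, window_s: float = 4.0, step_s: float = 4.0):
    """Return the start times (in seconds) of all full windows."""
    starts = []
    t = 0.0
    while t + window_s <= duration_s:
        starts.append(t)
        t += step_s
    return starts

# A 12 s recording with a 4 s window and 4 s step yields 3 non-overlapping windows.
print(window_starts(12.0))  # [0.0, 4.0, 8.0]
```

With -w 4 -s 4 the windows do not overlap; a smaller step size would produce overlapping windows and hence more predictions per file.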

Training

Pretraining

We used the CNN10 architecture, a pretrained audio neural network (PANN), introduced by Kong et al. (2020), which was pretrained on AudioSet.

Dataset

The training data comprised the following data sources:

Strong labels:

  • ECOSoundSet (Funosas et al., 2026): Orthoptera species as well as buzz clips.
  • Additional annotations by the University of Freiburg and Simon Thorne: Orthoptera species.
  • WABAD (Pérez-Granados et al., 2026): bird recordings.
  • EDANSA-2019 (Çoban et al., 2022): silence recordings.

Weak labels:

  • InsectSet459 (Faiß et al., 2026): Orthoptera species.
  • Additional xeno-canto recordings not included in InsectSet459: Orthoptera species.

The covered classes are listed as labels in target_transform.yaml.

Features

The audio recordings were resampled to 96 kHz to avoid losing the high-frequency content of the species' calls. Log-Mel spectrograms were then extracted using torchlibrosa.
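The sampling rate directly sets the frequency range the mel filter bank can cover: at 96 kHz, the Nyquist frequency is 48 kHz, which reaches into the ultrasonic range of many Orthoptera calls. A minimal sketch using the standard HTK mel formula (independent of torchlibrosa, whose exact filter-bank parameters are not part of this card):

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to the (HTK) mel scale commonly used by
    log-Mel front ends: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# At 96 kHz the filter bank can extend up to 48 kHz, well beyond the
# 22.05 kHz ceiling of a 44.1 kHz pipeline.
print(hz_to_mel(96_000 / 2))   # mel value at the 96 kHz Nyquist limit
print(hz_to_mel(44_100 / 2))   # mel value at the 44.1 kHz Nyquist limit
```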

Training process

The model was trained for 30 epochs. At the end of each epoch, it was evaluated on our validation set, and we release the checkpoint that achieved the best performance on this validation set. All training hyperparameters can be found in conf/config.yaml inside the model folder. The model inputs were 4 s of audio; shorter files were padded and longer files were cropped to this length.
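The fixed-length input preparation can be sketched as follows (assumptions: zero-padding at the end and cropping from the start; the actual autrainer transform may handle this differently):

```python
# Sketch of padding/cropping audio to a fixed 4 s input length.
def fix_length(samples, sr: int = 96_000, target_s: float = 4.0):
    """Pad with trailing zeros or crop so the clip has exactly sr * target_s samples."""
    target_len = int(sr * target_s)
    if len(samples) < target_len:
        return samples + [0.0] * (target_len - len(samples))  # pad short clips
    return samples[:target_len]  # crop long clips

# Toy example with sr=4, so the target length is 16 samples.
short = fix_length([0.5] * 10, sr=4, target_s=4.0)
long_ = fix_length([0.5] * 20, sr=4, target_s=4.0)
print(len(short), len(long_))  # 16 16
```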

Evaluation

The performance on the test set reached a macro F1-score of XXX.

Acknowledgments

Please acknowledge the work that produced the original model. We would also appreciate an acknowledgment of autrainer.

Bibliography

  • Rampp, S., et al. (2024). Autrainer: A modular and extensible deep learning toolkit for computer audition tasks. arXiv preprint arXiv:2412.11943.
  • Kong, Q., et al. (2020). PANNs: Large-scale pretrained audio neural networks for audio pattern recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 2880-2894.
  • Funosas, D., et al. (2026). A finely annotated dataset for the automated acoustic identification of European Orthoptera and Cicadidae. Scientific Data.
  • Pérez-Granados, C., et al. (2026). WABAD: A world annotated bird acoustic dataset for passive acoustic monitoring. Ecology, 107(2), e70317.
  • Faiß, M., et al. (2026). A dataset of insect sounds from 459 species for bioacoustic machine learning. Scientific Data.
  • Çoban, E. B., et al. (2022). EDANSA-2019: The ecoacoustic dataset from Arctic North Slope, Alaska. In Workshop on the Detection and Classification of Acoustic Scenes and Events.