AnimalCLAP
Official model checkpoint for the ICASSP 2026 paper:
AnimalCLAP: Taxonomy-Aware Language-Audio Pretraining for Species Recognition and Trait Inference
Risa Shinoda, Kaede Shiohara, Nakamasa Inoue, Hiroaki Santo, Fumio Okura
ICASSP 2026 [Paper]
Overview
AnimalCLAP is a CLAP-based audio-language model pretrained on 701,020 animal sound recordings from iNaturalist and Xeno-Canto. It uses taxonomy-aware text representations to learn rich audio embeddings for animal vocalizations.
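As a rough illustration of what a taxonomy-aware text representation could look like, the sketch below joins the taxonomic ranks of a species into a single text prompt that a text encoder could consume. The rank order, the prompt template, and the `taxonomy_prompt` helper are assumptions made for this example. They are not taken from the AnimalCLAP code, so consult the repository for the actual format.

```python
# Hypothetical rank order and prompt template, chosen only for illustration;
# the real AnimalCLAP text format may differ.
RANKS = ("class", "order", "family", "genus", "species")


def taxonomy_prompt(taxon: dict) -> str:
    """Join the available taxonomic ranks into one text description."""
    # Skip ranks that are missing or empty so partial taxonomies still work.
    parts = [f"{rank}: {taxon[rank]}" for rank in RANKS if taxon.get(rank)]
    return "the sound of an animal, " + ", ".join(parts)


if __name__ == "__main__":
    blackbird = {
        "class": "Aves",
        "order": "Passeriformes",
        "family": "Turdidae",
        "genus": "Turdus",
        "species": "Turdus merula",
    }
    print(taxonomy_prompt(blackbird))
```

Including higher ranks in the text gives related species overlapping descriptions, which is one plausible way for taxonomy to shape the embedding space.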
Tasks:
- Zero-shot species classification (Top-1 / Top-5 / Top-10 accuracy)
- Trait inference (diet, habitat, locomotion, behavior, etc.)
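For readers unfamiliar with the Top-k metrics listed above, the following minimal sketch shows how zero-shot classification is commonly scored in CLAP-style models: each audio embedding is compared with one text embedding per species by cosine similarity, and a prediction counts as correct if the true species is among the k most similar. The function names and the toy embeddings are assumptions for illustration, and the repository's `inference.py` may compute these metrics differently.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def topk_accuracy(audio_emb, text_emb, labels, ks=(1, 5, 10)):
    """Return {k: accuracy}, given audio embeddings (N, D), per-class text
    embeddings (C, D), and integer class labels (N,)."""
    labels = np.asarray(labels)
    sims = l2_normalize(np.asarray(audio_emb, dtype=float)) @ l2_normalize(
        np.asarray(text_emb, dtype=float)
    ).T
    # Class indices sorted from most to least similar for every recording.
    ranked = np.argsort(-sims, axis=1)
    results = {}
    for k in ks:
        hits = (ranked[:, :k] == labels[:, None]).any(axis=1)
        results[k] = float(hits.mean())
    return results


if __name__ == "__main__":
    # Toy data: 3 classes with orthogonal text embeddings, 3 recordings.
    text = np.eye(3)
    audio = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 0.2], [0.9, 0.1, 1.0]])
    print(topk_accuracy(audio, text, labels=[0, 1, 0], ks=(1, 2)))
```

In the toy data the third recording is closest to class 2 while its label is 0, so it counts as a miss at k=1 and a hit at k=2.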
Files
| File | Description |
|---|---|
| animalclap_epoch020.pth | AnimalCLAP encoder checkpoint (epoch 20) |
Usage
Code and inference scripts are available at dahlian00/AnimalCLAP.
Zero-shot species classification
python inference.py \
--ckpt /path/to/animalclap_epoch020.pth \
--test_csv /path/to/test.csv \
--data_dir /path/to/hf_dataset_root \
--ks 1 5 10
Trait prediction (after training a classifier head)
python inference_traits.py \
--ckpt /path/to/traits_checkpoint.pth \
--test_csv /path/to/test.csv \
--traits_csv /path/to/species_traits.csv \
--data_dir /path/to/hf_dataset_root \
--target_col diet_type \
--task_type multiclass
Dataset
The training dataset and species traits are available at risashinoda/animalclap-dataset.
Citation
@INPROCEEDINGS{shinodaanimalclap,
author={Shinoda, Risa and Shiohara, Kaede and Inoue, Nakamasa and Santo, Hiroaki and Okura, Fumio},
booktitle={ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={AnimalCLAP: Taxonomy-Aware Language-Audio Pretraining for Species Recognition and Trait Inference},
year={2026},
pages={7767-7771},
doi={10.1109/ICASSP55912.2026.11463001}
}