AnimalCLAP

Official model checkpoint for the ICASSP 2026 paper:

AnimalCLAP: Taxonomy-Aware Language-Audio Pretraining for Species Recognition and Trait Inference
Risa Shinoda, Kaede Shiohara, Nakamasa Inoue, Hiroaki Santo, Fumio Okura
ICASSP 2026

Overview

AnimalCLAP is a CLAP-based audio-language model pretrained on 701,020 animal sound recordings from iNaturalist and Xeno-Canto. It uses taxonomy-aware text representations to learn audio embeddings of animal vocalizations that support the tasks listed below.
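This card does not specify the exact text template AnimalCLAP uses for its taxonomy-aware representations. The sketch below is only a hypothetical illustration of the general idea: a species label is rendered together with its taxonomic lineage, so that related species share text. The `Taxon` fields, the coarse-to-fine rank ordering, and the wording are all assumptions, not the authors' format.

```python
from dataclasses import dataclass


@dataclass
class Taxon:
    # Hypothetical record; field names are illustrative, not from the AnimalCLAP code.
    class_name: str   # e.g. "Aves" ("class" is a reserved word in Python)
    order: str
    family: str
    genus: str
    species: str      # binomial name
    common_name: str


def taxonomy_prompt(t: Taxon) -> str:
    """Render a species and its lineage as one text string (hypothetical template).

    Ranks are joined from coarse to fine; empty ranks are skipped so that
    partially annotated records still produce a well-formed prompt.
    """
    ranks = [t.class_name, t.order, t.family, t.genus]
    lineage = " > ".join(r for r in ranks if r)
    return f"The sound of {t.common_name} ({t.species}), taxonomy: {lineage}."
```

For example, a blackbird record would yield a prompt ending in `Aves > Passeriformes > Turdidae > Turdus.`; two thrushes would share most of that lineage text even if their common names differ.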

Tasks:

Zero-shot species classification (Top-1 / Top-5 / Top-10 accuracy)
Trait inference (diet, habitat, locomotion, behavior, etc.)
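Zero-shot classification in CLAP-style models generally works by comparing an audio embedding against the text embeddings of every candidate label and ranking labels by cosine similarity. The sketch below shows only that ranking step in plain Python, assuming the embeddings have already been computed; the function names are hypothetical and the real encoders live in the AnimalCLAP repository.

```python
import math
from typing import Dict, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so that dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def zero_shot_rank(audio_emb: Sequence[float],
                   text_embs: Dict[str, Sequence[float]]) -> List[str]:
    """Return candidate labels sorted from most to least similar to the audio clip."""
    a = normalize(audio_emb)
    scores = {}
    for label, t in text_embs.items():
        # Cosine similarity between the clip and this label's text embedding.
        scores[label] = sum(x * y for x, y in zip(a, normalize(t)))
    return sorted(scores, key=scores.get, reverse=True)
```

The first element of the returned list is the Top-1 prediction; the first five or ten elements are the candidates checked for Top-5 and Top-10 accuracy.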

Files

File	Description
`animalclap_epoch020.pth`	AnimalCLAP encoder checkpoint (epoch 20)

Usage

Code and inference scripts are available at dahlian00/AnimalCLAP.

Zero-shot species classification

python inference.py \
    --ckpt      /path/to/animalclap_epoch020.pth \
    --test_csv  /path/to/test.csv \
    --data_dir  /path/to/hf_dataset_root \
    --ks 1 5 10
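The `--ks 1 5 10` argument presumably selects which k values are reported (Top-1, Top-5, Top-10). As a hedged sketch of the metric itself, not of the script's internals, Top-k accuracy counts a clip as correct when its true species appears among the first k ranked predictions. The function name and input layout below are assumptions.

```python
from typing import Dict, Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]],
                   truth: Sequence[str],
                   ks: Sequence[int] = (1, 5, 10)) -> Dict[int, float]:
    """Fraction of clips whose true label is within the top k predictions, per k.

    `ranked[i]` is the label list for clip i, ordered from most to least likely.
    """
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same length")
    if not truth:
        raise ValueError("need at least one clip")
    result = {}
    for k in ks:
        hits = sum(1 for preds, y in zip(ranked, truth) if y in preds[:k])
        result[k] = hits / len(truth)
    return result
```

Because Top-k for a larger k checks a superset of predictions, the reported accuracies are non-decreasing in k.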

Trait prediction (requires first training a classifier head on top of the encoder)

python inference_traits.py \
    --ckpt       /path/to/traits_checkpoint.pth \
    --test_csv   /path/to/test.csv \
    --traits_csv /path/to/species_traits.csv \
    --data_dir   /path/to/hf_dataset_root \
    --target_col diet_type \
    --task_type  multiclass
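With `--task_type multiclass`, each recording is assigned exactly one value of the target trait column (here `diet_type`). The card does not describe the head's architecture; the sketch below assumes, purely for illustration, a single linear layer over the audio embedding followed by a softmax. The function name, weight layout, and class labels are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def linear_head_predict(emb: Sequence[float],
                        weights: Sequence[Sequence[float]],
                        bias: Sequence[float],
                        labels: Sequence[str]) -> Tuple[str, List[float]]:
    """Hypothetical multiclass head: logits = W @ emb + b, then softmax and argmax."""
    if not (len(weights) == len(bias) == len(labels)):
        raise ValueError("weights, bias and labels must have one entry per class")
    logits = [sum(w * x for w, x in zip(row, emb)) + b
              for row, b in zip(weights, bias)]
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs
```

A multi-label trait would instead apply an independent sigmoid per class; whether the repository supports that through a different `--task_type` value is not stated here.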

Dataset

The training dataset and species traits are available at risashinoda/animalclap-dataset.

Citation

@INPROCEEDINGS{shinodaanimalclap,
  author={Shinoda, Risa and Shiohara, Kaede and Inoue, Nakamasa and Santo, Hiroaki and Okura, Fumio},
  booktitle={ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={AnimalCLAP: Taxonomy-Aware Language-Audio Pretraining for Species Recognition and Trait Inference},
  year={2026},
  pages={7767-7771},
  doi={10.1109/ICASSP55912.2026.11463001}
}
