Model Card for x3d-BaboonLand

x3d-BaboonLand is a behavior recognition model for in situ drone videos of baboons, built using the X3D architecture. It was trained on the BaboonLand dataset, which includes both spatiotemporal clips (mini-scenes) and behavior annotations provided by an expert behavioral ecologist.

Model Details

Model Description

  • Developed by: Isla Duporge, Maksim Kholiavchenko, Roi Harel, Scott Wolf, Daniel Rubenstein, Meg Crofoot, Tanya Berger-Wolf, Stephen Lee, Julie Barreau, Jenna Kline, Michelle Ramirez, Charles Stewart
  • Model type: X3D-L
  • License: MIT
  • Fine-tuned from model: X3D-L

This model was developed for the benefit of the community as an open-source product; we request that derivative products also remain open-source.

Model Sources

Data Processing Software

The kabr-tools repository is the primary open-source package used as the basis for processing and formatting data for behavior-recognition workflows. For BaboonLand, we did not duplicate the full codebase into this model repository. Instead, we used the kabr-tools workflow with BaboonLand-specific inputs and lightweight script adaptations.

In particular, several scripts used for BaboonLand were derived from kabr-tools utilities, but were adapted for this dataset and renamed for clarity. The resulting BaboonLand-specific scripts are provided here:

BaboonLand/scripts

These scripts document the dataset-specific preprocessing used for BaboonLand, while kabr-tools remains the main reference implementation for the broader workflow.

Uses

This model is intended for baboon behavior recognition from in situ drone videos.

Out-of-Scope Use

This model was trained to classify behavior from drone videos of baboons in Kenya. It may not perform well for other species, environments, camera viewpoints, annotation schemes, or behavior taxonomies.

How to Get Started with the Model

Please see the illustrative examples in the kabr-tools repository for the general workflow.

Training Details

We include the configuration file (config.yaml) used to train X3D with the SlowFast framework.
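For orientation, a SlowFast-style fragment mirroring the hyperparameters reported below might look like the following. The key names here are assumptions about the layout; the config.yaml shipped with this repository is authoritative.

```yaml
# Illustrative SlowFast-style config fragment (assumed keys; see the
# repository's config.yaml for the actual values used in training).
TRAIN:
  ENABLE: True
  BATCH_SIZE: 5
DATA:
  NUM_FRAMES: 16
  SAMPLING_RATE: 5
MODEL:
  MODEL_NAME: X3D
  ARCH: x3d
SOLVER:
  BASE_LR: 1e-5
  MAX_EPOCH: 120
  OPTIMIZING_METHOD: sgd
```

Training in SlowFast is typically launched with `python tools/run_net.py --cfg config.yaml`.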

Training Data

This model was trained on the BaboonLand dataset.

Training Hyperparameters

The model was trained for 120 epochs with a batch size of 5.
To address the long-tailed class distribution, we used the equalization loss (EQL) and optimized with SGD at a learning rate of 1e-5.
Clips were sampled at 16x5 (16 frames with a temporal sampling rate of 5), and weights were randomly initialized.
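The 16x5 sampling scheme means each training clip covers a temporal footprint of 16 × 5 = 80 raw frames. A minimal sketch of one common way to pick such indices (center-sampling, clamped at video boundaries; illustrative, not the exact SlowFast loader code):

```python
def sample_clip_indices(num_video_frames, num_frames=16, sampling_rate=5):
    """Pick `num_frames` frame indices spaced `sampling_rate` apart,
    centered in the video. Indices are clamped for short videos."""
    span = num_frames * sampling_rate          # temporal footprint: 80 frames
    start = max(0, (num_video_frames - span) // 2)
    return [min(start + i * sampling_rate, num_video_frames - 1)
            for i in range(num_frames)]

indices = sample_clip_indices(300)
print(len(indices), indices[0], indices[1] - indices[0])  # 16 110 5
```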

Evaluation

The model was evaluated using the SlowFast framework, specifically the test_net.py evaluation script.

Testing Data

We provide a train-test split of the mini-scenes from the BaboonLand dataset for evaluation, with 75% used for training and 25% for testing. No mini-scene was split across train and test partitions.
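The key property of the split is that it groups by mini-scene: every clip from a given mini-scene lands entirely in one partition. A minimal sketch of such a grouped split (illustrative; the released split is fixed, not re-sampled):

```python
import random

def split_mini_scenes(scene_ids, train_frac=0.75, seed=0):
    """Split whole mini-scenes into train/test sets so that no
    mini-scene spans both partitions."""
    ids = sorted(set(scene_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    cut = int(round(train_frac * len(ids)))
    return set(ids[:cut]), set(ids[cut:])

train, test = split_mini_scenes([f"scene_{i:03d}" for i in range(100)])
print(len(train), len(test))   # 75 25
print(train.isdisjoint(test))  # True
```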

Metrics

We report Top-1, Top-3, and Top-5 scores; the table below lists micro-averaged (per-instance) results. For full details, including macro-averaged scores, please refer to the paper.

Micro-Average (Per Instance) Scores

Weight Init | Batch Size | Top-1 | Top-3 | Top-5
Random | 5 | 64.89 | 92.54 | 96.66
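Micro-averaged (per-instance) top-k accuracy counts a clip as correct when its true label is among the k highest-scoring classes, then averages over all clips. A minimal sketch (function name and toy data are illustrative):

```python
def top_k_accuracy(scores, labels, k):
    """Micro-averaged top-k accuracy (%): the fraction of instances
    whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in topk
    return 100.0 * hits / len(labels)

scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
labels = [1, 2, 0]
print(round(top_k_accuracy(scores, labels, 1), 2))  # 33.33
print(top_k_accuracy(scores, labels, 3))            # 100.0
```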

Model Architecture and Objective

Please see the base model description.

Hardware

Running the X3D-L model requires a modern NVIDIA GPU with CUDA support. X3D-L is designed to be computationally efficient and typically requires 10–16 GB of GPU memory during training.

Citation

If you use our model in your work, please cite our paper.

BibTeX:

@article{duporge2025baboonland,
  title={BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos},
  author={Duporge, Isla and Kholiavchenko, Maksim and Harel, Roi and Wolf, Scott and Rubenstein, Daniel I and Crofoot, Margaret C and Berger-Wolf, Tanya and Lee, Stephen J and Barreau, Julie and Kline, Jenna and Ramirez, Michelle and Stewart, Charles},
  journal={International Journal of Computer Vision},
  pages={1--12},
  year={2025},
  publisher={Springer}
}

Acknowledgements

This work was supported by the Imageomics Institute, which is funded by the US National Science Foundation's Harnessing the Data Revolution (HDR) program under Award #2118240 (Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning). Additional support was also provided by the AI Institute for Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE), which is funded by the US National Science Foundation under Award #2112606. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

The data was gathered at the Mpala Research Centre in Kenya, in accordance with Research License No. NACOSTI/P/22/18214. The data collection protocol adhered strictly to the guidelines set forth by the Institutional Animal Care and Use Committee under permission No. IACUC 1835F.

Model Card Authors

Maksim Kholiavchenko

Model Card Contact

For questions on this model, please open a discussion on this repo.
