| --- |
| license: mit |
| datasets: |
| - imageomics/BaboonLand |
| language: |
| - en |
| tags: |
| - biology |
| - CV |
| - images |
| - animals |
| - baboon |
| - primate |
| - behavior |
| - behavior recognition |
| - annotation |
| - UAV |
| - drone |
| - video |
| model_description: "Behavior recognition model for in situ drone videos of baboons, built using the X3D architecture. It was trained on the BaboonLand mini-scene dataset, which comprises 20 hours of aerial video footage of baboons captured using a DJI Mavic 2S drone." |
| --- |
| |
| # Model Card for x3d-BaboonLand |
|
|
| x3d-BaboonLand is a behavior recognition model for in situ drone videos of baboons, built using the X3D architecture. It was trained on the [BaboonLand](https://huggingface.co/datasets/imageomics/BaboonLand) dataset, which includes both spatiotemporal clips (mini-scenes) and behavior annotations provided by an expert behavioral ecologist. |
|
|
| ## Model Details |
|
|
| ### Model Description |
|
|
| - **Developed by:** Isla Duporge, Maksim Kholiavchenko, Roi Harel, Scott Wolf, Daniel Rubenstein, Meg Crofoot, Tanya Berger-Wolf, Stephen Lee, Julie Barreau, Jenna Kline, Michelle Ramirez, Charles Stewart |
| - **Model type:** X3D-L |
| - **License:** MIT |
| - **Fine-tuned from model:** [X3D-L](https://github.com/facebookresearch/SlowFast/blob/main/configs/Kinetics/X3D_L.yaml) |
|
|
| This model was developed for the benefit of the community as an open-source product; we request that derivative products also remain open-source. |
|
|
| ### Model Sources |
|
|
| - **Repository:** [kabr-tools](https://github.com/Imageomics/kabr-tools) |
| - **BaboonLand scripts:** [BaboonLand/scripts](https://huggingface.co/datasets/imageomics/BaboonLand/tree/main/BaboonLand/scripts) |
| - **Paper:** [BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos](https://link.springer.com/article/10.1007/s11263-025-02493-5) |
| - **Project Page:** [BaboonLand Project Page](https://baboonland.xyz) |
|
|
| ### Data Processing Software |
|
|
| The [kabr-tools](https://github.com/Imageomics/kabr-tools) repository is the primary open-source package used as the basis for processing and formatting data for behavior-recognition workflows. For BaboonLand, we did **not** duplicate the full codebase into this model repository. Instead, we used the `kabr-tools` workflow with BaboonLand-specific inputs and lightweight script adaptations. |
|
|
| In particular, several scripts used for BaboonLand were derived from `kabr-tools` utilities, but were adapted for this dataset and renamed for clarity. The resulting BaboonLand-specific scripts are provided here: |
|
|
| [BaboonLand/scripts](https://huggingface.co/datasets/imageomics/BaboonLand/tree/main/BaboonLand/scripts) |
|
|
| These scripts document the dataset-specific preprocessing used for BaboonLand, while `kabr-tools` remains the main reference implementation for the broader workflow. |
|
|
| ## Uses |
|
|
| This model is intended for baboon behavior recognition from in situ drone videos. |
|
|
| ### Out-of-Scope Use |
|
|
| This model was trained to classify behavior from drone videos of baboons in Kenya. It may not perform well for other species, environments, camera viewpoints, annotation schemes, or behavior taxonomies. |
|
|
| ## How to Get Started with the Model |
|
|
Please see the illustrative examples in the [kabr-tools documentation](https://imageomics.github.io/kabr-tools) for the general workflow.
|
|
| ## Training Details |
|
|
| We include the configuration file ([config.yaml](https://huggingface.co/imageomics/x3d-BaboonLand/blob/main/config.yaml)) used for X3D training in SlowFast. |
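For orientation, a SlowFast training configuration of this kind typically sets keys like the following. This is an illustrative sketch only, assembled from the hyperparameters described on this card; the linked `config.yaml` is the authoritative source for the exact values and any additional settings.

```yaml
# Illustrative subset of SlowFast config keys (not the released file).
TRAIN:
  BATCH_SIZE: 5
SOLVER:
  BASE_LR: 1e-5
  MAX_EPOCH: 120
  OPTIMIZING_METHOD: sgd
DATA:
  NUM_FRAMES: 16
  SAMPLING_RATE: 5
```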
|
|
| ### Training Data |
|
|
| This model was trained on the [BaboonLand](https://huggingface.co/datasets/imageomics/BaboonLand) dataset. |
|
|
| #### Training Hyperparameters |
|
|
| The model was trained for 120 epochs using a batch size of 5. |
| We used the EQL loss function to address the long-tailed class distribution and SGD optimization with a learning rate of `1e-5`. |
| We used a `16x5` sampling scheme (16 frames per clip, sampled at a temporal stride of 5) and random weight initialization. |
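To make the `16x5` sampling scheme concrete, here is a minimal sketch in NumPy. The `sample_clip` helper is hypothetical, and the choice to start from the first frame and clamp indices for short videos is an assumption for illustration, not taken from the released code:

```python
import numpy as np

def sample_clip(video: np.ndarray, num_frames: int = 16, stride: int = 5) -> np.ndarray:
    """Sample `num_frames` frames at a fixed temporal stride from a
    (T, H, W, C) video array, starting at frame 0.

    The clip therefore spans num_frames * stride = 80 source frames;
    indices are clamped so shorter videos repeat their final frame.
    """
    t = video.shape[0]
    idx = np.minimum(np.arange(num_frames) * stride, t - 1)
    return video[idx]

# A dummy 100-frame RGB video at 224x224 resolution.
video = np.zeros((100, 224, 224, 3), dtype=np.uint8)
clip = sample_clip(video)
print(clip.shape)  # (16, 224, 224, 3)
```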
|
|
| ## Evaluation |
|
|
| The model was evaluated using the [SlowFast](https://github.com/facebookresearch/SlowFast) framework, specifically the [test_net.py](https://github.com/facebookresearch/SlowFast/blob/main/tools/test_net.py) evaluation script. |
|
|
| ### Testing Data |
|
|
| We provide a train-test split of the mini-scenes from the [BaboonLand](https://huggingface.co/datasets/imageomics/BaboonLand) dataset for evaluation, with 75% used for training and 25% for testing. No mini-scene was split across train and test partitions. |
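The constraint that no mini-scene spans both partitions amounts to a grouped split: partition the mini-scene IDs first, then assign clips. A minimal sketch (the field names and 8-scene toy data are hypothetical; the released split files remain authoritative):

```python
import random

def grouped_split(clips, train_frac=0.75, seed=0):
    """Split clips ~75/25 by mini-scene ID so that every clip from a
    given mini-scene lands in exactly one partition."""
    scenes = sorted({c["scene_id"] for c in clips})
    rng = random.Random(seed)
    rng.shuffle(scenes)
    n_train = int(len(scenes) * train_frac)
    train_ids = set(scenes[:n_train])
    train = [c for c in clips if c["scene_id"] in train_ids]
    test = [c for c in clips if c["scene_id"] not in train_ids]
    return train, test

# Toy example: 8 mini-scenes with 3 clips each.
clips = [{"scene_id": s, "clip": i} for s in range(8) for i in range(3)]
train, test = grouped_split(clips)
print(len(train), len(test))  # 18 6
```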
|
|
| #### Metrics |
|
|
| We report Top-1, Top-3, and Top-5 scores. For full details, please refer to the [paper](https://link.springer.com/article/10.1007/s11263-025-02493-5). |
|
|
| **Micro-Average (Per Instance) Scores** |
|
|
| | Weight Init. | Batch Size | Top-1 | Top-3 | Top-5 | |
| |--------------|-----------:|------:|------:|------:| |
| | Random | 5 | 64.89 | 92.54 | 96.66 | |
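The per-instance Top-k scores above can be computed as in the following generic sketch of top-k accuracy (this is not the exact SlowFast evaluation code, and the scores/labels shown are toy values):

```python
def topk_accuracy(scores, labels, k):
    """Percentage of instances whose true label appears among the k
    highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return 100.0 * hits / len(labels)

# Toy 3-class predictions for 3 instances.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 2, 0]
print(topk_accuracy(scores, labels, 1))  # only the first instance is a top-1 hit
```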
|
|
| ### Model Architecture and Objective |
|
|
| Please see the [base model description](https://arxiv.org/pdf/2004.04730). |
|
|
| #### Hardware |
|
|
| Running the X3D-L model requires a modern NVIDIA GPU with CUDA support. X3D-L is designed to be computationally efficient and typically requires 10–16 GB of GPU memory during training. |
|
|
| ## Citation |
|
|
| If you use our model in your work, please cite our paper. |

| **BibTeX:** |
|
|
| **Paper** |
| ``` |
| @article{duporge2025baboonland, |
| title={BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos}, |
| author={Duporge, Isla and Kholiavchenko, Maksim and Harel, Roi and Wolf, Scott and Rubenstein, Daniel I and Crofoot, Margaret C and Berger-Wolf, Tanya and Lee, Stephen J and Barreau, Julie and Kline, Jenna and Ramirez, Michelle and Stewart, Charles}, |
| journal={International Journal of Computer Vision}, |
| pages={1--12}, |
| year={2025}, |
| publisher={Springer} |
| } |
| ``` |
|
|
|
|
| ## Acknowledgements |
|
|
| This work was supported by the [Imageomics Institute](https://imageomics.org), which is funded by the US National Science Foundation's Harnessing the Data Revolution (HDR) program under [Award #2118240](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2118240) (Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning). Additional support was also provided by the [AI Institute for Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE)](https://icicle.osu.edu/), which is funded by the US National Science Foundation under [Award #2112606](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2112606). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. |
|
|
| The data was gathered at the [Mpala Research Centre](https://mpala.org/) in Kenya, in accordance with Research License No. NACOSTI/P/22/18214. The data collection protocol adhered strictly to the guidelines set forth by the Institutional Animal Care and Use Committee under permission No. IACUC 1835F. |
|
|
|
|
| ## Model Card Authors |
|
|
| Maksim Kholiavchenko |
|
|
| ## Model Card Contact |
|
|
| For questions on this model, please open a [discussion](https://huggingface.co/imageomics/x3d-BaboonLand/discussions) on this repo. |