---
license: mit
datasets:
- imageomics/BaboonLand
language:
- en
tags:
- biology
- CV
- images
- animals
- baboon
- primates
- behavior
- behavior recognition
- annotation
- UAV
- drone
- video
model_description: "Behavior recognition model for in situ drone videos of baboons, built using an X3D model. It was trained on the BaboonLand mini-scene dataset, which comprises 20 hours of aerial video footage of baboons captured using a DJI Mavic 2S drone."
---

# Model Card for x3d-BaboonLand

x3d-BaboonLand is a behavior recognition model for in situ drone videos of baboons, built using the X3D architecture. It was trained on the [BaboonLand](https://huggingface.co/datasets/imageomics/BaboonLand) dataset, which includes both spatiotemporal clips (mini-scenes) and behavior annotations provided by an expert behavioral ecologist.

## Model Details

### Model Description

- **Developed by:** Isla Duporge, Maksim Kholiavchenko, Roi Harel, Scott Wolf, Daniel Rubenstein, Meg Crofoot, Tanya Berger-Wolf, Stephen Lee, Julie Barreau, Jenna Kline, Michelle Ramirez, Charles Stewart
- **Model type:** X3D-L
- **License:** MIT
- **Fine-tuned from model:** [X3D-L](https://github.com/facebookresearch/SlowFast/blob/main/configs/Kinetics/X3D_L.yaml)

This model was developed for the benefit of the community as an open-source product; we request that derivative products also remain open-source.

### Model Sources

- **Repository:** [kabr-tools](https://github.com/Imageomics/kabr-tools)
- **BaboonLand scripts:** [BaboonLand/scripts](https://huggingface.co/datasets/imageomics/BaboonLand/tree/main/BaboonLand/scripts)
- **Paper:** [BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos](https://link.springer.com/article/10.1007/s11263-025-02493-5)
- **Project Page:** [BaboonLand Project Page](https://baboonland.xyz)

### Data Processing Software

The [kabr-tools](https://github.com/Imageomics/kabr-tools) repository is the primary open-source package used as the basis for processing and formatting data for behavior-recognition workflows. For BaboonLand, we did **not** duplicate the full codebase into this model repository. Instead, we used the `kabr-tools` workflow with BaboonLand-specific inputs and lightweight script adaptations.

In particular, several scripts used for BaboonLand were derived from `kabr-tools` utilities, but were adapted for this dataset and renamed for clarity. The resulting BaboonLand-specific scripts are provided here:

[BaboonLand/scripts](https://huggingface.co/datasets/imageomics/BaboonLand/tree/main/BaboonLand/scripts)

These scripts document the dataset-specific preprocessing used for BaboonLand, while `kabr-tools` remains the main reference implementation for the broader workflow.

## Uses

This model is intended for baboon behavior recognition from in situ drone videos.

### Out-of-Scope Use

This model was trained to classify behavior from drone videos of baboons in Kenya. It may not perform well for other species, environments, camera viewpoints, annotation schemes, or behavior taxonomies.

## How to Get Started with the Model

Please see the illustrative examples in the [kabr-tools documentation](https://imageomics.github.io/kabr-tools) for the general workflow.

## Training Details

We include the configuration file ([config.yaml](https://huggingface.co/imageomics/x3d-BaboonLand/blob/main/config.yaml)) used for X3D training in SlowFast.
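For orientation, the key settings in a configuration of this kind can be sketched as follows. This is an illustrative fragment using the hyperparameters reported in this card, not a copy of the actual file; the key names follow SlowFast's config schema, and the `LOSS_FUNC` value is an assumption. Refer to the linked `config.yaml` for the authoritative values.

```yaml
# Illustrative SlowFast-style config fragment (not the actual config.yaml).
TRAIN:
  ENABLE: True
  BATCH_SIZE: 5            # batch size reported for this model
DATA:
  NUM_FRAMES: 16           # 16x5 sampling: 16 frames...
  SAMPLING_RATE: 5         # ...with a temporal stride of 5
MODEL:
  ARCH: x3d
  LOSS_FUNC: eql           # assumption: name of the EQL loss hook
SOLVER:
  OPTIMIZING_METHOD: sgd
  BASE_LR: 0.00001         # 1e-5
  MAX_EPOCH: 120
```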

### Training Data

This model was trained on the [BaboonLand](https://huggingface.co/datasets/imageomics/BaboonLand) dataset.

#### Training Hyperparameters

The model was trained for 120 epochs using a batch size of 5.  
We used the Equalization Loss (EQL) to address the long-tailed class distribution, with SGD optimization and a learning rate of `1e-5`.  
We used a `16x5` sampling scheme (16 frames with a temporal stride of 5) and random weight initialization.
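The `16x5` sampling can be illustrated with a small sketch: given a decoded mini-scene, it picks 16 frame indices spaced 5 frames apart. The function name, the clip centering, and the clamping policy are our illustrative choices, not taken from the training code.

```python
def sample_clip_indices(num_video_frames: int, num_frames: int = 16, stride: int = 5) -> list[int]:
    """Pick `num_frames` indices spaced `stride` apart, clamped to the video length.

    Mirrors 16x5 sampling for X3D input clips; centering and clamping
    here are illustrative choices, not taken from the training code.
    """
    span = (num_frames - 1) * stride                     # frames covered by one clip
    start = max(0, (num_video_frames - 1 - span) // 2)   # center the clip in the video
    return [min(start + i * stride, num_video_frames - 1) for i in range(num_frames)]

# A 300-frame mini-scene yields 16 indices with stride 5:
indices = sample_clip_indices(300)
print(len(indices), indices[1] - indices[0])  # 16 5
```

For clips shorter than the sampled span, the sketch simply clamps trailing indices to the last frame; real pipelines may instead loop or pad the clip.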

## Evaluation

The model was evaluated using the [SlowFast](https://github.com/facebookresearch/SlowFast) framework, specifically the [test_net.py](https://github.com/facebookresearch/SlowFast/blob/main/tools/test_net.py) evaluation script.
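Following SlowFast's standard entry point, an evaluation run of this kind is launched roughly as follows. All paths are placeholders, and the override keys should be checked against the linked configuration file; this is a sketch of the usual invocation, not the exact command used.

```shell
# Sketch of a SlowFast evaluation run (all paths are placeholders).
python tools/run_net.py \
  --cfg /path/to/config.yaml \
  TRAIN.ENABLE False \
  TEST.ENABLE True \
  TEST.CHECKPOINT_FILE_PATH /path/to/x3d_baboonland_checkpoint.pyth \
  DATA.PATH_TO_DATA_DIR /path/to/baboonland_mini_scenes
```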

### Testing Data

We provide a train-test split of the mini-scenes from the [BaboonLand](https://huggingface.co/datasets/imageomics/BaboonLand) dataset for evaluation, with 75% used for training and 25% for testing. No mini-scene was split across train and test partitions.
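The constraint that no mini-scene crosses the partition boundary amounts to splitting at the mini-scene level before collecting clips. A minimal sketch (the function name, ID format, and fixed seed are ours; use the released split for any comparison to reported results):

```python
import random

def split_mini_scenes(scene_ids, train_frac=0.75, seed=0):
    """Shuffle mini-scene IDs and split them 75/25, so every clip from a
    given mini-scene lands in exactly one partition.

    Illustrative sketch only; the released split should be used when
    comparing against reported results.
    """
    ids = sorted(scene_ids)              # deterministic base order
    random.Random(seed).shuffle(ids)     # seeded shuffle for reproducibility
    cut = int(len(ids) * train_frac)
    return ids[:cut], ids[cut:]

train, test = split_mini_scenes([f"scene_{i:03d}" for i in range(100)])
print(len(train), len(test))  # 75 25
```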

#### Metrics

We report Top-1, Top-3, and Top-5 accuracy. For full details, please refer to the [paper](https://link.springer.com/article/10.1007/s11263-025-02493-5).

**Micro-Average (Per-Instance) Scores**

| Weight Init. | Batch Size | Top-1 | Top-3 | Top-5 |
|--------------|-----------:|------:|------:|------:|
| Random       |          5 | 64.89 | 92.54 | 96.66 |

### Model Architecture and Objective

Please see the [base model description](https://arxiv.org/pdf/2004.04730).

#### Hardware

Running the X3D-L model requires a modern NVIDIA GPU with CUDA support. X3D-L is designed to be computationally efficient and typically requires 10–16 GB of GPU memory during training.

## Citation

If you use our model in your work, please cite our paper.

**BibTeX:**
```
@article{duporge2025baboonland,
  title={BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos},
  author={Duporge, Isla and Kholiavchenko, Maksim and Harel, Roi and Wolf, Scott and Rubenstein, Daniel I and Crofoot, Margaret C and Berger-Wolf, Tanya and Lee, Stephen J and Barreau, Julie and Kline, Jenna and Ramirez, Michelle and Stewart, Charles},
  journal={International Journal of Computer Vision},
  pages={1--12},
  year={2025},
  publisher={Springer}
}
```


## Acknowledgements

This work was supported by the [Imageomics Institute](https://imageomics.org), which is funded by the US National Science Foundation's Harnessing the Data Revolution (HDR) program under [Award #2118240](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2118240) (Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning). Additional support was also provided by the [AI Institute for Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE)](https://icicle.osu.edu/), which is funded by the US National Science Foundation under [Award #2112606](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2112606). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

The data was gathered at the [Mpala Research Centre](https://mpala.org/) in Kenya, in accordance with Research License No. NACOSTI/P/22/18214. The data collection protocol adhered strictly to the guidelines set forth by the Institutional Animal Care and Use Committee under permission No. IACUC 1835F.


## Model Card Authors

Maksim Kholiavchenko

## Model Card Contact

For questions on this model, please open a [discussion](https://huggingface.co/imageomics/x3d-BaboonLand/discussions) on this repo.