---
license: mit
language:
- en
library_name: ultralytics
tags:
- biology
- CV
- images
- animals
- object-detection
- YOLO
- fine-tuned
- zebra
- giraffe
- onager
- dog
datasets:
- imageomics/mmla_mpala
- imageomics/mmla_wilds
- imageomics/mmla_opc
metrics:
- precision
- recall
- mAP50
- mAP50-95
base_model:
- Ultralytics/YOLO11
model_description: >-
This model is a fine-tuned version of YOLOv11m optimized for detection and
classification of wildlife from low-altitude drone imagery. It has been
trained to identify zebras (Plains and Grevy's), giraffes (reticulated and
Masai), Persian onagers, and African Painted dogs with high accuracy across
diverse environmental conditions.
---
# Model Card for Fine-Tuned YOLOv11m Animal Detection Model
This model is a fine-tuned version of [YOLO11m](https://docs.ultralytics.com/models/yolo11/) optimized for detection and classification of wildlife from low-altitude drone imagery.
It has been trained to identify zebras (Plains and Grevy's), giraffes (reticulated and Masai), Persian onagers, and African Painted dogs with high accuracy across diverse environmental conditions.
## Model Details
### Model Description
- **Developed by:** Jenna Kline
- **Model type:** Object Detection and Classification
- **Language(s) (NLP):** Not applicable (Computer Vision model)
- **Fine-tuned from model:** YOLOv11m (ultralytics/yolo11m.pt)
### Model Sources
- **Repository:** [https://github.com/Imageomics/mmla](https://github.com/Imageomics/mmla)
- **Paper:** [MMLA: Multi-Environment, Multi-Species, Low-Altitude Aerial Footage Dataset](https://arxiv.org/abs/2504.07744)
- **Training Dataset:** [MMLA Data Collection](https://huggingface.co/collections/imageomics/mmla)
## Uses
### Direct Use
This model is designed for direct use in wildlife monitoring applications, ecological research, and biodiversity studies. It can:
- Detect and classify zebras, giraffes, onagers, and African wild dogs in low-altitude drone images
- Monitor wildlife populations in their natural habitats
- Automate animal ecology data collection using drones and computer vision
- Support biodiversity assessments by identifying species present in field surveys
The model can be used by researchers, conservationists, wildlife managers, and citizen scientists to automate and scale up wildlife monitoring efforts, particularly in African ecosystems.
### Downstream Use
This model can be integrated into larger ecological monitoring systems including:
- Wildlife conservation monitoring platforms
- Ecological research workflows
- Environmental impact assessment tools
### Out-of-Scope Use
This model is not suitable for:
- Security or surveillance applications targeting humans
- Applications where errors in detection could lead to harmful conservation decisions without human verification
- Real-time detection systems requiring extremely low latency (model prioritizes accuracy over speed)
- Detection of species not included in the training set (only trained on zebras, giraffes, onagers, and dogs)
## Bias, Risks, and Limitations
- **Species representation bias:** The model may perform better on species that were well-represented in the training data.
- **Environmental bias:** Performance may degrade in environmental conditions not represented in the training data (e.g., extreme weather, unusual lighting).
- **Morphological bias:** Similar-looking species may be confused with one another (particularly among equids like zebras and onagers).
- **Geospatial bias:** The model may perform better in biomes similar to those present in the training data, particularly African savanna environments.
- **Seasonal bias:** Detection accuracy may vary based on seasonal appearance changes in animals or environments.
- **Technical limitations:** Performance depends on image quality, with reduced accuracy in low-resolution, blurry, or poorly exposed images.
### Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model:
- Always verify critical detections with human review, especially for rare species or conservation decision-making
- Consider confidence scores when evaluating detections (see the thresholding sketch after this list)
- Be cautious when applying the model to new geographic regions or habitats not represented in training data
- Periodically validate model performance on new data to ensure continued reliability
- Consider fine-tuning the model on domain-specific data when applying to new regions or species
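As a concrete illustration of the confidence-score recommendation above, here is a minimal thresholding sketch; the model path and the 0.5 cutoff are placeholders that should be tuned per deployment:

```python
from ultralytics import YOLO

# Minimal confidence-thresholding sketch; the model path and the 0.5
# cutoff are placeholders, not recommended production values.
model = YOLO('path/to/your/model.pt')
results = model('path/to/image.jpg', conf=0.5)  # discard detections below 0.5
for box in results[0].boxes:
    print(model.names[int(box.cls[0])], float(box.conf[0]))
```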
## How to Get Started with the Model
Use the code below to get started with the model:
```python
from ultralytics import YOLO

# Load the fine-tuned model
model = YOLO('path/to/your/model.pt')

# Run inference on an image
results = model('path/to/image.jpg')

# Process results
for result in results:
    boxes = result.boxes  # Boxes object for bounding box outputs
    for box in boxes:
        x1, y1, x2, y2 = box.xyxy[0]  # box coordinates (xyxy format)
        conf = box.conf[0]  # confidence score
        cls = int(box.cls[0])  # class id
        class_name = model.names[cls]  # class name (Zebra, Giraffe, Onager, or Dog)
        print(f"Detected {class_name} with confidence {conf:.2f} at position {x1:.1f}, {y1:.1f}, {x2:.1f}, {y2:.1f}")

# Visualize results (plot() returns the annotated image as a NumPy array)
annotated = results[0].plot()
```
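Since the model targets drone footage, it can also be run directly on video. A minimal sketch using Ultralytics' streaming mode (the video path is a placeholder); `stream=True` yields results frame by frame instead of accumulating them in memory:

```python
from ultralytics import YOLO

model = YOLO('path/to/your/model.pt')

# stream=True returns a generator, so long drone videos are processed
# frame by frame without holding all results in memory
for result in model('path/to/drone_footage.mp4', stream=True):
    for box in result.boxes:
        print(model.names[int(box.cls[0])], float(box.conf[0]))
```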
## Training Details
### Training Data
The three datasets are available in the [MMLA Data Collection](https://huggingface.co/collections/imageomics/mmla). See `prepare_yolo_dataset.py` for details on the train/test splits; the script requires only standard Python 3.10+ packages and generates the splits.
#### Dataset splitting strategy
We applied a stratified 60/40 train-test split across species and locations to evaluate model generalizability. Data was collected from three distinct environments: Mpala Research Centre (location_1), Ol Pejeta Conservancy (location_2), and The Wilds Conservation Center (location_3). The dataset includes four target classes: Zebra, Giraffe, Onager, and African Wild Dog.
To prevent overlap in individual animals or environmental conditions between training and testing, we split video sessions at the file level—ensuring that no frames from a given session appear in both train and test sets. This also allows consistent per-frame sampling at a fixed interval (every 10th frame).
Training set includes:
- [Mpala](https://huggingface.co/datasets/imageomics/mmla_mpala) (location_1): Multiple full sessions for Giraffes, Plains Zebras, and Grevy’s Zebras, including mixed-species scenes.
- [Ol Pejeta](https://huggingface.co/datasets/imageomics/mmla_opc) (location_2): Full sessions of Plains Zebras.
- [The Wilds](https://huggingface.co/datasets/imageomics/mmla_wilds) (location_3): 70% of sessions for Painted Dogs, Giraffes, and Persian Onagers.
Test set includes:
- [The Wilds](https://huggingface.co/datasets/imageomics/mmla_wilds) (location_3): The remaining 30% of sessions, including additional Grevy’s Zebra sessions used exclusively for testing.
- [Mpala](https://huggingface.co/datasets/imageomics/mmla_mpala) (location_1) and [Ol Pejeta](https://huggingface.co/datasets/imageomics/mmla_opc) (location_2): Separate zebra and mixed-species sessions not used during training.
This careful division by session and location ensures that the model is evaluated on unseen environments, individuals, and contexts, making it a robust benchmark for testing generalization across ecological and geographic domains.
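The actual split logic lives in `prepare_yolo_dataset.py`; the following is only an illustrative sketch of a session-level split with every-10th-frame sampling, where the directory layout and the 70/30 ratio are assumptions:

```python
import random
from pathlib import Path

random.seed(0)

# Assumed layout: one directory of extracted frames per video session
sessions = sorted(p for p in Path("frames").iterdir() if p.is_dir())
random.shuffle(sessions)
cut = int(0.7 * len(sessions))  # e.g. the 70/30 session split used for The Wilds
train_sessions, test_sessions = sessions[:cut], sessions[cut:]

def sample_frames(session_dirs, interval=10):
    """Keep every `interval`-th frame; splitting whole sessions guarantees
    that no frame from one session lands in both train and test."""
    frames = []
    for session in session_dirs:
        for i, frame in enumerate(sorted(session.glob("*.jpg"))):
            if i % interval == 0:
                frames.append(frame)
    return frames

train_frames = sample_frames(train_sessions)
test_frames = sample_frames(test_sessions)
```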
### Training Procedure
#### Preprocessing
- Images were resized to 640x640 pixels (as specified in the training script)
- Standard YOLOv11 augmentation pipeline was applied
#### Training Hyperparameters
The model was trained with the following hyperparameters as specified in the training script:
- **Base model:** YOLOv11m (yolo11m.pt)
- **Epochs:** 50
- **Image size:** 640
- **Dataset configuration:** Custom YAML file defining 4 classes (Zebra, Giraffe, Onager, Dog); an illustrative example follows the training script below
- **Training regime:** Default YOLOv11 training parameters
```python
# Training script
from ultralytics import YOLO

model = YOLO("yolo11m.pt")
results = model.train(
    data="/data/dataset.yaml",
    epochs=50,
    imgsz=640,
)
```
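The dataset YAML referenced above is not reproduced in this card; a minimal sketch of what it might look like, with the paths and class-index order as assumptions, is:

```yaml
# Illustrative dataset.yaml; paths and class-index order are assumptions
path: /data          # dataset root
train: images/train  # training images
val: images/test     # held-out test images used for validation
names:
  0: Zebra
  1: Giraffe
  2: Onager
  3: Dog
```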
#### Speeds, Sizes, Times
- **Training hardware:** 2× Tesla V100-PCIE-16GB (16,144 MiB each)
- **Training time:** 2 hours, 11 minutes
- **Model size:** YOLO11m summary - 231 layers, 20,056,092 parameters, 20,056,076 gradients, 68.2 GFLOPs
- **Inference speed:** 0.1ms preprocess, 4.6ms inference, 0.0ms loss, 0.9ms postprocess per image on Tesla V100-PCIE-16GB, 16144MiB
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
The model was evaluated on a held-out test set located at `images/test` (created by running the [data prep script](https://huggingface.co/imageomics/mmla/blob/main/prepare_yolo_dataset.py)) containing:
- 7,658 test images containing instances of Zebra, Giraffe, Onager, and Dog
#### Factors
The evaluation disaggregated performance by:
- Species category (Zebra, Giraffe, Onager, African wild dog)
#### Metrics
The model was evaluated using standard object detection metrics:
- **Precision:** Ratio of true positives to all predicted positives
- **Recall:** Ratio of true positives to all actual positives (ground truth)
- **mAP50:** Mean Average Precision at IoU threshold of 0.5
- **mAP50-95:** Mean Average Precision averaged over IoU thresholds from 0.5 to 0.95
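These metrics can be computed with the Ultralytics validation API. A minimal sketch, assuming the dataset YAML points at the prepared MMLA splits with the held-out images under `images/test`:

```python
from ultralytics import YOLO

model = YOLO('path/to/your/model.pt')
metrics = model.val(data='dataset.yaml', split='test')  # evaluate on images/test

print(f"mAP50:    {metrics.box.map50:.3f}")
print(f"mAP50-95: {metrics.box.map:.3f}")
print("Per-class mAP50-95:", metrics.box.maps)
```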
### Results
#### Summary
- **Overall mAP50:** 80.1%
- **Overall mAP50-95:** 48.8%
- **Per-class performance:**
- Zebra: mAP50 = 67.5%, Precision = 76.5%, Recall = 64.7%
- Giraffe: mAP50 = 67.8%, Precision = 78.8%, Recall = 63.4%
- Onager: mAP50 = 85.7%, Precision = 93.9%, Recall = 77.6%
- Dog: mAP50 = 99.5%, Precision = 97.3%, Recall = 99.8%
## Technical Specifications
### Model Architecture and Objective
- Base architecture: YOLOv11m
- Detection heads: Standard YOLOv11 architecture
- Classes: 4 (Zebra, Giraffe, Onager, Dog)
### Compute Infrastructure
#### Software
- Python 3.8+
- PyTorch 2.0+
- Ultralytics YOLOv11 framework
- CUDA 11.7+ (for GPU acceleration)
## Citation
If you use this model in your work, please cite both it and our associated paper as described below.
**BibTeX:**
```bibtex
@software{mmla_finetuned_yolo11m,
author = {Jenna Kline},
title = {Fine-Tuned YOLOv11m Animal Detection Model (Revision 1f0b26b)},
version = {1.0.0},
year = {2025},
url = {https://huggingface.co/imageomics/mmla},
doi = {10.57967/hf/7327},
publisher = {Hugging Face}
}
@article{kline2025mmla,
title={MMLA: Multi-Environment, Multi-Species, Low-Altitude Aerial Footage Dataset},
author={Kline, Jenna and Stevens, Samuel and Maalouf, Guy and Saint-Jean, Camille Rondeau and Ngoc, Dat Nguyen and Mirmehdi, Majid and Guerin, David and Burghardt, Tilo and Pastucha, Elzbieta and Costelloe, Blair and others},
journal={arXiv preprint arXiv:2504.07744},
year={2025}
}
```
## Acknowledgements
This work was supported by both the [Imageomics Institute](https://imageomics.org) and the [AI and Biodiversity Change (ABC) Global Center](http://abcresearchcenter.org). The Imageomics Institute is funded by the US National Science Foundation's Harnessing the Data Revolution (HDR) program under [Award #2118240](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2118240) (Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning). The ABC Global Center is funded by the US National Science Foundation under [Award No. 2330423](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2330423&HistoricalAwards=false) and Natural Sciences and Engineering Research Council of Canada under [Award No. 585136](https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=782440). This model draws on research supported by the Social Sciences and Humanities Research Council.
Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, Natural Sciences and Engineering Research Council of Canada, or Social Sciences and Humanities Research Council.
## Glossary
- **YOLO:** You Only Look Once, a family of real-time object detection models
- **mAP:** mean Average Precision, a standard metric for evaluating object detection models
- **IoU:** Intersection over Union, a measure of overlap between predicted and ground truth bounding boxes
- **Onager:** Also known as the Asian wild ass, a species of equid native to Asia
- **YOLOv11m:** The medium-sized variant of the YOLOv11 architecture
## Model Card Authors
Jenna Kline, The Ohio State University
## Model Card Contact
kline.377 at osu dot edu |