jennamk14 committed · verified · Commit c2289fa · 1 parent: 418b0ac

Upload README.md with huggingface_hub

Files changed (1): README.md (+106 −251)
README.md CHANGED

**Previous version:**

---
license: mit
language:
- en
library_name: ultralytics
tags:
- biology
- CV
- images
- animals
- object-detection
- YOLO
- fine-tuned
datasets:
- custom_animal_dataset
metrics:
- precision
- recall
- mAP50
- mAP50-95
---

# Model Card for Fine-Tuned YOLOv11m Animal Detection Model

This model is a fine-tuned version of YOLOv11m optimized for the detection and classification of wildlife in low-altitude drone imagery. It has been trained to identify zebras (plains and Grevy's), giraffes, Persian onagers, and African painted dogs with high accuracy across diverse environmental conditions.

## Model Details

### Model Description

- **Developed by:** Jenna Kline
- **Model type:** Object Detection and Classification
- **Language(s) (NLP):** Not applicable (Computer Vision model)
- **Fine-tuned from model:** YOLOv11m (ultralytics/yolo11m.pt)

### Model Sources

- **Repository:** https://github.com/Imageomics-ABC-edu/final-project-kenyan-ungulates-with-wilddroneeu
- **Paper:** [MMLA](https://arxiv.org/abs/2504.07744)

## Uses

### Direct Use

This model is designed for direct use in wildlife monitoring applications, ecological research, and biodiversity studies. It can:

- Detect and classify zebras, giraffes, onagers, and dogs in low-altitude drone imagery
- Monitor wildlife populations in their natural habitats
- Assist researchers in automated processing of large image datasets
- Support biodiversity assessments by identifying species present in field surveys

The model can be used by researchers, conservationists, wildlife managers, and citizen scientists to automate and scale up wildlife monitoring efforts, particularly in African ecosystems.

### Downstream Use

This model can be integrated into larger ecological monitoring systems, including:

- Automated camera trap processing pipelines
- Wildlife conservation monitoring platforms
- Ecological research workflows
- Citizen science applications for species identification
- Environmental impact assessment tools

### Out-of-Scope Use

This model is not suitable for:

- Medical diagnosis or human-related detection tasks
- Security or surveillance applications targeting humans
- Applications where detection errors could lead to harmful conservation decisions without human verification
- Real-time detection systems requiring extremely low latency (the model prioritizes accuracy over speed)
- Detection of species not included in the training set (only zebras, giraffes, onagers, and dogs)

## Bias, Risks, and Limitations

- **Species representation bias:** The model may perform better on species that were well represented in the training data.
- **Environmental bias:** Performance may degrade in environmental conditions not represented in the training data (e.g., extreme weather, unusual lighting).
- **Morphological bias:** Similar-looking species may be confused with one another (particularly among equids such as zebras and onagers).
- **Geospatial bias:** The model may perform better in biomes similar to those in the training data, particularly African savanna environments.
- **Seasonal bias:** Detection accuracy may vary with seasonal changes in the appearance of animals or environments.
- **Technical limitations:** Performance depends on image quality, with reduced accuracy on low-resolution, blurry, or poorly exposed images.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model:

- Always verify critical detections with human review, especially for rare species or conservation decision-making
- Consider confidence scores when evaluating detections (see the sketch after this list)
- Be cautious when applying the model to new geographic regions or habitats not represented in the training data
- Periodically validate model performance on new data to ensure continued reliability
- Consider fine-tuning the model on domain-specific data when applying it to new regions or species
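
As a minimal sketch of the confidence-score recommendation (the model path and the 0.5 threshold are placeholders, not values prescribed by this card), low-confidence detections can be filtered at prediction time:

```python
from ultralytics import YOLO

model = YOLO("path/to/your/model.pt")  # placeholder path to the fine-tuned weights

# Keep only detections at or above a 0.5 confidence threshold;
# lower-confidence boxes are discarded by the predictor itself.
results = model.predict("path/to/image.jpg", conf=0.5)

for box in results[0].boxes:
    print(model.names[int(box.cls[0])], float(box.conf[0]))
```

A stricter threshold trades recall for precision; for rare species it may be safer to lower the threshold and route borderline detections to human review.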

## How to Get Started with the Model

Use the code below to get started with the model:

```python
from ultralytics import YOLO

# Load the fine-tuned model
model = YOLO('path/to/your/model.pt')

# Run inference on an image
results = model('path/to/image.jpg')

# Process results
for result in results:
    boxes = result.boxes  # Boxes object with bounding box outputs
    for box in boxes:
        x1, y1, x2, y2 = box.xyxy[0]  # box coordinates
        conf = box.conf[0]  # confidence score
        cls = int(box.cls[0])  # class id
        class_name = model.names[cls]  # class name (Zebra, Giraffe, Onager, or Dog)
        print(f"Detected {class_name} with confidence {conf:.2f} at position {x1:.1f}, {y1:.1f}, {x2:.1f}, {y2:.1f}")

# Visualize results
results[0].plot()  # returns an annotated image array
```

## Training Details

### Training Data

The dataset is available on [Hugging Face](https://huggingface.co/collections/imageomics/wildwing-67f572d3ba17fca922c80182). See /data/dataset.yaml for details on the train/val/test splits.

### Training Procedure

#### Preprocessing

- Images were resized to 640x640 pixels (as specified in the training script)
- The standard YOLOv11 augmentation pipeline was applied

#### Training Hyperparameters

The model was trained with the following hyperparameters, as specified in the training script:

- **Base model:** YOLOv11m (yolo11m.pt)
- **Epochs:** 50
- **Image size:** 640
- **Dataset configuration:** Custom YAML file defining 4 classes (Zebra, Giraffe, Onager, Dog)
- **Training regime:** Default YOLOv11 training parameters

```python
# Training script
from ultralytics import YOLO

model = YOLO("yolo11m.pt")
results = model.train(
    data="/data/dataset.yaml",
    epochs=50,
    imgsz=640,
)
```

#### Speeds, Sizes, Times

- **Training hardware:** [Your GPU/CPU specifications]
- **Training time:** [Duration]
- **Model size:** [Size in MB]
- **Inference speed:** [FPS on specific hardware]
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated on a held-out test set located at `/fs/ess/PAS2136/Kenya-2023/yolo_benchmark/HerdYOLO/data/images/test` containing:

- [Number] test images with instances of Zebra, Giraffe, Onager, and Dog
- [Any other relevant testing data details]

#### Factors

The evaluation disaggregated performance by:

- Species (Zebra, Giraffe, Onager, African wild dog)

#### Metrics

The model was evaluated using standard object detection metrics:

- **Precision:** Ratio of true positives to all predicted positives
- **Recall:** Ratio of true positives to all actual positives (ground truth)
- **mAP50:** Mean Average Precision at an IoU threshold of 0.5 (see the IoU sketch below)
- **mAP50-95:** Mean Average Precision averaged over IoU thresholds from 0.5 to 0.95
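
To make the IoU thresholds in these metrics concrete, here is a minimal, self-contained sketch of computing IoU between a predicted and a ground-truth box in (x1, y1, x2, y2) format; the example boxes are illustrative only:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    # Intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    # Union = sum of the two areas minus the intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection counts as a true positive for mAP50 when IoU >= 0.5
print(iou((0, 0, 100, 100), (50, 0, 150, 100)))  # 0.333...
```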

### Results

#### Summary

- **Overall mAP50:** [Value]
- **Overall mAP50-95:** [Value]
- **Per-class performance:**
  - Zebra: mAP50 = [Value], Precision = [Value], Recall = [Value]
  - Giraffe: mAP50 = [Value], Precision = [Value], Recall = [Value]
  - Onager: mAP50 = [Value], Precision = [Value], Recall = [Value]
  - Dog: mAP50 = [Value], Precision = [Value], Recall = [Value]

## Model Examination

- **Confusion analysis:** [Any notable confusion between classes, such as between Zebra and Onager]
- **Failure cases:** [Specific conditions where the model performs less reliably]
- **Interpretability findings:** [Any insights from model interpretation techniques]

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://doi.org/10.48550/arXiv.1910.09700).

- **Hardware Type:** [GPU model]
- **Hours used:** [Number]
- **Cloud Provider:** [Provider name or local]
- **Compute Region:** [Region]
- **Carbon Emitted:** [Amount] kg CO₂eq

## Technical Specifications

### Model Architecture and Objective

- Base architecture: YOLOv11m
- Detection heads: Standard YOLOv11 architecture
- Classes: 4 (Zebra, Giraffe, Onager, Dog)

### Compute Infrastructure

#### Hardware

- **Training:** [GPU/CPU details]
- **Inference:** Tested on [range of devices]
- **Minimum requirements:** [Specifications]

#### Software

- Python 3.8+
- PyTorch 2.0+
- Ultralytics YOLOv11 framework
- CUDA 11.7+ (for GPU acceleration)

## Citation

**BibTeX:**

```bibtex
@software{mmla_finetuned_yolo11m,
  author = {Jenna Kline},
  title = {Fine-Tuned YOLOv11m Animal Detection Model},
  version = {1.0.0},
  year = {2025},
  url = {https://huggingface.co/imageomics/mmla}
}
```

## Acknowledgements

This work was supported by both the [Imageomics Institute](https://imageomics.org) and the [AI and Biodiversity Change (ABC) Global Center](http://abcresearchcenter.org). The Imageomics Institute is funded by the US National Science Foundation's Harnessing the Data Revolution (HDR) program under [Award #2118240](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2118240) (Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning). The ABC Global Center is funded by the US National Science Foundation under [Award No. 2330423](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2330423&HistoricalAwards=false) and the Natural Sciences and Engineering Research Council of Canada under [Award No. 585136](https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=782440). This model draws on research supported by the Social Sciences and Humanities Research Council.

Additional support was provided by the National Ecological Observatory Network (NEON), a program sponsored by the National Science Foundation and operated under cooperative agreement by Battelle Memorial Institute. This material is based in part upon work supported by the National Science Foundation through the NEON Program.

Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, Natural Sciences and Engineering Research Council of Canada, or Social Sciences and Humanities Research Council.

## Glossary

- **YOLO:** You Only Look Once, a family of real-time object detection models
- **mAP:** mean Average Precision, a standard metric for evaluating object detection models
- **IoU:** Intersection over Union, a measure of overlap between predicted and ground-truth bounding boxes
- **Onager:** Also known as the Asian wild ass, a species of equid native to Asia
- **YOLOv11m:** The medium-sized variant of the YOLOv11 architecture

## More Information

[Any additional information you'd like to include]

## Model Card Authors

Jenna Kline, The Ohio State University

## Model Card Contact

kline.377 at osu dot edu

**New version:**
# MMLA Repo

Multi-Environment, Multi-Species, Low-Altitude Aerial Footage Dataset

![zebras_giraffes](vizualizations/location_1_session_5_DJI_0211_partition_1_DJI_0211_002590.jpg)
Example photo from the MMLA dataset with labels generated by the model. The image shows a group of zebras and giraffes at the Mpala Research Centre in Kenya.

## Table of Contents
- [How to use the scripts in this repo](#how-to-use-the-scripts-in-this-repo)
- [Requirements](#requirements)
- [Baseline YOLO evaluation](#baseline-yolo-evaluation)
- [Download evaluation data from HuggingFace](#download-evaluation-data-from-huggingface)
- [Run the evaluate_yolo script](#run-the-evaluate_yolo-script)
- [Model Training](#model-training)
- [Prepare the dataset](#prepare-the-dataset)
- [Optional: Downsample the frames](#optional-downsample-the-frames)
- [Run the training script](#run-the-training-script)
- [Evaluation](#evaluation)
- [Optional: Perform bootstrapping](#optional-perform-bootstrapping)
- [Results](#results)
- [Fine-Tuned Model Weights](#fine-tuned-model)
- [Paper](#paper)
- [Dataset](#dataset)

This repo provides scripts to fine-tune YOLO models on the MMLA dataset. The [MMLA dataset](https://huggingface.co/collections/imageomics/wildwing-67f572d3ba17fca922c80182) is a collection of low-altitude aerial footage of various species in different environments. The dataset is designed to help researchers and practitioners develop and evaluate object detection models for wildlife monitoring and conservation.

# How to use the scripts in this repo

### Requirements

```bash
# install packages from requirements
conda create --name yolo_env --file requirements.txt
# OR using pip
pip install -r requirements.txt
```

## Baseline YOLO evaluation

### Download evaluation data from HuggingFace

This dataset contains an evenly distributed set of frames from the MMLA dataset, with bounding box annotations for each frame. It is designed to help researchers and practitioners evaluate the performance of object detection models on low-altitude aerial footage spanning a variety of environments and species.

```bash
# download the evaluation dataset from HuggingFace to the local /data directory
git clone <evaluation-dataset-url>
```

### Run the evaluate_yolo script

```bash
# example usage
python model_eval/evaluate_yolo.py --model model_eval/yolov5mu.pt --images model_eval/eval_data/frames_500_coco --annotations model_eval/eval_data/frames_500_coco --output model_eval/results/frames_500_coco/yolov5m
```

## Model Training

### Prepare the dataset

```bash
# download the datasets from HuggingFace to the local /data directory

# wilds dataset
git clone https://huggingface.co/datasets/imageomics/wildwing_wilds
# opc dataset
git clone https://huggingface.co/datasets/imageomics/wildwing_opc
# mpala dataset
git clone https://huggingface.co/datasets/imageomics/wildwing_mpala

# run the script to split the dataset into train and test sets
python prepare_yolo_dataset.py
```
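
The repo's prepare_yolo_dataset.py is the authoritative split script. As a purely hypothetical sketch of a standard YOLO-style train/test split (paired images/ and labels/ directories; the source directory, destination, and 80/20 ratio here are all assumptions, not values from the script), the idea looks like:

```python
import random
import shutil
from pathlib import Path

# Hypothetical paths; the repo's prepare_yolo_dataset.py defines its own.
SRC = Path("wildwing_wilds")
DST = Path("data")

images = sorted((SRC / "images").glob("*.jpg"))
random.seed(0)
random.shuffle(images)

split = int(0.8 * len(images))  # assumed 80/20 train/test split
for subset, files in (("train", images[:split]), ("test", images[split:])):
    (DST / subset / "images").mkdir(parents=True, exist_ok=True)
    (DST / subset / "labels").mkdir(parents=True, exist_ok=True)
    for img in files:
        label = SRC / "labels" / (img.stem + ".txt")  # YOLO-format annotation
        shutil.copy(img, DST / subset / "images" / img.name)
        if label.exists():
            shutil.copy(label, DST / subset / "labels" / label.name)
```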
 
 
 
 
 

#### Alternatively, you can create your own dataset from video frames and bounding box annotations

```bash
python frame_extractor.py --dataset wilds --dataset_path ./wildwing_wilds --output_dir ./wildwing_wilds
```

### Optional: Downsample the frames to extract a subset of frames from each video

```bash
python downsample.py --dataset wilds --dataset_path ./wildwing_wilds --output_dir ./wildwing_wilds --downsample_rate 0.1
```

### Run the training script

```bash
# run the training script
python train.py
```

## Evaluation

To evaluate the trained model on the test data:

```bash
# run the validate script
python validate.py
```

### Optional: Perform bootstrapping to get confidence intervals

```bash
# open the bootstrapping notebook
jupyter notebook bootstrap.ipynb
```
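
The notebook itself is not reproduced here, but the core idea of bootstrapping a confidence interval (resample per-image scores with replacement, recompute the mean, and take percentiles of the resampled means) can be sketched as follows; the per_image_ap values are synthetic placeholders, not results from this repo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder per-image average-precision scores; in practice these come
# from evaluating the model on each test image.
per_image_ap = rng.uniform(0.5, 1.0, size=500)

# Resample with replacement and recompute the mean many times
boot_means = [
    rng.choice(per_image_ap, size=per_image_ap.size, replace=True).mean()
    for _ in range(10_000)
]

# 95% confidence interval from the 2.5th and 97.5th percentiles
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean AP ~ {per_image_ap.mean():.3f} (95% CI: {low:.3f} to {high:.3f})")
```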

#### Download inference results from baseline and fine-tuned model

## Results

Our fine-tuned YOLO11m model achieves the following performance on the MMLA dataset:

| Class   | Images | Instances | Box(P) | R     | mAP50 | mAP50-95 |
|---------|--------|-----------|--------|-------|-------|----------|
| all     | 7,658  | 44,619    | 0.867  | 0.764 | 0.801 | 0.488    |
| Zebra   | 4,430  | 28,219    | 0.768  | 0.647 | 0.675 | 0.273    |
| Giraffe | 868    | 1,357     | 0.788  | 0.634 | 0.678 | 0.314    |
| Onager  | 172    | 1,584     | 0.939  | 0.776 | 0.857 | 0.505    |
| Dog     | 3,022  | 13,459    | 0.973  | 0.998 | 0.995 | 0.860    |
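
A table in this format matches the per-class output of the ultralytics validation routine. As a minimal sketch (the weights path and dataset YAML below are placeholders, not files shipped in this repo), similar metrics can be regenerated with:

```python
from ultralytics import YOLO

# Placeholder paths; substitute your fine-tuned weights and dataset config
model = YOLO("runs/detect/train/weights/best.pt")
metrics = model.val(data="data/dataset.yaml", split="test")

print(metrics.box.map50)  # overall mAP50
print(metrics.box.map)    # overall mAP50-95
print(metrics.box.maps)   # per-class mAP50-95
```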

# Fine-Tuned Model
See the [HuggingFace repo](https://huggingface.co/imageomics/mmla) for details and weights.

# Dataset
See the [HuggingFace collection](https://huggingface.co/collections/imageomics/wildwing-67f572d3ba17fca922c80182) for the MMLA dataset.

# Paper
```bibtex
@article{kline2025mmla,
  title={MMLA: Multi-Environment, Multi-Species, Low-Altitude Aerial Footage Dataset},
  author={Kline, Jenna and Stevens, Samuel and Maalouf, Guy and Saint-Jean, Camille Rondeau and Ngoc, Dat Nguyen and Mirmehdi, Majid and Guerin, David and Burghardt, Tilo and Pastucha, Elzbieta and Costelloe, Blair and others},
  journal={arXiv preprint arXiv:2504.07744},
  year={2025}
}
```