
clarify dataset sources and add note about citation

#2
by egrace479 - opened
Files changed (1)
  1. README.md +9 -7
README.md CHANGED
@@ -129,7 +129,7 @@ results[0].plot()

  ### Training Data

- Dataset is available at [Hugging Face](https://huggingface.co/collections/imageomics/wildwing-67f572d3ba17fca922c80182). See prepare_yolo_dataset.py for details on train/test splits.

  #### Dataset splitting strategy
  We applied a stratified 60/40 train-test split across species and locations to evaluate model generalizability. Data was collected from three distinct environments: Mpala Research Centre (location_1), Ol Pejeta Conservancy (location_2), and The Wilds Conservation Center (location_3). The dataset includes four target classes: Zebra, Giraffe, Onager, and African Wild Dog.
@@ -137,13 +137,13 @@ We applied a stratified 60/40 train-test split across species and locations to e
  To prevent overlap in individual animals or environmental conditions between training and testing, we split video sessions at the file level—ensuring that no frames from a given session appear in both train and test sets. This also allows consistent per-frame sampling at a fixed interval (every 10th frame).

  Training set includes:
- - Mpala (location_1): Multiple full sessions for Giraffes, Plains Zebras, and Grevy’s Zebras, including mixed-species scenes.-
- - Ol Pejeta (location_2): Full sessions of Plains Zebras.
- - The Wilds (location_3): 70% of sessions for Painted Dogs, Giraffes, and Persian Onagers.

  Test set includes:
- - The Wilds (location_3): The remaining 30% of sessions, including additional Grevy’s Zebra sessions used exclusively for testing.
- - Mpala (location_1) and Ol Pejeta (location_2): Separate zebra and mixed-species sessions not used during training.

  This careful division by session and location ensures that the model is evaluated on unseen environments, individuals, and contexts, making it a robust benchmark for testing generalization across ecological and geographic domains.

@@ -189,7 +189,7 @@ results = model.train(

  #### Testing Data

- The model was evaluated on a held-out test set located at `images/test` containing:
  - 7658 test images with instances of Zebra, Giraffe, Onager, and Dog


@@ -240,6 +240,8 @@ The model was evaluated using standard object detection metrics:

  ## Citation

  **BibTeX:**

  ```
 

  ### Training Data

+ The three datasets are available in the [MMLA Data Collection](https://huggingface.co/collections/imageomics/mmla). See `prepare_yolo_dataset.py` for details on train/test splits; the script runs on standard Python 3.10+ packages, and generates the splits.

  #### Dataset splitting strategy
  We applied a stratified 60/40 train-test split across species and locations to evaluate model generalizability. Data was collected from three distinct environments: Mpala Research Centre (location_1), Ol Pejeta Conservancy (location_2), and The Wilds Conservation Center (location_3). The dataset includes four target classes: Zebra, Giraffe, Onager, and African Wild Dog.
 
  To prevent overlap in individual animals or environmental conditions between training and testing, we split video sessions at the file level—ensuring that no frames from a given session appear in both train and test sets. This also allows consistent per-frame sampling at a fixed interval (every 10th frame).

  Training set includes:
+ - [Mpala](https://huggingface.co/datasets/imageomics/mmla_mpala) (location_1): Multiple full sessions for Giraffes, Plains Zebras, and Grevy’s Zebras, including mixed-species scenes.-
+ - [Ol Pejeta](https://huggingface.co/datasets/imageomics/mmla_opc) (location_2): Full sessions of Plains Zebras.
+ - [The Wilds](https://huggingface.co/datasets/imageomics/mmla_wilds) (location_3): 70% of sessions for Painted Dogs, Giraffes, and Persian Onagers.

  Test set includes:
+ - [The Wilds](https://huggingface.co/datasets/imageomics/mmla_wilds) (location_3): The remaining 30% of sessions, including additional Grevy’s Zebra sessions used exclusively for testing.
+ - [Mpala](https://huggingface.co/datasets/imageomics/mmla_mpala) (location_1) and [Ol Pejeta](https://huggingface.co/datasets/imageomics/mmla_opc) (location_2): Separate zebra and mixed-species sessions not used during training.

  This careful division by session and location ensures that the model is evaluated on unseen environments, individuals, and contexts, making it a robust benchmark for testing generalization across ecological and geographic domains.
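
The session-level split and fixed-interval sampling described above can be sketched roughly as follows. This is an illustrative sketch only, not the contents of `prepare_yolo_dataset.py`: the function names, the session and frame naming, the random assignment, and the 0.7 fraction (borrowed from the figure quoted for The Wilds) are all assumptions made for the example.

```python
import random


def split_sessions(sessions, train_fraction=0.7, seed=0):
    """Assign whole sessions to train or test so no session spans both.

    Hypothetical helper: the real script may pick sessions by hand
    rather than by a seeded shuffle.
    """
    shuffled = sorted(sessions)             # deterministic starting order
    random.Random(seed).shuffle(shuffled)   # reproducible shuffle
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def sample_frames(frame_paths, step=10):
    """Keep every `step`-th frame of one session, in filename order."""
    return sorted(frame_paths)[::step]


# Hypothetical layout: 10 sessions with 50 frames each.
frames_by_session = {
    f"session_{i}": [f"session_{i}/frame_{j:04d}.jpg" for j in range(50)]
    for i in range(10)
}

train_sessions, test_sessions = split_sessions(frames_by_session.keys())
train_frames = [
    f for s in train_sessions for f in sample_frames(frames_by_session[s])
]
test_frames = [
    f for s in test_sessions for f in sample_frames(frames_by_session[s])
]
```

Splitting before sampling is what keeps neighbouring frames of one video from landing on both sides of the split; sampling first and then splitting frames at random would not give that guarantee.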
 
 

  #### Testing Data

+ The model was evaluated on a held-out test set located at `images/test` (created by running the [data prep script](https://huggingface.co/imageomics/mmla/blob/main/prepare_yolo_dataset.py)) containing:
  - 7658 test images with instances of Zebra, Giraffe, Onager, and Dog


 

  ## Citation

+ If you use this model in your work, please cite both it and our associated paper as described below.
+
  **BibTeX:**

  ```