@@ -129,7 +129,7 @@ results[0].plot()
 ### Training Data
-Dataset is available at [Hugging Face](https://huggingface.co/collections/imageomics/wildwing-67f572d3ba17fca922c80182). See prepare_yolo_dataset.py for details on train/test splits.
 #### Dataset splitting strategy
 We applied a stratified 60/40 train-test split across species and locations to evaluate model generalizability. Data was collected from three distinct environments: Mpala Research Centre (location_1), Ol Pejeta Conservancy (location_2), and The Wilds Conservation Center (location_3). The dataset includes four target classes: Zebra, Giraffe, Onager, and African Wild Dog.
@@ -137,13 +137,13 @@ We applied a stratified 60/40 train-test split across species and locations to e
 To prevent overlap in individual animals or environmental conditions between training and testing, we split video sessions at the file level—ensuring that no frames from a given session appear in both train and test sets. This also allows consistent per-frame sampling at a fixed interval (every 10th frame).
 Training set includes:
- - Mpala (location_1): Multiple full sessions for Giraffes, Plains Zebras, and Grevy’s Zebras, including mixed-species scenes.-
- - Ol Pejeta (location_2): Full sessions of Plains Zebras.
-- The Wilds (location_3): 70% of sessions for Painted Dogs, Giraffes, and Persian Onagers.
 Test set includes:
-- The Wilds (location_3): The remaining 30% of sessions, including additional Grevy’s Zebra sessions used exclusively for testing.
-- Mpala (location_1) and Ol Pejeta (location_2): Separate zebra and mixed-species sessions not used during training.
 This careful division by session and location ensures that the model is evaluated on unseen environments, individuals, and contexts, making it a robust benchmark for testing generalization across ecological and geographic domains.
@@ -189,7 +189,7 @@ results = model.train(
 #### Testing Data
-The model was evaluated on a held-out test set located at `images/test` containing:
 - 7658 test images with instances of Zebra, Giraffe, Onager, and Dog
@@ -240,6 +240,8 @@ The model was evaluated using standard object detection metrics:
 ## Citation
 **BibTeX:**
 ```

 ### Training Data
+The three datasets are available in the [MMLA Data Collection](https://huggingface.co/collections/imageomics/mmla). See `prepare_yolo_dataset.py` for details on train/test splits; the script generates the splits and runs on Python 3.10+ with standard packages.
 #### Dataset splitting strategy
 We applied a stratified 60/40 train-test split across species and locations to evaluate model generalizability. Data was collected from three distinct environments: Mpala Research Centre (location_1), Ol Pejeta Conservancy (location_2), and The Wilds Conservation Center (location_3). The dataset includes four target classes: Zebra, Giraffe, Onager, and African Wild Dog.
 To prevent overlap in individual animals or environmental conditions between training and testing, we split video sessions at the file level—ensuring that no frames from a given session appear in both train and test sets. This also allows consistent per-frame sampling at a fixed interval (every 10th frame).
 Training set includes:
+ - [Mpala](https://huggingface.co/datasets/imageomics/mmla_mpala) (location_1): Multiple full sessions for Giraffes, Plains Zebras, and Grevy’s Zebras, including mixed-species scenes.
+ - [Ol Pejeta](https://huggingface.co/datasets/imageomics/mmla_opc) (location_2): Full sessions of Plains Zebras.
+ - [The Wilds](https://huggingface.co/datasets/imageomics/mmla_wilds) (location_3): 70% of sessions for Painted Dogs, Giraffes, and Persian Onagers.
 Test set includes:
+- [The Wilds](https://huggingface.co/datasets/imageomics/mmla_wilds) (location_3): The remaining 30% of sessions, including additional Grevy’s Zebra sessions used exclusively for testing.
+- [Mpala](https://huggingface.co/datasets/imageomics/mmla_mpala) (location_1) and [Ol Pejeta](https://huggingface.co/datasets/imageomics/mmla_opc) (location_2): Separate zebra and mixed-species sessions not used during training.
 This careful division by session and location ensures that the model is evaluated on unseen environments, individuals, and contexts, making it a robust benchmark for testing generalization across ecological and geographic domains.
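The session-level split and fixed-interval frame sampling described above can be sketched as follows. This is a minimal illustration, not the logic of `prepare_yolo_dataset.py`: the function names `split_sessions` and `sample_frames`, the `(session_id, location, species)` tuple layout, and the seeded shuffle are all assumptions made for the example, and it assumes each session carries exactly one location/species label.

```python
import random
from collections import defaultdict


def split_sessions(sessions, train_fraction=0.6, seed=0):
    """Assign whole sessions to train or test within each (location, species) stratum.

    `sessions` is an iterable of (session_id, location, species) tuples.
    This layout is hypothetical; the real script may organize sessions differently.
    """
    strata = defaultdict(list)
    for session_id, location, species in sessions:
        strata[(location, species)].append(session_id)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train, test = set(), set()
    for key in sorted(strata):
        ids = sorted(strata[key])
        rng.shuffle(ids)
        n_train = round(len(ids) * train_fraction)
        # Whole sessions go to one side, so no session contributes frames to both sets.
        train.update(ids[:n_train])
        test.update(ids[n_train:])
    return train, test


def sample_frames(n_frames, step=10):
    """Return the indices of every `step`-th frame, starting at frame 0."""
    return list(range(0, n_frames, step))
```

Because assignment happens per session rather than per frame, frames from one video can never land on both sides of the split, which is the property the paragraph above relies on.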
 #### Testing Data
+The model was evaluated on a held-out test set located at `images/test` (created by running the [data prep script](https://huggingface.co/imageomics/mmla/blob/main/prepare_yolo_dataset.py)) containing:
 - 7658 test images with instances of Zebra, Giraffe, Onager, and Dog
 ## Citation
+If you use this model in your work, please cite both it and our associated paper as described below.
 **BibTeX:**
 ```