# Introduction
This repository contains eight WideResNet-101-2 models trained by the Dal (Dalhousie University) team for the FathomNet 2025 competition; predictions from these models achieved 3rd place.
These models were trained with distinct random seeds and are intended to be used as an ensemble.
Each model's folder contains the checkpoint file (model weights), the predictions on the competition test dataset, and recorded training information.
The overall process is an iterative self-training pipeline, of which these models are the 21st iteration.
# Intended Use
These models classify underwater imagery spanning the 79 leaf nodes of the FathomNet 2025 competition taxonomy.
Each model in the ensemble has 100 classification heads, all of which make predictions on the data.
Confidence is then calculated from the predicted probability distribution across these 100 heads, in an effort to capture epistemic uncertainty.
The ensemble prediction set is then generated by taking the mode of the predictions across the eight component models, with ties broken by average confidence.

Further details on these models will be provided, along with a link to our GitHub code, when our report is finalized.
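As a minimal sketch of the voting scheme described above (the function names, and the agreement-based confidence formula in particular, are our own illustrative choices, not taken from the released code):

```python
import numpy as np

def head_confidence(head_preds):
    """One plausible confidence measure: the fraction of a model's heads
    that agree with the modal (most common) predicted class."""
    head_preds = np.asarray(head_preds)
    _, counts = np.unique(head_preds, return_counts=True)
    return counts.max() / len(head_preds)

def ensemble_vote(predictions, confidences):
    """Majority vote across models, with ties broken by average confidence.

    predictions: per-model predicted class ids for one sample.
    confidences: per-model confidence scores for that sample.
    """
    predictions = np.asarray(predictions)
    confidences = np.asarray(confidences, dtype=float)
    classes, counts = np.unique(predictions, return_counts=True)
    tied = classes[counts == counts.max()]
    if len(tied) == 1:
        return int(tied[0])
    # Tie-break: mean confidence among the models that voted for each tied class.
    mean_conf = [confidences[predictions == c].mean() for c in tied]
    return int(tied[int(np.argmax(mean_conf))])
```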
# Factors
Two main strategies proved effective in our experiments.
We used a hierarchical distance-weighted variant of cross-entropy loss, combined with a self-training process in which later training iterations learned from confident pseudo-labels on the test data produced by earlier generations of models.
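The exact weighting scheme is not specified here, but one common way to make cross-entropy hierarchy-aware is to add an expected hop-distance penalty, given a precomputed class-to-class distance matrix. A NumPy sketch under that assumption:

```python
import numpy as np

def hierarchical_ce(logits, targets, dist_matrix, alpha=0.1):
    """Cross-entropy plus an expected hierarchical-distance penalty.

    dist_matrix[i, j] holds the hop distance between classes i and j in the
    taxonomy tree; alpha controls how strongly distant mistakes are punished.
    (Illustrative formulation, not the team's exact loss.)
    """
    z = logits - logits.max(axis=1, keepdims=True)            # stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    n = len(targets)
    ce = -np.log(probs[np.arange(n), targets]).mean()
    # Expected hop distance of the predicted distribution from the target class.
    expected = (probs * dist_matrix[targets]).sum(axis=1).mean()
    return ce + alpha * expected
```

Under this formulation, placing probability mass on a taxonomically distant class costs more than placing it on a sibling, even when the standard cross-entropy term is identical.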
# Metrics
While we used accuracy internally, the evaluated metric is hierarchical distance (based on the number of hops from the ground-truth annotation in a hierarchical tree).
We implemented and used both metrics in our experiments.
Ensemble iteration 21 attained a public distance (competition public leaderboard) score of 2.27 and a private distance (competition evaluation leaderboard) score of 1.83.
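The hop-based distance can be computed from a child-to-parent map of the taxonomy tree; a minimal sketch (the `parent` map and node names below are illustrative):

```python
def tree_distance(a, b, parent):
    """Number of edges between nodes a and b in a tree.

    parent: dict mapping each node to its parent (root maps to None).
    """
    def ancestors(n):
        # Path from n up to the root, inclusive.
        path = [n]
        while parent[n] is not None:
            n = parent[n]
            path.append(n)
        return path

    depth = {n: i for i, n in enumerate(ancestors(a))}
    for hops_b, n in enumerate(ancestors(b)):
        if n in depth:                       # lowest common ancestor
            return depth[n] + hops_b
    raise ValueError("nodes are in different trees")
```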
# Training and Evaluation Data
We used both of these metrics when tuning hyperparameters, with a randomly split validation set taken from the training data (typically about 20% of it).
Once the optimal hyperparameters were determined, we trained on the full training dataset to produce test predictions for submission.
Over successive self-training generations, increasingly confident pseudo-labelled test samples were incrementally added to the training dataset for later models.
Self-training performed in this fashion does not require ground-truth test annotations and can be applied to any downstream dataset of interest.
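The pseudo-label selection step can be sketched as follows; the confidence threshold and schedule actually used are not specified here, so `threshold=0.95` is purely an illustrative choice:

```python
def select_pseudo_labels(preds, confs, threshold=0.95):
    """Pick test samples whose ensemble confidence clears a threshold.

    preds: predicted class id per test sample.
    confs: ensemble confidence per test sample.
    Returns (sample_index, pseudo_label) pairs to append to the training set
    for the next self-training generation.
    """
    return [(i, p) for i, (p, c) in enumerate(zip(preds, confs)) if c >= threshold]
```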
# Deployment
The models can be loaded and examined with PyTorch, and were implemented in a fairly standard way using the library.
The recommended resize setting is 112 px, as this is what we used to train the models.
We recommend the standard ImageNet normalization values, as these models start from Torchvision's ImageNet pre-training.
Additional information and code will be released in the near future along with updates to this model card.