---
license: mit
datasets:
- imageomics/fish-vista
tags:
- swin
- swin-transformer
- fish
- fish-classification
---

# Swin Transformer for Fish Classification

![image](https://cdn-uploads.huggingface.co/production/uploads/6661f6eeb8e23e099edebafb/UBYR3QtjrTBFTiap0WGSM.png)

This Swin-Tiny transformer was trained to classify fish species from the fish-vista dataset.
The dataset is heavily class-imbalanced. Although I tried to mitigate the effect of this imbalance, the model's accuracy varies with class frequency.
I therefore recommend using these weights as a baseline for building a classification or segmentation model on a fish dataset; the model is not suitable for any task out of the box.

## Usage

Load the model and processor:

```python
from transformers import AutoImageProcessor, AutoModel

# Load the model and processor from the Hub
repo_id = "paulprt/swin-fish-classification"

# trust_remote_code=True downloads the custom code from the Hub repo,
# runs it locally, and initializes the classes automatically
processor = AutoImageProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
```

You can find the class mappings using:

```python
model.config.id2label  # Mapping from class IDs to labels
model.config.label2id  # Mapping from labels to class IDs
```

## Training

The model was trained for 80 epochs on an A10 GPU: the classifier head alone for the first 30 epochs, then the entire model for 50 more epochs.

![loss_curve](https://cdn-uploads.huggingface.co/production/uploads/6661f6eeb8e23e099edebafb/N3vKSOhykFIy2cMr6oD-F.png)

## Results

Following the original fish-vista paper, I computed the accuracy on the test dataset across class-frequency bins (ultra rare: 2-10, minority: 10-100, neutral: 100-500, majority: 500+).

![image](https://cdn-uploads.huggingface.co/production/uploads/6661f6eeb8e23e099edebafb/ItwMxmZz2J6HhKr7cNLt8.png)
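
A binned evaluation of this kind can be sketched as follows. This is a minimal illustration, not the script used for the plot above. It assumes that the bins are half-open intervals over per-class counts in the training labels and that accuracy is computed per test sample; the names `BINS`, `bin_for_count`, and `binned_accuracy` are hypothetical, and the labels are toy data.

```python
from collections import Counter, defaultdict

# Assumed half-open bins over training-set class counts: [low, high).
BINS = [
    ("ultra rare", 2, 10),
    ("minority", 10, 100),
    ("neutral", 100, 500),
    ("majority", 500, float("inf")),
]


def bin_for_count(count):
    """Return the bin name for a class seen `count` times, or None if unbinned."""
    for name, low, high in BINS:
        if low <= count < high:
            return name
    return None


def binned_accuracy(train_labels, test_labels, test_preds):
    """Test accuracy grouped by the training-frequency bin of the true class."""
    class_counts = Counter(train_labels)
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(test_labels, test_preds):
        bin_name = bin_for_count(class_counts.get(truth, 0))
        if bin_name is None:
            continue  # class seen fewer than 2 times in the training labels
        total[bin_name] += 1
        correct[bin_name] += int(truth == pred)
    return {name: correct[name] / total[name] for name in total}


# Toy example with made-up labels, not fish-vista data.
train = ["a"] * 3 + ["b"] * 20
test_true = ["a", "a", "b", "b"]
test_pred = ["a", "b", "b", "b"]
print(binned_accuracy(train, test_true, test_pred))
# {'ultra rare': 0.5, 'minority': 1.0}
```

Grouping by the true class rather than the predicted class keeps each test sample in exactly one bin, so the per-bin accuracies can be compared directly.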