Update README.md
type: accuracy
value: 0.9796296296296296
---

___

# vit-90-animals

___

## Model description

This model is a fine-tuned Vision Transformer, based on [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) and trained on the [animal image dataset](https://www.kaggle.com/datasets/iamsouravbanerjee/animal-image-dataset-90-different-animals) from Kaggle to classify images into 90 different animal species. It was trained with supervised learning and achieves high accuracy on unseen data. The model can be used for general-purpose image classification in the animal domain and serves as a baseline for comparison with zero-shot classification models such as CLIP.

The model achieves the following results on the evaluation set:

- Loss: 0.0840
- Accuracy: 0.9796
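
The full-precision accuracy value in the model card metadata (0.9796296296296296) corresponds to a whole number of correct predictions on a 540-image split (10% of the 5,400 images). A quick sanity check, assuming that split size:

```python
from fractions import Fraction

# Full-precision accuracy from the model card front-matter.
acc = 0.9796296296296296

# Recover the underlying ratio; 10% of 5,400 images gives a 540-image split.
ratio = Fraction(acc).limit_denominator(540)
print(ratio)  # 529/540 -> 529 of 540 evaluation images classified correctly
```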

## Intended uses & limitations

### Intended uses

- Animal image classification (educational, demo, prototyping)
- Benchmarking against zero-shot classification models
- Use in Gradio interfaces or image analysis tools

### Limitations

- The model is limited to the 90 animal classes it was trained on
- It may not generalize well to image domains outside its training distribution
- Performance can degrade with poor image quality or occlusions

## Training and evaluation data

The model was trained on a dataset containing 5,400 animal images categorized into 90 distinct classes. The dataset was obtained from Kaggle and, according to its creator, was originally sourced from Google Images. The train/validation/test split was 80/10/10, and the label distribution is relatively balanced across classes.

Evaluation was conducted on the test split and compared against results from a zero-shot model (*openai/clip-vit-large-patch14*) using the same label set.
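
A minimal sketch of the split arithmetic described above, assuming an exact 80/10/10 split of the 5,400 images:

```python
total_images = 5400
num_classes = 90

# Integer arithmetic avoids floating-point surprises with 0.8 / 0.1.
train = total_images * 80 // 100
val = total_images * 10 // 100
test = total_images * 10 // 100

assert train + val + test == total_images
print(train, val, test)              # 4320 540 540
print(total_images // num_classes)   # 60 images per class on average
```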

## Training procedure

- Base model: *google/vit-base-patch16-224*
- Fine-tuning method: supervised training using the Hugging Face Trainer class
- Data augmentation: applied during training (e.g., RandomHorizontalFlip, ColorJitter)
- Training time: ~5 epochs, with and without augmentation
- Optimizer: AdamW (default settings)
- Evaluation metrics: accuracy, precision, and recall
- Best performance (no augmentation): 98.3% test accuracy

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 16
- …
- num_epochs: 5
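
These hyperparameters are consistent with the step counts in the training results: with a 4,320-image training split and batch size 16, each epoch is 270 optimizer steps. A quick check, assuming no gradient accumulation:

```python
train_images = 4320   # assumed 80% of the 5,400-image dataset
batch_size = 16       # train_batch_size above
num_epochs = 5

steps_per_epoch = train_images // batch_size
print(steps_per_epoch)               # 270, matching the Step column at epoch 1
print(steps_per_epoch * num_epochs)  # 1350 total steps, matching epoch 5
```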

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 1.2021        | 1.0   | 270  | 0.3500          | 0.9611   |
| …             | …     | …    | …               | …        |
| 0.1706        | 4.0   | 1080 | 0.1409          | 0.9685   |
| 0.1678        | 5.0   | 1350 | 0.1373          | 0.9667   |
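
Note that lowest validation loss and highest accuracy can disagree when picking a checkpoint: among the rows listed, epoch 5 has the lowest validation loss while epoch 4 has the highest accuracy. A small sketch of comparing the listed rows programmatically:

```python
# (epoch, validation_loss, accuracy) for the rows listed in the table above
rows = [
    (1, 0.3500, 0.9611),
    (4, 0.1409, 0.9685),
    (5, 0.1373, 0.9667),
]

best_by_loss = min(rows, key=lambda r: r[1])
best_by_acc = max(rows, key=lambda r: r[2])
print(best_by_loss[0])  # 5 -> epoch with lowest validation loss
print(best_by_acc[0])   # 4 -> epoch with highest accuracy
```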

### Framework versions

- Transformers 4.50.0
- Pytorch 2.6.0+cu124
- Datasets 3.4.1