arnabdhar/Swin-V2-base-Food

Image Classification

Swinv2ForImageClassification

food-classification

Generated from Trainer

arnabdhar committed on Dec 29, 2023

Commit e70b3ca · 1 Parent(s): 8868ce7

Update README.md

Files changed (1)

README.md +37 -6

@@ -14,6 +14,13 @@ metrics:
 model-index:
 - name: Swin-V2-base-Food
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -31,18 +38,42 @@ It achieves the following results on the evaluation set:
 ## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
@@ -76,4 +107,4 @@ The following hyperparameters were used during training:
 - Transformers 4.35.2
 - Pytorch 2.1.0+cu121
 - Datasets 2.15.0
-- Tokenizers 0.15.0

 model-index:
 - name: Swin-V2-base-Food
   results: []
+datasets:
+- ItsNotRohit/Food121-224
+- food101
+language:
+- en
+library_name: transformers
+pipeline_tag: image-classification
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 ## Model description
+Swin v2 is a Transformer-based vision model that achieves strong accuracy on image classification tasks. Its main strengths are:
+- __Hierarchical architecture__: Efficiently captures features at different scales, like CNNs.
+- __Shifted windows__: Improve information flow between windows and reduce computational cost.
+- __Large model capacity__: Enables accurate and generalizable predictions.
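+The Swin v2 authors report record results on ImageNet-V2 classification while using about 40x less labelled data and training time than comparable billion-parameter vision models. The architecture is also versatile: it handles a range of vision tasks and scales to high-resolution images.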
+The model was fine-tuned on 120 categories of food images.
+To use the model, run the following code snippet:
+```python
+from transformers import pipeline
+from PIL import Image
+# init image classification pipeline
+classifier = pipeline("image-classification", "arnabdhar/Swin-V2-base-Food")
+# use pipeline for inference
+image_path = "path/to/food.jpg"  # placeholder: replace with the path to your own image
+image = Image.open(image_path)
+results = classifier(image)
+```
+## Intended uses
+The model can be used for the following tasks:
+- __Food Image Classification__: Use this model to classify food images using the Transformers `pipeline` module.
+- __Base Model for Fine-Tuning__: To apply this model to your own custom dataset, treat it as a base model and fine-tune it on that dataset.
 ## Training procedure
+Fine-tuning was done on Google Colab with an NVIDIA T4 GPU with 15 GB of VRAM. The model was trained for 20,000 steps, which took about 5.5 hours including periodic evaluation of the model.
 ### Training hyperparameters
 The following hyperparameters were used during training:
 - Transformers 4.35.2
 - Pytorch 2.1.0+cu121
 - Datasets 2.15.0
+- Tokenizers 0.15.0
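
Note on the usage snippet added in this commit: it stops at `results = classifier(image)` and does not show how to read the output. The Transformers image-classification pipeline returns a list of dictionaries with `label` and `score` keys, so a minimal sketch of post-processing could look like the following. The helper name `top_prediction`, the `min_score` threshold, and the example labels and scores are assumptions made for illustration; they are not part of the model card.

```python
from typing import Any, Dict, List, Optional


def top_prediction(results: List[Dict[str, Any]], min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring label, or None if it scores below min_score.

    `results` is expected in the shape the image-classification pipeline
    returns: a list of {"label": str, "score": float} dictionaries.
    """
    if not results:
        return None
    # Pick the entry with the largest score rather than relying on list order.
    best = max(results, key=lambda r: r["score"])
    return best["label"] if best["score"] >= min_score else None


# Hypothetical output (labels and scores are made up for illustration).
example_results = [
    {"label": "pizza", "score": 0.91},
    {"label": "lasagna", "score": 0.05},
]
print(top_prediction(example_results))
```

With the snippet from the model card, this would be called as `top_prediction(classifier(image))`. Returning `None` for low-confidence predictions is one possible design choice; it lets the caller decide how to handle images that may not show any of the food categories the model was trained on.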