oschamp/vit-artworkclassifier

@@ -21,7 +21,7 @@ model-index:
     metrics:
     - name: Accuracy
       type: accuracy
-      value: 0.4887640449438202
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -29,10 +29,10 @@ should probably proofread and complete it, then remove this comment. -->
 # vit-artworkclassifier
-This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the imagefolder dataset, a subset of the artbench-10 dataset. Train set size 1800, test set size 180, split equally over the 9 classes.
 It achieves the following results on the evaluation set:
-- Loss: 1.3363
-- Accuracy: 0.4888
 ## Model description
@@ -57,17 +57,24 @@ The following hyperparameters were used during training:
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- num_epochs: 8
 - mixed_precision_training: Native AMP
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Accuracy |
 |:-------------:|:-----:|:----:|:---------------:|:--------:|
-| 1.4136        | 1.79  | 100  | 1.5093          | 0.5112   |
-| 0.7189        | 3.57  | 200  | 1.3363          | 0.4888   |
-| 0.2717        | 5.36  | 300  | 1.4907          | 0.5281   |
-| 0.1227        | 7.14  | 400  | 1.4826          | 0.5562   |
 ### Framework versions
@@ -76,31 +83,3 @@ The following hyperparameters were used during training:
 - Pytorch 1.13.1+cu117
 - Datasets 2.9.0
 - Tokenizers 0.13.2
-### Code to Run
-def vit_classify(image):
-    from transformers import ViTFeatureExtractor
-    from transformers import ViTForImageClassification
-    import torch
-    vit = ViTForImageClassification.from_pretrained("oschamp/vit-artworkclassifier")
-    vit.eval()
-    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
-    vit.to(device)
-    model_name_or_path = 'google/vit-base-patch16-224-in21k'
-    feature_extractor = ViTFeatureExtractor.from_pretrained(model_name_or_path)
-    #LOAD IMAGE
-    encoding = feature_extractor(images=image, return_tensors="pt")
-    encoding.keys()
-    pixel_values = encoding['pixel_values'].to(device)
-    outputs = vit(pixel_values)
-    logits = outputs.logits
-    prediction = logits.argmax(-1)
-    return prediction.item() #vit.config.id2label[prediction.item()]

     metrics:
     - name: Accuracy
       type: accuracy
+      value: 0.5947786606129398
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 # vit-artworkclassifier
+This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the imagefolder dataset.
 It achieves the following results on the evaluation set:
+- Loss: 1.1392
+- Accuracy: 0.5948
 ## Model description
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
+- num_epochs: 4
 - mixed_precision_training: Native AMP
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Accuracy |
 |:-------------:|:-----:|:----:|:---------------:|:--------:|
+| 1.5906        | 0.36  | 100  | 1.4709          | 0.4847   |
+| 1.3395        | 0.72  | 200  | 1.3208          | 0.5074   |
+| 1.1461        | 1.08  | 300  | 1.3363          | 0.5165   |
+| 0.9593        | 1.44  | 400  | 1.1790          | 0.5846   |
+| 0.8761        | 1.8   | 500  | 1.1252          | 0.5902   |
+| 0.5922        | 2.16  | 600  | 1.1392          | 0.5948   |
+| 0.4803        | 2.52  | 700  | 1.1560          | 0.5936   |
+| 0.4454        | 2.88  | 800  | 1.1545          | 0.6118   |
+| 0.2271        | 3.24  | 900  | 1.2284          | 0.6039   |
+| 0.207         | 3.6   | 1000 | 1.2625          | 0.5959   |
+| 0.1958        | 3.96  | 1100 | 1.2621          | 0.6005   |
 ### Framework versions
 - Pytorch 1.13.1+cu117
 - Datasets 2.9.0
 - Tokenizers 0.13.2
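
As a reading aid for the updated training results table, the following minimal sketch finds which logged evaluation step has the lowest validation loss, which has the highest accuracy, and which one matches the headline loss reported in the updated card. The rows are copied from the added lines of the table. The `EvalRow` type and all variable names are illustrative assumptions, not part of the model card or of any library API.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    """One logged evaluation row (hypothetical helper type for this sketch)."""
    training_loss: float
    epoch: float
    step: int
    validation_loss: float
    accuracy: float


# Values copied from the added ("+") rows of the training results table.
ROWS = [
    EvalRow(1.5906, 0.36, 100, 1.4709, 0.4847),
    EvalRow(1.3395, 0.72, 200, 1.3208, 0.5074),
    EvalRow(1.1461, 1.08, 300, 1.3363, 0.5165),
    EvalRow(0.9593, 1.44, 400, 1.1790, 0.5846),
    EvalRow(0.8761, 1.80, 500, 1.1252, 0.5902),
    EvalRow(0.5922, 2.16, 600, 1.1392, 0.5948),
    EvalRow(0.4803, 2.52, 700, 1.1560, 0.5936),
    EvalRow(0.4454, 2.88, 800, 1.1545, 0.6118),
    EvalRow(0.2271, 3.24, 900, 1.2284, 0.6039),
    EvalRow(0.2070, 3.60, 1000, 1.2625, 0.5959),
    EvalRow(0.1958, 3.96, 1100, 1.2621, 0.6005),
]

# Headline validation loss stated in the updated card.
REPORTED_LOSS = 1.1392

# Row with the smallest validation loss.
lowest_loss = min(ROWS, key=lambda row: row.validation_loss)

# Row with the largest accuracy.
highest_accuracy = max(ROWS, key=lambda row: row.accuracy)

# Row whose validation loss equals the headline value in the card.
reported = next(row for row in ROWS if row.validation_loss == REPORTED_LOSS)

if __name__ == "__main__":
    print("lowest validation loss:", lowest_loss.step, lowest_loss.validation_loss)
    print("highest accuracy:", highest_accuracy.step, highest_accuracy.accuracy)
    print("row matching reported loss:", reported.step, reported.accuracy)
```

On these rows the lowest validation loss falls at step 500 and the highest accuracy at step 800, while the headline figures (loss 1.1392, accuracy 0.5948) match the step 600 row.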