ViT-Base fine-tuned on Imagenette

Fine-tuned from google/vit-base-patch16-224 on the Imagenette 160 px dataset.

Metric        Value
Val accuracy  99.52%
Val loss      0.017721
Best epoch    9
Classes       10

Classes

tench, English springer, cassette player, chain saw, church, French horn, garbage truck, gas pump, golf ball, parachute

Usage

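from transformers import AutoModelForImageClassification, ViTImageProcessor
from PIL import Image
import torch

# Load the fine-tuned weights and the matching preprocessing config
model     = AutoModelForImageClassification.from_pretrained("Misupatel/vit-imagenette")
processor = ViTImageProcessor.from_pretrained("Misupatel/vit-imagenette")
model.eval()  # disable dropout for inference

# Convert to RGB so grayscale or RGBA files still yield 3 channels
image  = Image.open("your_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# No gradients are needed for a forward pass at inference time
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring class index back to its label
predicted_class = model.config.id2label[logits.argmax(-1).item()]
print(predicted_class)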
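
The snippet above prints only the single best label. If class probabilities are also of interest, the logits can be passed through a softmax and ranked. The following is a minimal, dependency-free sketch; the helper name `top_k_probs` is hypothetical and not part of this repository or of transformers.

```python
import math


def top_k_probs(logits, labels, k=3):
    """Return the k most likely (label, probability) pairs.

    `logits` is a flat list of floats and `labels` a list of class names
    in the same order. Hypothetical helper, written for illustration only.
    """
    # Subtract the maximum logit before exponentiating for numerical stability
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Sort label/probability pairs from most to least likely
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy input with made-up logits, just to show the call shape
print(top_k_probs([2.0, 1.0, 0.1], ["tench", "church", "golf ball"], k=2))
```

With the model loaded as above, one plausible way to call it is `top_k_probs(logits[0].tolist(), [model.config.id2label[i] for i in range(len(model.config.id2label))])`, assuming `id2label` is keyed by consecutive integers starting at 0.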

Training details

  • Optimizer: AdamW, lr=2e-5
  • Epochs: up to 10 (early stopping, patience=3)
  • Batch size: 32
  • Augmentation: RandomResizedCrop(224), RandomHorizontalFlip, ColorJitter, RandomRotation(15°)
  • Normalisation: mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]
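
Two of the settings above can be illustrated without any ML dependencies. The sketch below shows what a mean and std of 0.5 do to pixel values, and how early stopping with patience=3 decides when to halt. It is not the training script used for this model: the function names are hypothetical, the loss values are made up, and monitoring validation loss (rather than accuracy) is an assumption.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Scale an 8-bit pixel value to [0, 1], then apply (x - mean) / std.

    With mean=0.5 and std=0.5 this maps the range [0, 255] onto [-1, 1].
    """
    return (pixel / 255.0 - mean) / std


def run_with_early_stopping(val_losses, patience=3):
    """Return (best_epoch, epochs_run) for a list of per-epoch validation losses.

    Training halts once `patience` consecutive epochs fail to improve on the
    best loss seen so far. Epochs are numbered from 1.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    epochs_run = 0

    for epoch, loss in enumerate(val_losses, start=1):
        epochs_run = epoch
        if loss < best_loss:
            # New best: remember it and reset the patience counter
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break

    return best_epoch, epochs_run


# Pixel extremes and midpoint under the normalisation listed above
print(normalize(0), normalize(127.5), normalize(255))

# Made-up losses: the best value is at epoch 2, then three epochs without improvement
print(run_with_early_stopping([0.5, 0.4, 0.45, 0.46, 0.47, 0.3]))
```

Under this logic, a best epoch of 9 out of a maximum of 10 means the patience counter never reached 3, so the run would have used all 10 epochs.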

Model size: 85.8M params (Safetensors, tensor type F32)