Vision Transformer (ViT) Fine-tuned on Food101 Subset

Model Description

This model is a fine-tuned version of google/vit-base-patch16-224 for food image classification across 10 categories.

Classes

  • pizza
  • sushi
  • hamburger
  • ice_cream
  • steak
  • baklava
  • cheesecake
  • pancakes
  • tacos
  • ramen

Evaluation Results

| Metric | Value |
|--------|-------|
| Accuracy (Food101 validation split, 10 classes) | 98.04% |

Training Logs

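| Epoch | Training Loss | Validation Loss | Validation Accuracy |
|-------|---------------|-----------------|---------------------|
| 1 | 0.3254 | 0.1076 | 97.20% |
| 2 | 0.1216 | 0.0904 | 97.68% |
| 3 | 0.0361 | 0.0770 | 97.88% |
| 4 | 0.0118 | 0.0764 | 98.00% |
| 5 | 0.0084 | 0.0767 | 98.04% |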

Training Summary

  • Total steps: 1,175
  • Average training loss over the full run: 0.2446
  • Training runtime: 2,705 seconds (~45 minutes)
  • Throughput: 13.86 samples/second
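As a sanity check, the summary figures line up with the batch size and epoch count listed under Training Procedure. The sketch below assumes exactly 7,500 training images (the card states ~7,500), a single device, and no gradient accumulation, so each epoch takes ceil(7,500 / 32) optimizer steps.

```python
import math

# Figures from this card; the exact training-set size of 7,500 is an assumption.
train_samples = 7_500
batch_size = 32
epochs = 5
runtime_s = 2_705

# One optimizer step per batch (assumes a single device, no gradient accumulation).
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

# Throughput = samples processed across all epochs / wall-clock seconds.
throughput = train_samples * epochs / runtime_s

print(steps_per_epoch, total_steps, round(throughput, 2))
```

Under these assumptions the sketch gives 235 steps per epoch, 1,175 steps in total, and about 13.86 samples/second, matching the reported values.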

Reproduce Evaluation

from datasets import load_dataset
from transformers import pipeline
from tqdm import tqdm

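# Load the fine-tuned model (device=0 selects the first GPU; use device=-1 to run on CPU)
classifier = pipeline("image-classification", model="Nav772/vit-food-classifier", device=0)

# Load the Food101 validation split, which served as the evaluation set
dataset = load_dataset("food101", split="validation")

# Keep only the 10 classes the model was trained on
selected_classes = ["pizza", "sushi", "hamburger", "ice_cream", "steak",
                    "baklava", "cheesecake", "pancakes", "tacos", "ramen"]
class_names = dataset.features["label"].names
selected_indices = {class_names.index(c) for c in selected_classes}

# Filter on the label column only, so images are not decoded during filtering
filtered = dataset.filter(lambda label: label in selected_indices, input_columns="label")

# Evaluate top-1 accuracy (assumes the model's label names match Food101's class names)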
correct = 0
total = 0

for example in tqdm(filtered):
    pred = classifier(example['image'])[0]['label']
    true_label = class_names[example['label']]
    if pred == true_label:
        correct += 1
    total += 1

print(f"Accuracy: {correct/total:.4f} ({correct}/{total})")
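The loop above reports a single overall accuracy. A per-class breakdown can be computed from the same (true label, prediction) pairs. The sketch below assumes those pairs have been collected into a list (for example by appending `(true_label, pred)` inside the loop); the function name `per_class_accuracy` is hypothetical and the sample pairs are made up purely for illustration.

```python
from collections import Counter


def per_class_accuracy(pairs):
    """Return {class_name: accuracy} from (true_label, predicted_label) pairs."""
    totals = Counter()
    hits = Counter()
    for true_label, pred in pairs:
        totals[true_label] += 1
        if pred == true_label:
            hits[true_label] += 1
    # Sorted by class name for stable output.
    return {label: hits[label] / totals[label] for label in sorted(totals)}


# Illustrative pairs only, not real model predictions.
pairs = [("pizza", "pizza"), ("pizza", "pizza"), ("steak", "steak"), ("steak", "pizza")]
print(per_class_accuracy(pairs))  # {'pizza': 1.0, 'steak': 0.5}
```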

Training Data

  • Dataset: Food101 (subset)
  • Train samples: ~7,500 images
  • Validation samples: ~2,500 images
  • Classes: 10 food categories

Training Procedure

  • Base model: google/vit-base-patch16-224
  • Epochs: 5
  • Batch size: 32
  • Learning rate: 3e-5
  • Image size: 224x224
  • Mixed precision: FP16
  • Warmup ratio: 0.1
  • Weight decay: 0.01
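The warmup ratio translates into a number of warmup steps via the 1,175 total steps from the training summary. The card does not name the scheduler; assuming the Hugging Face Trainer default (`lr_scheduler_type="linear"`), the learning rate rises linearly from 0 to 3e-5 over the first ceil(0.1 × 1,175) = 118 steps, then decays linearly to 0 at step 1,175. A minimal pure-Python sketch of that assumed schedule:

```python
import math

# Values from this card; the linear warmup/decay shape is an assumption
# (it mirrors the Hugging Face Trainer default linear schedule).
PEAK_LR = 3e-5
TOTAL_STEPS = 1_175
WARMUP_RATIO = 0.1
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 118


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (0-indexed) under the assumed schedule."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to PEAK_LR.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Linear decay from PEAK_LR down to 0 at TOTAL_STEPS.
    remaining = TOTAL_STEPS - step
    return PEAK_LR * max(0.0, remaining / max(1, TOTAL_STEPS - WARMUP_STEPS))


for step in (0, 59, 118, 646, 1_175):
    print(step, f"{lr_at(step):.2e}")
```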

Usage

from transformers import pipeline

classifier = pipeline("image-classification", model="Nav772/vit-food-classifier")
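result = classifier("path/to/food/image.jpg")  # also accepts a URL or a PIL.Image
# A list of {"label": ..., "score": ...} dicts, highest score first (top 5 by default)
print(result)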

Limitations

  • Only classifies 10 specific food categories
  • Images of foods outside these categories will still be assigned one of the 10 labels, since the model has no "other" or rejection class
  • Performance may degrade on low-quality or obscured images
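One rough heuristic for out-of-category inputs is to reject low-confidence predictions. The sketch below operates on the pipeline's output format (a list of dicts with "label" and "score"); the function name `top_label_or_none` and the 0.5 threshold are hypothetical and would need tuning on held-out data. The top softmax score is only a weak signal: an unfamiliar food can still receive a confident but wrong label.

```python
from typing import Optional


def top_label_or_none(predictions: list, threshold: float = 0.5) -> Optional[str]:
    """Return the highest-scoring label if its score clears `threshold`, else None.

    `predictions` follows the image-classification pipeline output format:
    a list of {"label": str, "score": float}. The 0.5 default is illustrative, not tuned.
    """
    if not predictions:
        return None
    # max() avoids relying on the list already being sorted by score.
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= threshold else None


# Illustrative outputs, not real model predictions.
confident = [{"label": "ramen", "score": 0.97}, {"label": "sushi", "score": 0.01}]
uncertain = [{"label": "tacos", "score": 0.31}, {"label": "pizza", "score": 0.28}]
print(top_label_or_none(confident))  # ramen
print(top_label_or_none(uncertain))  # None
```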

Model Files

  • Format: Safetensors
  • Model size: 85.8M parameters
  • Tensor type: F32