Vision Transformer (ViT) Fine-tuned on Food101 Subset
Model Description
This model is a fine-tuned version of google/vit-base-patch16-224 for food image classification across 10 categories.
Classes
- pizza
- sushi
- hamburger
- ice_cream
- steak
- baklava
- cheesecake
- pancakes
- tacos
- ramen
Evaluation Results
| Metric | Value |
|---|---|
| Accuracy | 98.04% |
Training Logs
| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.3254 | 0.1076 | 97.20% |
| 2 | 0.1216 | 0.0904 | 97.68% |
| 3 | 0.0361 | 0.0770 | 97.88% |
| 4 | 0.0118 | 0.0764 | 98.00% |
| 5 | 0.0084 | 0.0767 | 98.04% |
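To put the per-epoch gains in perspective, the accuracies in the table can be converted to approximate counts of correctly classified images. The sketch below assumes the validation subset contains exactly 2,500 images (the card states ~2,500), so the counts are estimates and not logged values.

```python
# Per-epoch validation accuracy copied from the table above.
epoch_accuracy = {1: 0.9720, 2: 0.9768, 3: 0.9788, 4: 0.9800, 5: 0.9804}

# Assumption: exactly 2,500 validation images (the card says "~2,500").
N_VAL = 2500


def correct_count(accuracy: float, n: int = N_VAL) -> int:
    """Estimate how many images were classified correctly."""
    return round(accuracy * n)


counts = {epoch: correct_count(acc) for epoch, acc in epoch_accuracy.items()}
errors = {epoch: N_VAL - c for epoch, c in counts.items()}

for epoch in epoch_accuracy:
    print(f"epoch {epoch}: ~{counts[epoch]} correct, ~{errors[epoch]} misclassified")
```

Under that assumption, the gain from epoch 4 to epoch 5 corresponds to roughly one additional image, which suggests validation accuracy had largely plateaued by then.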
Training Summary:
- Total steps: 1,175
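- Training loss reported for the whole run: 0.2446 (presumably an average over all steps; the epoch-5 value in the table above is 0.0084)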
- Training runtime: 2,705 seconds (~45 minutes)
- Throughput: 13.86 samples/second
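As a quick consistency check, the reported runtime and throughput can be compared against the number of samples the configuration implies. This is a sketch that borrows two figures from elsewhere in this card (5 epochs and roughly 7,500 training images); since the image count is approximate, only approximate agreement is expected.

```python
# Values reported in the training summary above.
runtime_s = 2705
throughput = 13.86  # samples per second

# Assumptions: 5 epochs over roughly 7,500 training images, taken from the
# "Training Procedure" and "Training Data" sections (7,500 is approximate).
epochs = 5
train_samples = 7500

samples_seen = throughput * runtime_s      # implied by the summary
samples_expected = epochs * train_samples  # implied by the configuration
runtime_min = runtime_s / 60

print(f"runtime: {runtime_min:.1f} min")
print(f"samples processed (summary): {samples_seen:,.0f}")
print(f"samples expected (config):   {samples_expected:,}")
```

The two sample counts agree to within a fraction of a percent, so the summary figures are consistent with five passes over the training subset.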
Reproduce Evaluation
from datasets import load_dataset
from transformers import pipeline
from tqdm import tqdm

# Load model (device=0 selects the first GPU; use device=-1 to run on CPU)
classifier = pipeline("image-classification", model="Nav772/vit-food-classifier", device=0)

# Load the Food101 validation split used for evaluation
dataset = load_dataset("food101", split="validation")

# Filter to the same 10 classes
selected_classes = ["pizza", "sushi", "hamburger", "ice_cream", "steak",
                    "baklava", "cheesecake", "pancakes", "tacos", "ramen"]
class_names = dataset.features['label'].names
selected_indices = [class_names.index(c) for c in selected_classes]
filtered = dataset.filter(lambda x: x['label'] in selected_indices)

# Evaluate: compare the top-1 predicted label with the ground-truth class name
correct = 0
total = 0
for example in tqdm(filtered):
    pred = classifier(example['image'])[0]['label']
    true_label = class_names[example['label']]
    if pred == true_label:
        correct += 1
    total += 1

print(f"Accuracy: {correct/total:.4f} ({correct}/{total})")
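The script above reports a single overall accuracy. If a per-class breakdown is also wanted, one possible approach is to collect `(true_label, pred)` pairs inside the loop and summarize them afterwards. The helper `per_class_accuracy` below is a hypothetical name introduced for illustration, and the example pairs are made up, not real model output.

```python
from collections import Counter


def per_class_accuracy(pairs):
    """Return {class_name: accuracy} from (true_label, predicted_label) pairs."""
    totals = Counter()
    hits = Counter()
    for true_label, pred_label in pairs:
        totals[true_label] += 1
        if pred_label == true_label:
            hits[true_label] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical predictions for illustration only, not real model output.
example_pairs = [
    ("pizza", "pizza"),
    ("pizza", "tacos"),
    ("sushi", "sushi"),
    ("steak", "steak"),
    ("steak", "hamburger"),
    ("steak", "steak"),
]

breakdown = per_class_accuracy(example_pairs)
for label, acc in sorted(breakdown.items()):
    print(f"{label}: {acc:.2f}")
```

In the evaluation loop, this would amount to appending `(true_label, pred)` to a list on each iteration and passing that list to the helper once the loop finishes.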
Training Data
- Dataset: Food101 (subset)
- Train samples: ~7,500 images
- Validation samples: ~2,500 images
- Classes: 10 food categories
Training Procedure
- Base model: google/vit-base-patch16-224
- Epochs: 5
- Batch size: 32
- Learning rate: 3e-5
- Image size: 224x224
- Mixed precision: FP16
- Warmup ratio: 0.1
- Weight decay: 0.01
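The hyperparameters above can be related to the step count reported in the training summary. The sketch below is an illustration under stated assumptions, not the author's training script: it assumes exactly 7,500 training images, a single device, and no gradient accumulation. The dictionary keys mirror Hugging Face `TrainingArguments` parameter names, but whether the Trainer API was actually used is also an assumption.

```python
import math

# Hyperparameters from the list above. Key names follow Hugging Face
# `TrainingArguments`; that the Trainer API was used is an assumption.
hyperparams = {
    "num_train_epochs": 5,
    "per_device_train_batch_size": 32,
    "learning_rate": 3e-5,
    "warmup_ratio": 0.1,
    "weight_decay": 0.01,
    "fp16": True,
}

# Assumptions: exactly 7,500 training images, a single device, and no
# gradient accumulation.
train_samples = 7500

# The last, partially filled batch of each epoch still counts as a step.
steps_per_epoch = math.ceil(train_samples / hyperparams["per_device_train_batch_size"])
total_steps = steps_per_epoch * hyperparams["num_train_epochs"]
# Rounding the warmup length up is a choice made for this sketch.
warmup_steps = math.ceil(total_steps * hyperparams["warmup_ratio"])

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"warmup steps:    {warmup_steps}")
```

With these assumptions the computed total is 1,175 steps, which matches the figure in the training summary, and the learning rate would warm up over roughly the first 118 steps.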
Usage
from transformers import pipeline
classifier = pipeline("image-classification", model="Nav772/vit-food-classifier")
result = classifier("path/to/food/image.jpg")
print(result)
Limitations
- Only classifies 10 specific food categories
- May not generalize to food items outside these categories
- Performance may degrade on low-quality or obscured images
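Because the model always returns one of its 10 labels, a caller may want to discard low-confidence predictions. The sketch below shows one simple way to do that on the list-of-dicts output that an image-classification pipeline returns. The helper name `top_prediction` and the `min_score=0.5` threshold are hypothetical choices for illustration. A score threshold is only a heuristic: classifiers can be confidently wrong on foods outside the training categories.

```python
def top_prediction(results, min_score=0.5):
    """Return the top label, or None if the model is not confident enough.

    `results` has the shape returned by an image-classification pipeline:
    a list of {"label": str, "score": float} dicts.
    """
    if not results:
        return None
    best = max(results, key=lambda r: r["score"])
    return best["label"] if best["score"] >= min_score else None


# Hypothetical outputs for illustration only, not real model predictions.
confident = [{"label": "ramen", "score": 0.97}, {"label": "sushi", "score": 0.02}]
uncertain = [{"label": "tacos", "score": 0.31}, {"label": "pizza", "score": 0.28}]

print(top_prediction(confident))  # label is returned
print(top_prediction(uncertain))  # None, below the threshold
```

A suitable threshold would need to be tuned on held-out images, ideally including some that fall outside the 10 categories.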