---
license: mit
datasets:
- food101
metrics:
- accuracy
pipeline_tag: image-classification
tags:
- vision
- food-classification
- vit
model-index:
- name: vit-food-classifier
  results:
  - task:
      type: image-classification
    dataset:
      name: food101
      type: food101
      split: validation
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9804
---

# Vision Transformer (ViT) Fine-tuned on a Food101 Subset

## Model Description

This model is a fine-tuned version of `google/vit-base-patch16-224` for food image classification across 10 categories.

## Classes

- pizza
- sushi
- hamburger
- ice_cream
- steak
- baklava
- cheesecake
- pancakes
- tacos
- ramen

## Evaluation Results

| Metric | Value |
|--------|-------|
| **Accuracy** | 98.04% |

## Training Logs

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.3254        | 0.1076          | 97.20%   |
| 2     | 0.1216        | 0.0904          | 97.68%   |
| 3     | 0.0361        | 0.0770          | 97.88%   |
| 4     | 0.0118        | 0.0764          | 98.00%   |
| 5     | 0.0084        | 0.0767          | **98.04%** |

**Training Summary:**
- Total steps: 1,175
- Overall training loss: 0.2446 (this appears to be the run-level average; the loss logged at the final epoch is 0.0084)
- Training runtime: 2,705 seconds (~45 minutes)
- Throughput: 13.86 samples/second
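
The summary figures are consistent with the hyperparameters listed under Training Procedure. The sketch below re-derives them, assuming exactly 7,500 training images, a single device, and no gradient accumulation (none of which this card states explicitly). The variable names are illustrative.

```python
import math

# Values taken from this card
train_samples = 7_500      # assumed exact; the card says ~7,500
batch_size = 32
epochs = 5
runtime_seconds = 2_705

# Optimizer steps: the last partial batch of each epoch still counts as a step
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

# Throughput: samples processed across all epochs divided by wall-clock time
samples_per_second = train_samples * epochs / runtime_seconds

print(total_steps)                    # 1175
print(round(samples_per_second, 2))   # 13.86
print(round(runtime_seconds / 60))    # 45
```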

### Reproduce Evaluation
```python
from datasets import load_dataset
from transformers import pipeline
from tqdm import tqdm

# Load the model (device=0 selects the first CUDA GPU; use device=-1 to run on CPU)
classifier = pipeline("image-classification", model="Nav772/vit-food-classifier", device=0)

# Load the Food101 validation split (the split used for the reported accuracy)
dataset = load_dataset("food101", split="validation")

# Filter to the same 10 classes
selected_classes = ["pizza", "sushi", "hamburger", "ice_cream", "steak",
                    "baklava", "cheesecake", "pancakes", "tacos", "ramen"]
class_names = dataset.features['label'].names
selected_indices = {class_names.index(c) for c in selected_classes}

# Filter on the label column only, so images are not decoded during filtering
filtered = dataset.filter(lambda label: label in selected_indices, input_columns=["label"])

# Evaluate (the comparison assumes the model's label names match the Food101 class names)
correct = 0
total = 0

for example in tqdm(filtered):
    pred = classifier(example['image'])[0]['label']
    true_label = class_names[example['label']]
    if pred == true_label:
        correct += 1
    total += 1

print(f"Accuracy: {correct/total:.4f} ({correct}/{total})")
```
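
Overall accuracy can hide weak classes. As a possible extension of the loop above, per-class accuracy can be tallied from (true label, predicted label) pairs. The helper `per_class_accuracy` and the sample pairs below are hypothetical illustrations, not part of the original evaluation or its results.

```python
from collections import Counter


def per_class_accuracy(pairs):
    """Return {true_label: accuracy} from an iterable of (true_label, predicted_label)."""
    totals = Counter()
    hits = Counter()
    for true_label, predicted_label in pairs:
        totals[true_label] += 1
        if predicted_label == true_label:
            hits[true_label] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Made-up pairs purely to show the output shape; not real model predictions
example_pairs = [
    ("pizza", "pizza"),
    ("pizza", "tacos"),
    ("sushi", "sushi"),
    ("ramen", "ramen"),
]
print(per_class_accuracy(example_pairs))
# {'pizza': 0.5, 'sushi': 1.0, 'ramen': 1.0}
```

To use it with the evaluation loop, collect each `(true_label, pred)` pair into a list and pass that list to the helper once the loop finishes.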

## Training Data

- **Dataset**: Food101 (subset)
- **Train samples**: ~7,500 images
- **Validation samples**: ~2,500 images
- **Classes**: 10 food categories

## Training Procedure

- **Base model**: google/vit-base-patch16-224
- **Epochs**: 5
- **Batch size**: 32
- **Learning rate**: 3e-5
- **Image size**: 224x224
- **Mixed precision**: FP16
- **Warmup ratio**: 0.1
- **Weight decay**: 0.01
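
This card does not include the training script. As a rough sketch, assuming the Hugging Face `Trainer` was used, the hyperparameters above map onto `transformers.TrainingArguments` as follows. The `output_dir` value and the helper `build_training_arguments` are hypothetical, and mapping the batch size to `per_device_train_batch_size` assumes a single device.

```python
# Hyperparameters from the list above, keyed by the corresponding
# transformers.TrainingArguments parameter names.
TRAINING_KWARGS = {
    "output_dir": "vit-food-classifier",  # hypothetical path, not from this card
    "num_train_epochs": 5,
    "per_device_train_batch_size": 32,    # assumes one device, no gradient accumulation
    "learning_rate": 3e-5,
    "warmup_ratio": 0.1,
    "weight_decay": 0.01,
    "fp16": True,                         # mixed precision; requires a CUDA GPU
}


def build_training_arguments():
    """Build TrainingArguments lazily so this sketch imports without transformers or a GPU."""
    from transformers import TrainingArguments

    return TrainingArguments(**TRAINING_KWARGS)
```

The image size is not a training argument: it is determined by the base model's image processor, which resizes inputs to 224x224.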

## Usage
```python
from transformers import pipeline

classifier = pipeline("image-classification", model="Nav772/vit-food-classifier")
result = classifier("path/to/food/image.jpg")
print(result)
```

## Limitations

- Classifies only the 10 food categories listed above
- Cannot recognize other foods: an image outside these categories is still assigned one of the 10 labels
- Performance may degrade on low-quality or obscured images
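
Because the model always returns one of its 10 labels, one common mitigation is to treat low-confidence predictions as unknown. The following is a minimal sketch that assumes pipeline-style output (a list of `label`/`score` dicts). The helper `top_prediction` and the 0.5 threshold are hypothetical and would need tuning on real data. A high score does not guarantee that an image belongs to one of the 10 classes, so this reduces the problem rather than removing it.

```python
def top_prediction(results, min_score=0.5):
    """Return the best label, or None when its score is below min_score.

    `results` has the shape returned by the image-classification pipeline:
    a list of {"label": str, "score": float} dicts.
    """
    best = max(results, key=lambda r: r["score"])
    return best["label"] if best["score"] >= min_score else None


# Hand-written scores for illustration only; not real model output
confident = [{"label": "pizza", "score": 0.97}, {"label": "tacos", "score": 0.02}]
uncertain = [{"label": "steak", "score": 0.31}, {"label": "ramen", "score": 0.28}]

print(top_prediction(confident))  # pizza
print(top_prediction(uncertain))  # None
```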