---
license: mit
datasets:
- food101
metrics:
- accuracy
pipeline_tag: image-classification
tags:
- vision
- food-classification
- vit
model-index:
- name: vit-food-classifier
  results:
  - task:
      type: image-classification
    dataset:
      name: food101
      type: food101
      split: validation
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9804
---

# Vision Transformer (ViT) Fine-tuned on Food101 Subset

## Model Description

This model is a fine-tuned version of `google/vit-base-patch16-224` for food image classification across 10 categories drawn from the Food101 dataset.

## Classes

- pizza
- sushi
- hamburger
- ice_cream
- steak
- baklava
- cheesecake
- pancakes
- tacos
- ramen

## Evaluation Results

| Metric | Value |
|--------|-------|
| **Accuracy** | 98.04% |

## Training Logs

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.3254        | 0.1076          | 97.20%   |
| 2     | 0.1216        | 0.0904          | 97.68%   |
| 3     | 0.0361        | 0.0770          | 97.88%   |
| 4     | 0.0118        | 0.0764          | 98.00%   |
| 5     | 0.0084        | 0.0767          | **98.04%** |

**Training Summary:**
- Total steps: 1,175
- Final training loss: 0.2446
- Training runtime: 2,705 seconds (~45 minutes)
- Throughput: 13.86 samples/second
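
The summary numbers are mutually consistent with the dataset and hyperparameters listed in the Training Data and Training Procedure sections (~7,500 training samples, batch size 32, 5 epochs). A quick back-of-the-envelope check:

```python
import math

# Figures from the model card
samples, batch_size, epochs = 7500, 32, 5
runtime_s = 2705

# Optimizer steps: ceil(samples / batch) per epoch, times epochs
steps = math.ceil(samples / batch_size) * epochs
print(steps)  # 1175

# Throughput: total samples processed divided by wall-clock runtime
throughput = samples * epochs / runtime_s
print(round(throughput, 2))  # 13.86
```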

### Reproduce Evaluation
```python
from datasets import load_dataset
from transformers import pipeline
from tqdm import tqdm

# Load the fine-tuned model (device=0 selects the first GPU; use device=-1 for CPU)
classifier = pipeline("image-classification", model="Nav772/vit-food-classifier", device=0)

# Load the validation split used during training
dataset = load_dataset("food101", split="validation")

# Filter to same 10 classes
selected_classes = ["pizza", "sushi", "hamburger", "ice_cream", "steak", 
                    "baklava", "cheesecake", "pancakes", "tacos", "ramen"]
class_names = dataset.features['label'].names
selected_indices = [class_names.index(c) for c in selected_classes]

filtered = dataset.filter(lambda x: x['label'] in selected_indices)

# Evaluate
correct = 0
total = 0

for example in tqdm(filtered):
    pred = classifier(example['image'])[0]['label']
    true_label = class_names[example['label']]
    if pred == true_label:
        correct += 1
    total += 1

print(f"Accuracy: {correct/total:.4f} ({correct}/{total})")
```

## Training Data

- **Dataset**: Food101 (subset)
- **Train samples**: ~7,500 images
- **Validation samples**: ~2,500 images
- **Classes**: 10 food categories

## Training Procedure

- **Base model**: google/vit-base-patch16-224
- **Epochs**: 5
- **Batch size**: 32
- **Learning rate**: 3e-5
- **Image size**: 224x224
- **Mixed precision**: FP16
- **Warmup ratio**: 0.1
- **Weight decay**: 0.01
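
The hyperparameters above could be expressed with the `transformers` Trainer API roughly as follows. This is a sketch reconstructed from the listed values, not the actual training script; the output directory name is illustrative:

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the hyperparameters listed above.
# The exact training script for this model was not published.
training_args = TrainingArguments(
    output_dir="vit-food-classifier",  # illustrative path
    num_train_epochs=5,
    per_device_train_batch_size=32,
    learning_rate=3e-5,
    warmup_ratio=0.1,
    weight_decay=0.01,
    fp16=True,  # mixed-precision training
)
```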

## Usage
```python
from transformers import pipeline

classifier = pipeline("image-classification", model="Nav772/vit-food-classifier")
result = classifier("path/to/food/image.jpg")
print(result)
```

## Limitations

- Only classifies 10 specific food categories
- May not generalize to food items outside these categories
- Performance may degrade on low-quality or obscured images