---
license: mit
datasets:
- food101
metrics:
- accuracy
pipeline_tag: image-classification
tags:
- vision
- food-classification
- vit
model-index:
- name: vit-food-classifier
  results:
  - task:
      type: image-classification
    dataset:
      name: food101
      type: food101
      split: validation
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9804
---
# Vision Transformer (ViT) Fine-tuned on Food101 Subset
## Model Description
This model is a fine-tuned version of `google/vit-base-patch16-224` for food image classification across 10 categories.
## Classes
- pizza
- sushi
- hamburger
- ice_cream
- steak
- baklava
- cheesecake
- pancakes
- tacos
- ramen
## Evaluation Results
| Metric | Value |
|--------|-------|
| **Accuracy** | 98.04% |
## Training Logs
| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | 0.3254 | 0.1076 | 97.20% |
| 2 | 0.1216 | 0.0904 | 97.68% |
| 3 | 0.0361 | 0.0770 | 97.88% |
| 4 | 0.0118 | 0.0764 | 98.00% |
| 5 | 0.0084 | 0.0767 | **98.04%** |
**Training Summary:**
- Total steps: 1,175
- Training loss reported for the full run: 0.2446 (likely an average over all steps, since the loss logged at epoch 5 is 0.0084)
- Training runtime: 2,705 seconds (~45 minutes)
- Throughput: 13.86 samples/second
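
The step count and throughput can be cross-checked against the training setup with a little arithmetic. This is a minimal sketch that assumes exactly 7,500 training images (the card only says "~7,500") and that the last, partial batch of each epoch counts as one step.

```python
import math

train_samples = 7_500  # assumption: the card states "~7,500"
batch_size = 32
epochs = 5
runtime_s = 2_705

# A partial final batch still counts as a step, hence the ceiling
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
throughput = train_samples * epochs / runtime_s

print(total_steps)           # 1175
print(round(throughput, 2))  # 13.86
```

Under these assumptions both values match the summary above.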
### Reproduce Evaluation
```python
from datasets import load_dataset
from transformers import pipeline
from tqdm import tqdm

# Load the model (device=0 assumes a CUDA GPU; use device=-1 to run on CPU)
classifier = pipeline("image-classification", model="Nav772/vit-food-classifier", device=0)

# Load the Food101 validation split
dataset = load_dataset("food101", split="validation")

# Filter to the same 10 classes
selected_classes = ["pizza", "sushi", "hamburger", "ice_cream", "steak",
                    "baklava", "cheesecake", "pancakes", "tacos", "ramen"]
class_names = dataset.features['label'].names
selected_indices = [class_names.index(c) for c in selected_classes]
filtered = dataset.filter(lambda x: x['label'] in selected_indices)

# Evaluate: compare the top-1 prediction with the ground-truth class name
correct = 0
total = 0
for example in tqdm(filtered):
    pred = classifier(example['image'])[0]['label']
    true_label = class_names[example['label']]
    if pred == true_label:
        correct += 1
    total += 1

print(f"Accuracy: {correct/total:.4f} ({correct}/{total})")
```
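
Overall accuracy can hide weak classes, so a per-class breakdown may be useful. The helper below is a hypothetical addition, not part of the original evaluation. It assumes `true_label` and `pred` were appended to two lists inside the evaluation loop, and the sample data is made up for illustration.

```python
from collections import Counter

def per_class_accuracy(true_labels, pred_labels):
    """Return {class_name: accuracy} for every class present in true_labels."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, pred_labels) if t == p)
    return {name: hits[name] / totals[name] for name in totals}

# Toy data for illustration only, not real model output
true_labels = ["pizza", "pizza", "sushi", "ramen"]
pred_labels = ["pizza", "tacos", "sushi", "ramen"]
print(per_class_accuracy(true_labels, pred_labels))
```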
## Training Data
- **Dataset**: Food101 (subset)
- **Train samples**: ~7,500 images
- **Validation samples**: ~2,500 images
- **Classes**: 10 food categories
## Training Procedure
- **Base model**: google/vit-base-patch16-224
- **Epochs**: 5
- **Batch size**: 32
- **Learning rate**: 3e-5
- **Image size**: 224x224
- **Mixed precision**: FP16
- **Warmup ratio**: 0.1
- **Weight decay**: 0.01
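
To illustrate what a warmup ratio of 0.1 means over the 1,175 training steps, the sketch below computes the learning rate at a given step. The card does not say which scheduler was used, so this assumes linear warmup to the peak rate followed by linear decay to zero, with the warmup length rounded up to a whole step. The function name `lr_at_step` is hypothetical.

```python
import math

def lr_at_step(step, total_steps=1175, peak_lr=3e-5, warmup_ratio=0.1):
    """Learning rate at `step`, assuming linear warmup then linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to peak_lr
        return peak_lr * step / warmup_steps
    # Decay linearly from peak_lr to 0 over the remaining steps
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / (total_steps - warmup_steps))

for step in (0, 59, 118, 600, 1175):
    print(step, lr_at_step(step))
```

Under these assumptions the rate peaks at step 118 and reaches zero at the last step.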
## Usage
```python
from transformers import pipeline
classifier = pipeline("image-classification", model="Nav772/vit-food-classifier")
result = classifier("path/to/food/image.jpg")
print(result)
```
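
The pipeline returns a list of dictionaries, each with a `label` and a `score`. As a minimal sketch, the hypothetical helper below picks the highest-scoring label and returns `None` when the score falls under a threshold. The default of 0.5 is an arbitrary assumption, and the sample input is hand-written in the pipeline's output shape rather than real model output.

```python
def top_prediction(results, min_score=0.5):
    """Return the best label, or None if its score is below min_score."""
    best = max(results, key=lambda r: r["score"])
    return best["label"] if best["score"] >= min_score else None

# Hand-written example in the pipeline's output shape, not real model output
example = [{"label": "ramen", "score": 0.91}, {"label": "sushi", "score": 0.05}]
print(top_prediction(example))        # ramen
print(top_prediction(example, 0.95))  # None
```

A low score is only a rough signal of uncertainty. Classifier scores are not guaranteed to be low for images the model was never trained on.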
## Limitations
- Classifies only the 10 food categories listed above
- Images of other foods are still assigned one of these 10 labels, so predictions for items outside the categories are not meaningful
- Performance may degrade on low-quality or obscured images