---
license: mit
datasets:
- food101
metrics:
- accuracy
pipeline_tag: image-classification
tags:
- vision
- food-classification
- vit
model-index:
- name: vit-food-classifier
  results:
  - task:
      type: image-classification
    dataset:
      name: food101
      type: food101
      split: validation
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9804
---
# Vision Transformer (ViT) Fine-tuned on Food101 Subset
## Model Description
This model is a fine-tuned version of `google/vit-base-patch16-224` for food image classification on a 10-class subset of Food101.
## Classes
- pizza
- sushi
- hamburger
- ice_cream
- steak
- baklava
- cheesecake
- pancakes
- tacos
- ramen
## Evaluation Results
| Metric | Value |
|--------|-------|
| **Accuracy** | 98.04% |
## Training Logs
| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | 0.3254 | 0.1076 | 97.20% |
| 2 | 0.1216 | 0.0904 | 97.68% |
| 3 | 0.0361 | 0.0770 | 97.88% |
| 4 | 0.0118 | 0.0764 | 98.00% |
| 5 | 0.0084 | 0.0767 | **98.04%** |
**Training Summary:**
- Total steps: 1,175
- Final training loss: 0.2446
- Training runtime: 2,705 seconds (~45 minutes)
- Throughput: 13.86 samples/second
### Reproduce Evaluation
```python
from datasets import load_dataset
from transformers import pipeline
from tqdm import tqdm

# Load the fine-tuned model
classifier = pipeline("image-classification", model="Nav772/vit-food-classifier", device=0)

# Load the same validation split
dataset = load_dataset("food101", split="validation")

# Filter to the same 10 classes
selected_classes = ["pizza", "sushi", "hamburger", "ice_cream", "steak",
                    "baklava", "cheesecake", "pancakes", "tacos", "ramen"]
class_names = dataset.features["label"].names
selected_indices = [class_names.index(c) for c in selected_classes]
filtered = dataset.filter(lambda x: x["label"] in selected_indices)

# Evaluate
correct = 0
total = 0
for example in tqdm(filtered):
    pred = classifier(example["image"])[0]["label"]
    true_label = class_names[example["label"]]
    if pred == true_label:
        correct += 1
    total += 1

print(f"Accuracy: {correct/total:.4f} ({correct}/{total})")
```
## Training Data
- **Dataset**: Food101 (subset)
- **Train samples**: ~7,500 images
- **Validation samples**: ~2,500 images
- **Classes**: 10 food categories
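The approximate sample counts follow directly from Food101's split sizes, which allocate 750 training and 250 test images per class; a quick sanity check:

```python
# Food101 ships 750 training / 250 test images per class.
per_class_train, per_class_test = 750, 250
num_classes = 10  # the subset used here

print(num_classes * per_class_train)  # 7500 training images
print(num_classes * per_class_test)   # 2500 validation images
```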
## Training Procedure
- **Base model**: google/vit-base-patch16-224
- **Epochs**: 5
- **Batch size**: 32
- **Learning rate**: 3e-5
- **Image size**: 224x224
- **Mixed precision**: FP16
- **Warmup ratio**: 0.1
- **Weight decay**: 0.01
## Usage
```python
from transformers import pipeline
classifier = pipeline("image-classification", model="Nav772/vit-food-classifier")
result = classifier("path/to/food/image.jpg")
print(result)
```
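The pipeline returns a list of `{"label", "score"}` dictionaries sorted by confidence. A sketch of picking the top prediction (the scores below are illustrative, not real model output):

```python
# Illustrative output shape; actual labels and scores come from the model.
result = [
    {"label": "pizza", "score": 0.9731},
    {"label": "tacos", "score": 0.0112},
    {"label": "hamburger", "score": 0.0054},
]

top = max(result, key=lambda r: r["score"])
print(f"Top prediction: {top['label']} ({top['score']:.1%})")  # Top prediction: pizza (97.3%)
```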
## Limitations
- Only classifies 10 specific food categories
- May not generalize to food items outside these categories
- Performance may degrade on low-quality or obscured images