	---
	license: mit
	datasets:
	- food101
	metrics:
	- accuracy
	pipeline_tag: image-classification
	tags:
	- vision
	- food-classification
	- vit
	model-index:
	- name: vit-food-classifier
	  results:
	  - task:
	      type: image-classification
	    dataset:
	      name: food101
	      type: food101
	      split: validation
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.9804
	---

	# Vision Transformer (ViT) Fine-tuned on Food101 Subset

	## Model Description

	This model is a fine-tuned version of `google/vit-base-patch16-224` for food image classification across 10 categories.

	## Classes

	- pizza
	- sushi
	- hamburger
	- ice_cream
	- steak
	- baklava
	- cheesecake
	- pancakes
	- tacos
	- ramen

	## Evaluation Results

	| Metric | Value |
	|--------|-------|
	| Accuracy | 98.04% |

	## Training Logs

	| Epoch | Training Loss | Validation Loss | Accuracy |
	|-------|---------------|-----------------|----------|
	| 1 | 0.3254 | 0.1076 | 97.20% |
	| 2 | 0.1216 | 0.0904 | 97.68% |
	| 3 | 0.0361 | 0.0770 | 97.88% |
	| 4 | 0.0118 | 0.0764 | 98.00% |
	| 5 | 0.0084 | 0.0767 | 98.04% |

	Training Summary:
	- Total steps: 1,175
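	- Training loss reported in the run summary: 0.2446 (likely an average over all 1,175 steps, since the loss logged at epoch 5 in the table above is 0.0084)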
	- Training runtime: 2,705 seconds (~45 minutes)
	- Throughput: 13.86 samples/second
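The summary figures can be cross-checked against the hyperparameters listed under Training Data and Training Procedure. The sketch below is a rough consistency check, not the author's script: it treats the approximate "~7,500" training images as exactly 7,500 and assumes a single device with no gradient accumulation.

```python
import math

# Figures taken from this card; train_samples is approximate in the source
# and is treated as exact here (an assumption).
train_samples = 7_500
epochs = 5
batch_size = 32
runtime_seconds = 2_705

# A final partial batch still counts as one optimizer step.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

# Samples processed per second over the whole run.
throughput = train_samples * epochs / runtime_seconds

print(total_steps)           # 1175
print(round(throughput, 2))  # 13.86
```

Both values agree with the reported 1,175 steps and 13.86 samples/second.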

	### Reproduce Evaluation
	```python
	from datasets import load_dataset
	from transformers import pipeline
	from tqdm import tqdm

	# Load the model (device=0 selects the first GPU; use device=-1 to run on CPU)
	classifier = pipeline("image-classification", model="Nav772/vit-food-classifier", device=0)

	# Load the Food101 validation split used for the reported accuracy
	dataset = load_dataset("food101", split="validation")

	# Filter to the same 10 classes
	selected_classes = ["pizza", "sushi", "hamburger", "ice_cream", "steak",
	                    "baklava", "cheesecake", "pancakes", "tacos", "ramen"]
	class_names = dataset.features["label"].names
	selected_indices = {class_names.index(c) for c in selected_classes}

	# Passing only the label column avoids decoding every image while filtering
	filtered = dataset.filter(lambda label: label in selected_indices, input_columns=["label"])

	# Evaluate top-1 accuracy
	correct = 0
	total = 0

	for example in tqdm(filtered):
	    pred = classifier(example["image"])[0]["label"]
	    true_label = class_names[example["label"]]
	    if pred == true_label:
	        correct += 1
	    total += 1

	print(f"Accuracy: {correct/total:.4f} ({correct}/{total})")
	```

	## Training Data

	- Dataset: Food101 (subset)
	- Train samples: ~7,500 images
	- Validation samples: ~2,500 images
	- Classes: 10 food categories
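One way such a subset could be built is sketched below. This is an illustration, not the author's preprocessing code: the helper names `build_label_maps` and `load_food101_subset`, the `subset_label` column, and the id order (taken from the order of the Classes list) are all assumptions. Food101 ships 750 training and 250 validation images per class, which matches the ~7,500 / ~2,500 figures for 10 classes.

```python
SELECTED_CLASSES = ["pizza", "sushi", "hamburger", "ice_cream", "steak",
                    "baklava", "cheesecake", "pancakes", "tacos", "ramen"]


def build_label_maps(selected_classes):
    """Map class names to contiguous ids 0..N-1 and back (order is an assumption)."""
    label2id = {name: i for i, name in enumerate(selected_classes)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


def load_food101_subset(split, selected_classes=SELECTED_CLASSES):
    """Hypothetical helper: keep only the selected classes of one Food101 split."""
    # Imported lazily so the label helpers work without the `datasets` package.
    from datasets import load_dataset

    dataset = load_dataset("food101", split=split)
    class_names = dataset.features["label"].names
    label2id, _ = build_label_maps(selected_classes)

    # Original Food101 label id -> new contiguous id
    remap = {class_names.index(name): new_id for name, new_id in label2id.items()}

    # Filter on the label column only, so images are not decoded here.
    subset = dataset.filter(lambda label: label in remap, input_columns=["label"])

    # Store the remapped id in a new column; the original "label" column keeps
    # its 101-class meaning.
    return subset.map(lambda label: {"subset_label": remap[label]},
                      input_columns=["label"])


label2id, id2label = build_label_maps(SELECTED_CLASSES)
```

Calling `load_food101_subset("train")` and `load_food101_subset("validation")` would download the dataset on first use.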

	## Training Procedure

	- Base model: google/vit-base-patch16-224
	- Epochs: 5
	- Batch size: 32
	- Learning rate: 3e-5
	- Image size: 224x224
	- Mixed precision: FP16
	- Warmup ratio: 0.1
	- Weight decay: 0.01
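If the run used the `transformers` `Trainer` (the card does not say so, this is an assumption), the hyperparameters above would map onto `TrainingArguments` roughly as follows. The `output_dir` value and the helper name `make_training_args` are hypothetical; the 224x224 image size is set by the image processor, not by these arguments.

```python
# Hyperparameters from the list above, keyed by TrainingArguments field names.
TRAINING_KWARGS = {
    "num_train_epochs": 5,
    "per_device_train_batch_size": 32,
    "learning_rate": 3e-5,
    "fp16": True,          # mixed precision; requires a CUDA GPU
    "warmup_ratio": 0.1,   # fraction of total steps used for learning-rate warmup
    "weight_decay": 0.01,
}


def make_training_args(output_dir="vit-food-classifier"):
    """Hypothetical helper: build TrainingArguments from the values above."""
    # Imported lazily so the dict can be inspected without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Settings the card does not mention (scheduler type, evaluation and checkpoint strategy, seed) are left at library defaults in this sketch, so it may not reproduce the reported run exactly.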

	## Usage
	```python
	from transformers import pipeline

	classifier = pipeline("image-classification", model="Nav772/vit-food-classifier")
	result = classifier("path/to/food/image.jpg")
	print(result)
	```

	## Limitations

	- Only classifies 10 specific food categories
	- May not generalize to food items outside these categories
	- Performance may degrade on low-quality or obscured images
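Because the model always picks one of its 10 labels, callers may want to reject low-confidence predictions instead of trusting the top label. The sketch below shows one simple approach; `top_prediction` and the 0.5 threshold are hypothetical choices, and a softmax score is only a rough signal for out-of-category images. The sample inputs are hand-written in the pipeline's output shape, not real model outputs.

```python
def top_prediction(results, min_score=0.5):
    """Return the highest-scoring label, or None if its score is below min_score."""
    if not results:
        return None
    best = max(results, key=lambda r: r["score"])
    return best["label"] if best["score"] >= min_score else None


# Illustrative inputs shaped like the image-classification pipeline output
confident = [{"label": "pizza", "score": 0.97}, {"label": "tacos", "score": 0.01}]
uncertain = [{"label": "steak", "score": 0.31}, {"label": "hamburger", "score": 0.29}]

print(top_prediction(confident))  # pizza
print(top_prediction(uncertain))  # None
```

A suitable threshold would need to be tuned on held-out images, ideally including foods outside the 10 categories.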