---
license: mit
datasets:
- food101
metrics:
- accuracy
pipeline_tag: image-classification
tags:
- vision
- food-classification
- vit
model-index:
- name: vit-food-classifier
  results:
  - task:
      type: image-classification
    dataset:
      name: food101
      type: food101
      split: validation
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9804
---

# Vision Transformer (ViT) Fine-tuned on Food101 Subset

## Model Description

This model is a fine-tuned version of `google/vit-base-patch16-224` that classifies food images into 10 categories drawn from Food101.

## Classes

- pizza
- sushi
- hamburger
- ice_cream
- steak
- baklava
- cheesecake
- pancakes
- tacos
- ramen

## Evaluation Results

Measured on the Food101 validation split, restricted to the 10 classes above.

| Metric | Value |
|--------|-------|
| **Accuracy** | 98.04% |

## Training Logs

| Epoch | Training Loss | Validation Loss | Validation Accuracy |
|-------|---------------|-----------------|---------------------|
| 1 | 0.3254 | 0.1076 | 97.20% |
| 2 | 0.1216 | 0.0904 | 97.68% |
| 3 | 0.0361 | 0.0770 | 97.88% |
| 4 | 0.0118 | 0.0764 | 98.00% |
| 5 | 0.0084 | 0.0767 | **98.04%** |

**Training Summary:**

- Total steps: 1,175
- Final training loss: 0.2446
- Training runtime: 2,705 seconds (~45 minutes)
- Throughput: 13.86 samples/second

### Reproduce Evaluation

```python
from datasets import load_dataset
from transformers import pipeline
from tqdm import tqdm

# Load the model (device=0 assumes a CUDA GPU; use device=-1 to run on CPU)
classifier = pipeline("image-classification", model="Nav772/vit-food-classifier", device=0)

# Load the same validation split used for the reported accuracy
dataset = load_dataset("food101", split="validation")

# Filter to the same 10 classes
selected_classes = ["pizza", "sushi", "hamburger", "ice_cream", "steak",
                    "baklava", "cheesecake", "pancakes", "tacos", "ramen"]
class_names = dataset.features["label"].names
selected_indices = {class_names.index(c) for c in selected_classes}
# Reading only the label column avoids decoding every image during filtering
filtered = dataset.filter(lambda label: label in selected_indices, input_columns=["label"])

# Evaluate: compare the top-1 predicted label with the ground-truth class name
correct = 0
total = 0
for example in tqdm(filtered):
    pred = classifier(example["image"])[0]["label"]
    true_label = class_names[example["label"]]
    if pred == true_label:
        correct += 1
    total += 1

print(f"Accuracy: {correct/total:.4f} ({correct}/{total})")
```

## Training Data

- **Dataset**: Food101 (subset)
- **Train samples**: ~7,500 images
- **Validation samples**: ~2,500 images
- **Classes**: 10 food categories

## Training Procedure

- **Base model**: google/vit-base-patch16-224
- **Epochs**: 5
- **Batch size**: 32
- **Learning rate**: 3e-5
- **Image size**: 224x224
- **Mixed precision**: FP16
- **Warmup ratio**: 0.1
- **Weight decay**: 0.01

## Usage

```python
from transformers import pipeline

classifier = pipeline("image-classification", model="Nav772/vit-food-classifier")
result = classifier("path/to/food/image.jpg")
print(result)
```

## Limitations

- Only classifies the 10 food categories listed above
- Images of foods outside these categories are still assigned one of the 10 labels, so predictions on such inputs are not meaningful
- Performance may degrade on low-quality or obscured images
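
Because the classifier always returns one of its 10 labels, one common mitigation is to reject low-confidence predictions. The snippet below is a minimal sketch of that idea, not part of the released model: the helper `top_label_or_none` and the 0.80 cutoff are hypothetical names and values chosen for illustration, and a softmax score is only a rough proxy for whether an image is in scope. It assumes the input has the shape returned by the image-classification pipeline, a list of `{"label": ..., "score": ...}` dictionaries.

```python
from typing import Any, Dict, List, Optional

# Hypothetical cutoff for illustration only; tune it on held-out data.
CONFIDENCE_THRESHOLD = 0.80


def top_label_or_none(
    predictions: List[Dict[str, Any]],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Optional[str]:
    """Return the highest-scoring label, or None if its score is below `threshold`.

    `predictions` is expected to be a list of {"label": str, "score": float}
    dictionaries, as produced by the image-classification pipeline.
    """
    if not predictions:
        return None
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= threshold else None


# Usage with the pipeline from the Usage section (downloads the model):
#   from transformers import pipeline
#   classifier = pipeline("image-classification", model="Nav772/vit-food-classifier")
#   label = top_label_or_none(classifier("path/to/food/image.jpg"))
#   print(label if label is not None else "not confident enough to classify")
```

A threshold like this trades coverage for precision: raising it rejects more out-of-scope images but also more valid ones, so the value should be checked against images the application actually expects.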