Nav772/vit-food-classifier

@@ -10,6 +10,19 @@ tags:
 - vision
 - food-classification
 - vit
 ---
 # Vision Transformer (ViT) Fine-tuned on Food101 Subset
@@ -31,6 +44,62 @@ This model is a fine-tuned version of `google/vit-base-patch16-224` for food ima
 - tacos
 - ramen
 ## Training Data
 - **Dataset**: Food101 (subset)
@@ -46,18 +115,14 @@ This model is a fine-tuned version of `google/vit-base-patch16-224` for food ima
 - **Learning rate**: 3e-5
 - **Image size**: 224x224
 - **Mixed precision**: FP16
-## Evaluation Results
-- **Accuracy**: 98.04%
 ## Usage
 ```python
 from transformers import pipeline
 classifier = pipeline("image-classification", model="Nav772/vit-food-classifier")
-# From local file
 result = classifier("path/to/food/image.jpg")
 print(result)
 ```

 - vision
 - food-classification
 - vit
+model-index:
+- name: vit-food-classifier
+  results:
+  - task:
+      type: image-classification
+    dataset:
+      name: food101
+      type: food101
+      split: validation
+    metrics:
+    - name: Accuracy
+      type: accuracy
+      value: 0.9804
 ---
 # Vision Transformer (ViT) Fine-tuned on Food101 Subset
 - tacos
 - ramen
+## Evaluation Results
+| Metric | Value |
+|--------|-------|
+| **Accuracy** (validation split, 10 selected classes) | 98.04% |
+## Training Logs
+| Epoch | Training Loss | Validation Loss | Validation Accuracy |
+|-------|---------------|-----------------|----------|
+| 1     | 0.3254        | 0.1076          | 97.20%   |
+| 2     | 0.1216        | 0.0904          | 97.68%   |
+| 3     | 0.0361        | 0.0770          | 97.88%   |
+| 4     | 0.0118        | 0.0764          | 98.00%   |
+| 5     | 0.0084        | 0.0767          | **98.04%** |
+**Training Summary:**
+- Total steps: 1,175
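+- Training loss reported for the full run: 0.2446 (this appears to be an average over all steps; the last logged loss, at epoch 5, is 0.0084)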
+- Training runtime: 2,705 seconds (~45 minutes)
+- Throughput: 13.86 samples/second
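The summary figures can be cross-checked against each other with a little arithmetic. The sketch below assumes 5 epochs (one row per epoch in the log table) and that throughput counts training samples per second over the whole run. The implied batch size is an inference from these numbers, not a value stated in this card.

```python
# Figures copied from the training summary above.
total_steps = 1175
runtime_s = 2705
throughput = 13.86          # samples/second
epochs = 5                  # assumption: one log row per epoch

samples_seen = throughput * runtime_s        # total samples processed
samples_per_epoch = samples_seen / epochs
steps_per_epoch = total_steps / epochs
implied_batch_size = samples_per_epoch / steps_per_epoch

print(f"samples per epoch  ~ {samples_per_epoch:.0f}")
print(f"steps per epoch    = {steps_per_epoch:.0f}")
print(f"implied batch size ~ {implied_batch_size:.1f}")
```

The result lands close to 7,500 samples per epoch, which matches 10 classes at 750 training images each (the per-class size of the Food101 train split), and suggests a batch size near 32.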
+### Reproduce Evaluation
+```python
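+import torch
+from datasets import load_dataset
+from transformers import pipeline
+from tqdm import tqdm
+# Load model; use the first GPU if available, otherwise fall back to CPU
+device = 0 if torch.cuda.is_available() else -1
+classifier = pipeline("image-classification", model="Nav772/vit-food-classifier", device=device)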
+# Load the Food101 validation split (used here as the held-out test set)
+dataset = load_dataset("food101", split="validation")
+# Filter to same 10 classes
+selected_classes = ["pizza", "sushi", "hamburger", "ice_cream", "steak",
+                    "baklava", "cheesecake", "pancakes", "tacos", "ramen"]
+class_names = dataset.features['label'].names
+selected_indices = [class_names.index(c) for c in selected_classes]
+filtered = dataset.filter(lambda x: x['label'] in selected_indices)
+# Evaluate
+correct = 0
+total = 0
+for example in tqdm(filtered):
+    pred = classifier(example['image'])[0]['label']
+    true_label = class_names[example['label']]
+    if pred == true_label:
+        correct += 1
+    total += 1
+print(f"Accuracy: {correct/total:.4f} ({correct}/{total})")
+```
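The loop above reports a single aggregate number. A per-class breakdown can show whether the remaining errors concentrate in a few classes. The helper below is a hypothetical addition, not part of the original evaluation; `per_class_accuracy` and the toy label lists are illustrative only.

```python
from collections import Counter

def per_class_accuracy(true_labels, predicted_labels):
    """Return {class_name: fraction correct} for each class in true_labels."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {name: hits[name] / totals[name] for name in totals}

# Toy data for illustration only; these are not results from the model.
true_labels = ["pizza", "pizza", "sushi", "ramen", "ramen"]
predicted = ["pizza", "sushi", "sushi", "ramen", "ramen"]
breakdown = per_class_accuracy(true_labels, predicted)
for name, acc in sorted(breakdown.items()):
    print(f"{name}: {acc:.2f}")
```

In the evaluation loop, appending `true_label` and `pred` to two lists on each iteration would provide the inputs.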
 ## Training Data
 - **Dataset**: Food101 (subset)
 - **Learning rate**: 3e-5
 - **Image size**: 224x224
 - **Mixed precision**: FP16
+- **Warmup ratio**: 0.1
+- **Weight decay**: 0.01
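A warmup ratio of 0.1 means the learning rate ramps up over the first 10% of optimizer steps before decaying. The sketch below assumes linear warmup followed by linear decay to zero (the scheduler type is not stated in this card) and uses the 1,175 total steps from the training summary. `lr_at_step` is a hypothetical helper, not a library function.

```python
import math

PEAK_LR = 3e-5
TOTAL_STEPS = 1175          # from the training summary
WARMUP_RATIO = 0.1
# Assumption: the number of warmup steps is rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)

def lr_at_step(step):
    """Learning rate under linear warmup then linear decay (assumed schedule)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)

print("warmup steps:", WARMUP_STEPS)
for step in (0, WARMUP_STEPS, TOTAL_STEPS):
    print(step, lr_at_step(step))
```

Under these assumptions the rate starts at zero, reaches 3e-5 once warmup ends, and returns to zero at the final step.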
 ## Usage
 ```python
 from transformers import pipeline
 classifier = pipeline("image-classification", model="Nav772/vit-food-classifier")
 result = classifier("path/to/food/image.jpg")
 print(result)
 ```
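The pipeline returns a list of dictionaries with `label` and `score` keys. A minimal sketch for turning that list into a decision with a confidence cutoff follows. The `top_prediction` helper, the 0.5 threshold, and the sample scores are illustrative assumptions, not values from this model.

```python
def top_prediction(result, min_score=0.5):
    """Return (label, score) for the best entry, or None if below min_score."""
    best = max(result, key=lambda entry: entry["score"])
    if best["score"] < min_score:
        return None
    return best["label"], best["score"]

# Hand-written example in the pipeline's output format (not real model output).
sample = [
    {"label": "ramen", "score": 0.93},
    {"label": "sushi", "score": 0.04},
    {"label": "tacos", "score": 0.01},
]
print(top_prediction(sample))
print(top_prediction(sample, min_score=0.95))
```

Returning `None` for low-confidence inputs can be useful here because the model only knows its 10 food classes and will still assign one of them to any image.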