ArchitRastogi committed
Commit b06edc6 · verified · 1 Parent(s): bc7dcd0

Upload folder using huggingface_hub
.ipynb_checkpoints/README-checkpoint.md ADDED
README.md ADDED
# ViT-Chef — Fine-tuned Vision Transformer for Food Classification

A fine-tuned [Vision Transformer (ViT)](https://huggingface.co/google/vit-base-patch16-224-in21k) model trained to classify **pizza**, **steak**, and **sushi** images.
Achieves **96% accuracy** on the test set, demonstrating the strong performance of transfer learning for visual food recognition.

---
## 🧠 Model Details

* **Base model**: `google/vit-base-patch16-224-in21k`
* **Input size**: 224 × 224
* **Classes**: `["pizza", "steak", "sushi"]`
* **Accuracy**: 96% (test set)
* **Training**: 5-fold cross-validation with the AdamW optimizer and early stopping
* **Dataset**: Custom curated set (225 train / 75 test images)

---
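The training script itself is not included in this repository. As a minimal sketch of the early-stopping component mentioned above (the class name, `patience` value, and loss values are illustrative assumptions, not the actual training code):

```python
class EarlyStopping:
    """Stop training when validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # minimum change that counts as an improvement
        self.best_loss = float("inf")
        self.counter = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss  # improvement: reset the wait counter
            self.counter = 0
        else:
            self.counter += 1          # no improvement this epoch
        return self.counter >= self.patience


# Example: losses plateau after epoch 2, so stopping triggers at epoch 5
stopper = EarlyStopping(patience=3)
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63]
stop_epoch = next(i for i, loss in enumerate(losses) if stopper.step(loss))
print(stop_epoch)  # -> 5
```

In a k-fold setup, a fresh tracker would be created per fold so that one fold's plateau does not end another fold's training.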
## 🚀 Usage

```python
from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import torch

# Load the fine-tuned model and its matching image processor
model = ViTForImageClassification.from_pretrained("archit/vit-chef")
processor = ViTImageProcessor.from_pretrained("archit/vit-chef")

# Ensure a 3-channel RGB image before preprocessing
image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(-1).item()

print(model.config.id2label[pred])
```

---
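To report a confidence alongside the predicted label, the raw logits can be turned into probabilities with a softmax. A self-contained sketch in plain Python (the logit values for the three classes are made up for illustration):

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for ["pizza", "steak", "sushi"]
probs = softmax([3.2, 0.1, -1.0])
print([round(p, 3) for p in probs])  # -> [0.943, 0.042, 0.014]
```

With real model output, the same effect is obtained by calling `logits.softmax(-1)` on the tensor before taking `argmax`.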
## 📊 Results

| Metric | Baseline | Fine-tuned | Improvement |
| :-------------- | :------- | :--------- | :---------- |
| Accuracy | 46.67% | **96.00%** | +49.33 pp |
| Error Reduction | — | **92.5%** | — |

---
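The error-reduction figure follows directly from the two accuracy numbers: the baseline misclassifies 53.33% of test images and the fine-tuned model 4.00%, so the error rate shrinks by (53.33 − 4.00) / 53.33 ≈ 92.5%. A quick arithmetic check:

```python
baseline_acc, finetuned_acc = 46.67, 96.00

baseline_err = 100 - baseline_acc    # 53.33% of test images misclassified
finetuned_err = 100 - finetuned_acc  # 4.00% misclassified

error_reduction = (baseline_err - finetuned_err) / baseline_err * 100
print(f"{error_reduction:.1f}%")  # -> 92.5%
```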