Matan Kriel committed on
Commit b6d47fe · 1 Parent(s): 13f1929

added files

Files changed (4):
  1. README.md +112 -6
  2. app.py +113 -0
  3. food_embeddings.parquet +3 -0
  4. requirements.txt +8 -0
README.md CHANGED
@@ -1,12 +1,118 @@
  ---
- title: Food Model
- emoji: 👁
- colorFrom: red
- colorTo: blue
  sdk: gradio
- sdk_version: 6.2.0
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: Food Match
+ emoji: 🌍
+ colorFrom: pink
+ colorTo: green
  sdk: gradio
+ sdk_version: 6.1.0
  app_file: app.py
  pinned: false
+ license: mit
+ short_description: Trained model to detect and recommend similar foods.
+ ---
+
+ # 🍔 Visual Dish Matcher AI
+
+ **A computer vision app that suggests dishes based on visual/text similarity.**
+
+ ## 🎯 Project Overview
+ This project explores the power of **vector embeddings** in building recommendation systems. Unlike traditional filters (e.g., "Show me Italian food"), this app uses **OpenAI's CLIP model** to "see" the food: it converts images into mathematical vectors and finds matches based on visual content (texture, color, shape, and ingredients).
+
+ **Live Demo:** [Click the "App" tab above to view]
+
+ ---
+
+ ## 🛠️ Tech Stack
+ * **Model:** OpenAI CLIP (`clip-vit-base-patch32`)
+ * **Frameworks:** PyTorch, Transformers, Datasets (Hugging Face)
+ * **Interface:** Gradio
+ * **Data Storage:** Parquet (via Git LFS)
+ * **Visualization:** Matplotlib, Seaborn, scikit-learn (t-SNE/PCA)
+
+ ---
+
+ ## Dataset: Food-101
+
+ The Food-101 dataset is a popular benchmark for fine-grained image classification. Unlike "clean" studio datasets, Food-101 contains real-world images shot under varied lighting, angles, and noise levels, making it highly representative of the photos users typically upload to social media or food apps.
+
+ **Key Features:**
+
+ * **101 Categories:** Covers a wide range of international dishes, including sushi, pizza, hamburger, pad thai, baklava, and chocolate mousse.
+ * **"In the Wild" Data:** Images are not perfectly centered or lit; they contain background noise (plates, cutlery, restaurant tables), challenging the model to focus on the food itself.
+ * **Project Subset:** To ensure computational efficiency for this assignment, a randomized stratified subset of 5,000 images was selected from the training split.
+
+ **Data Structure:**
+
+ * **Input:** RGB images (various aspect ratios, resized during processing).
+ * **Labels:** 101 unique integer IDs mapped to human-readable class names.
+
+ ---
+
+ ## 📊 Part 1: Data Exploration (EDA)
+ **Dataset:** [Food-101 (ETH Zurich)](https://huggingface.co/datasets/ethz/food101)
+ To ensure computational efficiency for the assignment, I utilized a randomized subset of **5,000 images** spanning 101 categories.
+
+ ### 1. Data Cleaning
+ Before generating embeddings, the dataset underwent rigorous cleaning:
+ * **Format Correction:** Converted grayscale images to RGB to ensure compatibility with the CLIP model.
+ * **Outlier Detection:** Analyzed image brightness and aspect ratios to identify and flag low-quality or distorted images (e.g., pitch-black photos or extreme panoramas).
+
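The two cleaning steps above can be sketched roughly as follows; the thresholds are illustrative placeholders, not the values actually used in the assignment:

```python
import numpy as np
from PIL import Image

def clean_image(img, dark_thresh=20, ar_thresh=2.5):
    """Illustrative cleaning pass: RGB conversion + outlier flagging.
    Thresholds are invented for this sketch."""
    # Format correction: CLIP expects 3-channel input
    if img.mode != "RGB":
        img = img.convert("RGB")

    arr = np.asarray(img)
    brightness = arr.mean()          # 0 (black) .. 255 (white)
    w, h = img.size
    aspect = max(w, h) / min(w, h)   # >= 1; large values = extreme panorama

    is_outlier = bool(brightness < dark_thresh or aspect > ar_thresh)
    return img, is_outlier

# A synthetic pitch-black grayscale image gets converted and flagged
black = Image.new("L", (640, 480))
fixed, flagged = clean_image(black)
print(fixed.mode, flagged)  # RGB True
```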
+ ### 2. Image Distribution
+ We verified the class balance to ensure the model wasn't biased toward specific categories.
+
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/eyW5Q6CEQJyLLzAcMl_pi.png)
+
+ ### 3. Dimensionality Analysis
+ We analyzed the width vs. height of the dataset images to verify that most were standard sizes suitable for resizing.
+
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/OsJzdqlph74L5os7RuUPK.png)
+
+ ### 4. Outlier Detection
+
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/JV5iLSjmnLgtVRWiApKYs.png)
+
+ ---
+
+ ## 🧠 Part 2: Embeddings & Clustering
+ The core of the "Visual Matcher" is the embedding space. We generated 512-dimensional vectors for every image in the training set.
+
+ ### Clustering Analysis
+ Using **K-Means**, we grouped these vectors to see if the model could automatically discover food categories without being told the labels.
+ * **Algorithm:** K-Means (k=50)
+ * **Dimensionality Reduction:** t-SNE (to visualize the 512-D vectors in 2D)
+
+ **Key Insight:** The model successfully grouped foods by visual properties. For example, "red/orange" foods (pizza, lasagna) formed distinct clusters separate from "green" foods (salads, guacamole).
+
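The pipeline above (cluster in the original 512-D space, then project with t-SNE only for the 2-D picture) can be sketched on synthetic data; the blob positions and sizes below are invented for the example, and k is reduced from the assignment's 50 to 2:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

# Stand-in for the real (5000, 512) CLIP matrix: two well-separated blobs
rng = np.random.default_rng(42)
emb = np.vstack([
    rng.normal(loc=0.0, scale=0.05, size=(50, 512)),  # e.g. "red food" blob
    rng.normal(loc=1.0, scale=0.05, size=(50, 512)),  # e.g. "green food" blob
])

# K-Means runs on the raw 512-D vectors
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(emb)

# t-SNE is used only to draw the 2-D map; clustering stays in 512-D
xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(emb)
print(xy.shape)  # (100, 2)
```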
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/ZqA16HEHvfXKzNYsyTgdz.png)
+
+ ---
+
+ ## 🚀 Part 3: The Application
+ The final product is a **Gradio** web application hosted on Hugging Face Spaces. It supports two modes of interaction:
+
+ 1. **Image-to-Image:** The user uploads a photo (e.g., a burger). The app embeds the upload and calculates **cosine similarity** against the database to find the nearest visual neighbors.
+ 2. **Text-to-Image:** The user types a description (e.g., "Spicy Tacos"). The app uses CLIP's text encoder to find images that match the semantic meaning of the text.
+
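Both modes end in the same cosine-similarity search, which reduces to a dot product once the vectors are L2-normalized. A minimal NumPy sketch on a toy 4-D database (`app.py` does the equivalent with torch on 512-D vectors):

```python
import numpy as np

def top_k_matches(query, db, k=3):
    """Cosine similarity = dot product of unit vectors; returns best-first indices."""
    q = query / np.linalg.norm(query)
    d = db / np.linalg.norm(db, axis=1, keepdims=True)
    scores = d @ q                       # (N,) similarities in [-1, 1]
    idx = np.argsort(scores)[::-1][:k]   # highest score first
    return idx, scores[idx]

# Toy "database" of 4 embeddings
db = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
idx, scores = top_k_matches(np.array([1.0, 0.05, 0.0, 0.0]), db, k=2)
print(idx)  # [0 1] -- the two vectors pointing the same way as the query
```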
+ ---
+
+ ## 📂 Repository Structure
+ * `app.py`: Main application logic and Gradio interface.
+ * `food_embeddings.parquet`: Pre-computed vector database (stored via Git LFS).
+ * `requirements.txt`: Python dependencies.
+ * `README.md`: Project documentation.
+
+ ---
+
+ ## ✍️ Author
+ **[Matan Kriel]**
+ *Assignment #3: Embeddings, RecSys, and Spaces*
  ---

 
app.py ADDED
@@ -0,0 +1,113 @@
+ import gradio as gr
+ import torch
+ import pandas as pd
+ import numpy as np
+ from PIL import Image
+ from transformers import CLIPProcessor, CLIPModel
+ from datasets import load_dataset
+ from torch.nn import functional as F
+
+ # --- 1. SETUP & CONFIG ---
+ MODEL_ID = "openai/clip-vit-base-patch32"
+ DATA_FILE = "food_embeddings.parquet"
+
+ print("⏳ Starting App... Loading Model...")
+ # Load model (CPU is fine for inference on single images)
+ model = CLIPModel.from_pretrained(MODEL_ID)
+ processor = CLIPProcessor.from_pretrained(MODEL_ID)
+
+ # --- 2. LOAD DATA (must match the Colab logic EXACTLY) ---
+ print("⏳ Loading Dataset (this takes a moment)...")
+ # Load the same 5,000 images with the same seed so indices match the parquet file
+ dataset = load_dataset("ethz/food101", split="train").shuffle(seed=42).select(range(5000))
+
+ # --- 3. LOAD EMBEDDINGS ---
+ print("⏳ Loading Pre-computed Embeddings...")
+ df = pd.read_parquet(DATA_FILE)
+ # Convert the lists of floats stored in the parquet back into a torch tensor
+ db_features = torch.tensor(np.stack(df['embedding'].to_numpy()))
+ # Normalize once up front so search is a plain dot product
+ db_features = F.normalize(db_features, p=2, dim=1)
+
+ print("✅ App Ready!")
+
+ # --- 4. CORE SEARCH LOGIC ---
+ def find_best_matches(query_features, top_k=3):
+     # Normalize the query so the dot product equals cosine similarity
+     query_features = F.normalize(query_features, p=2, dim=1)
+
+     # Query (1x512) @ DB.T (512x5000) = scores (1x5000)
+     similarity = torch.mm(query_features, db_features.T)
+
+     # Get top-k matches
+     scores, indices = torch.topk(similarity, k=top_k)
+
+     results = []
+     for idx, score in zip(indices[0], scores[0]):
+         idx = idx.item()
+
+         # Grab the image from the dataset and its label from the dataframe
+         img = dataset[idx]['image']
+         label = df.iloc[idx]['label_name']
+
+         # Format output for the Gradio gallery: (image, caption)
+         results.append((img, f"{label} ({score:.2f})"))
+     return results
+
+ # --- 5. GRADIO FUNCTIONS ---
+ def search_by_image(input_image):
+     if input_image is None:
+         return []
+
+     inputs = processor(images=input_image, return_tensors="pt")
+     with torch.no_grad():
+         features = model.get_image_features(**inputs)
+
+     return find_best_matches(features)
+
+ def search_by_text(input_text):
+     if not input_text:
+         return []
+
+     inputs = processor(text=[input_text], return_tensors="pt", padding=True)
+     with torch.no_grad():
+         features = model.get_text_features(**inputs)
+
+     return find_best_matches(features)
+
+ # --- 6. BUILD UI ---
+ with gr.Blocks(title="Food Matcher AI") as demo:
+     gr.Markdown("# 🍔 Visual Dish Matcher")
+     gr.Markdown("Upload a photo of food (or describe it) to find similar dishes in our database.")
+
+     # --- VIDEO SECTION ---
+     # Accordion keeps the UI uncluttered; open=False means it starts collapsed.
+     with gr.Accordion("📺 Watch Project Demo", open=False):
+         # Note: iframes need the /embed/ URL form; watch?v= links refuse to load in a frame
+         gr.HTML("""
+         <div style="display: flex; justify-content: center;">
+             <iframe width="560" height="315"
+                 src="https://www.youtube.com/embed/Al665qltkDg?start=4"
+                 title="YouTube video player"
+                 frameborder="0"
+                 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
+                 allowfullscreen>
+             </iframe>
+         </div>
+         """)
+     # ----------------------------
+
+     with gr.Tab("Image Search"):
+         with gr.Row():
+             img_input = gr.Image(type="pil", label="Upload Food Image")
+             img_gallery = gr.Gallery(label="Top Matches")
+         btn_img = gr.Button("Find Similar Dishes")
+         btn_img.click(search_by_image, inputs=img_input, outputs=img_gallery)
+
+     with gr.Tab("Text Search"):
+         with gr.Row():
+             txt_input = gr.Textbox(label="Describe the food (e.g., 'Spicy Tacos')")
+             txt_gallery = gr.Gallery(label="Top Matches")
+         btn_txt = gr.Button("Search by Description")
+         btn_txt.click(search_by_text, inputs=txt_input, outputs=txt_gallery)
+
+ # Launch (disable SSR for stability)
+ demo.launch(ssr_mode=False)
food_embeddings.parquet ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:87bbf48a556dd11a473e2e168ef6f94cd9bc150ccabbcea9dda084b2cb9ca3b9
+ size 8828792
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ gradio
+ torch
+ transformers
+ pandas
+ numpy
+ datasets
+ pyarrow
+ scikit-learn