Matan Kriel committed on
Commit
980e0ef
·
1 Parent(s): b6d47fe

changed all files to the working files from working repo

Files changed (3)
  1. Assignment_3_Food_Match.ipynb +0 -0
  2. README.md +99 -56
  3. app.py +1 -1
Assignment_3_Food_Match.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
README.md CHANGED
@@ -1,35 +1,22 @@
  ---
- title: Food Match
- emoji: 🌍
- colorFrom: pink
- colorTo: green
  sdk: gradio
- sdk_version: 6.1.0
  app_file: app.py
  pinned: false
- license: mit
- short_description: Trained model to detect and recommend similar foods.
  ---

  # 🍔 Visual Dish Matcher AI

- **A computer vision app that suggests dishes based on visual/text similarity.**

  ## 🎯 Project Overview
- This project explores the power of **Vector Embeddings** in building recommendation systems. Unlike traditional filters (e.g., "Show me Italian food"), this app uses **OpenAI's CLIP model** to "see" the food. It converts images into mathematical vectors and finds matches based on visual content: texture, color, shape, and ingredients.

- **Live Demo:** [Click "App" tab above to view]
-
- ---
-
- ## 🛠️ Tech Stack
- * **Model:** OpenAI CLIP (`clip-vit-base-patch32`)
- * **Frameworks:** PyTorch, Transformers, Datasets (Hugging Face)
- * **Interface:** Gradio
- * **Data Storage:** Parquet (via Git LFS)
- * **Visualization:** Matplotlib, Seaborn, Scikit-Learn (t-SNE/PCA)
-
- ---

  ## DataSet - Food101

@@ -51,68 +38,124 @@ Labels: 101 unique Integer IDs mapped to human-readable Class Names.

  ---

- ## 📊 Part 1: Data Exploration (EDA)
- **Dataset:** [Food-101 (ETH Zurich)](https://huggingface.co/datasets/ethz/food101)
- To ensure computational efficiency for the assignment, I utilized a randomized subset of **5,000 images** spanning 101 categories.

- ### 1. Data Cleaning
- Before training, the dataset underwent rigorous cleaning:
- * **Format Correction:** Converted distinct Grayscale images to RGB to ensure compatibility with the CLIP model.
- * **Outlier Detection:** Analyzed image brightness and aspect ratios to identify and flag low-quality or distorted images (e.g., pitch-black photos or extreme panoramas).

- ### 2. Image Distribution
- We verified the class balance to ensure the model wasn't biased toward specific categories.

- ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/eyW5Q6CEQJyLLzAcMl_pi.png)

- ### 3. Dimensionality Analysis
- We analyzed the width vs. height of the dataset to verify that most images were standard sizes suitable for resizing.

- ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/OsJzdqlph74L5os7RuUPK.png)

- ### Outlier Detection

- ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/JV5iLSjmnLgtVRWiApKYs.png)

  ---

- ## 🧠 Part 2: Embeddings & Clustering
- The core of the "Visual Matcher" is the embedding space. We generated 512-dimensional vectors for every image in the training set.

- ### Clustering Analysis
- Using **K-Means**, we grouped these vectors to see if the model could automatically discover food categories without being told the labels.
- * **Algorithm:** K-Means (k=50)
- * **Dimensionality Reduction:** t-SNE (to visualize 512D vectors in 2D)

- **Key Insight:** The model successfully grouped foods by visual properties. For example, "Red/Orange" foods (Pizza, Lasagna) formed distinct clusters separate from "Green" foods (Salads, Guacamole).

- ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/ZqA16HEHvfXKzNYsyTgdz.png)

  ---

- ## 🚀 Part 3: The Application
- The final product is a **Gradio** web application hosted on Hugging Face Spaces. It supports two modes of interaction:

- 1. **Image-to-Image:** The user uploads a photo (e.g., a burger). The app embeds the upload and calculates **Cosine Similarity** against the database to find the nearest visual neighbors.
- 2. **Text-to-Image:** The user types a description (e.g., "Spicy Tacos"). The app uses CLIP's text encoder to find images that match the semantic meaning of the text.

  ---

- ## 📂 Repository Structure
- * `app.py`: Main application logic and Gradio interface.
- * `food_embeddings.parquet`: Pre-computed vector database (stored via Git LFS).
- * `requirements.txt`: Python dependencies.
- * `README.md`: Project documentation.

  ---

- ## ✍️ Author
- **[Matan Kriel]**
- *Assignment #3: Embeddings, RecSys, and Spaces*

  ---

  ---
+ title: Food Matcher AI (SigLIP Edition)
+ emoji: 🍔
+ colorFrom: green
+ colorTo: yellow
  sdk: gradio
+ sdk_version: 5.0.0
  app_file: app.py
  pinned: false
  ---

  # 🍔 Visual Dish Matcher AI

+ **A computer vision app that suggests recipes and dishes based on visual similarity using Google's SigLIP model.**

  ## 🎯 Project Overview
+ This project builds a **Visual Search Engine** for food. Instead of relying on text labels (which can be inaccurate or missing), we use **Vector Embeddings** to find dishes that look similar.

+ ---

  ## DataSet - Food101

  ---

+ **Key Features:**
+ * **Multimodal Search:** Find food using an image *or* a text description.
+ * **Advanced Data Cleaning:** Automated detection of blurry or low-quality images.
+ * **Model Comparison:** A scientific comparison between **OpenAI CLIP** and **Google SigLIP** to choose the best engine.
+
+ **Live Demo:** [Click "App" tab above to view]
+
+ ---
+
+ ## 🛠️ Tech Stack
+ * **Model:** Google SigLIP (`google/siglip-base-patch16-224`)
+ * **Frameworks:** PyTorch, Transformers, Gradio, Datasets
+ * **Data Engineering:** OpenCV (Feature Extraction), NumPy
+ * **Data Storage:** Parquet (via Git LFS)
+ * **Visualization:** Matplotlib, Seaborn, Scikit-Learn (t-SNE/PCA)
+
+ ---
+
+ ## 📊 Part 1: Data Analysis & Cleaning
+ **Dataset:** [Food-101 (ETH Zurich)](https://huggingface.co/datasets/ethz/food101) (subset of 5,000 images).

+ ### 1. Exploratory Data Analysis (EDA)
+ Before any modeling, we analyzed the raw data to ensure quality and balance.

+ * **Class Balance Check:** We verified that our random subset of 5,000 images maintained a healthy distribution across the 101 food categories (approx. 50 images per class).
+ * **Image Dimensions:** We visualized the width and height distributions to identify unusually small or large images.
+ * **Outlier Detection:** We plotted the distributions of **Aspect Ratios** and **Brightness Levels**.

+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/qe5z9j81mj2ahlENA2_5l.png)

+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/_lh9-4RGOXCb8yy11Jar4.png)

+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/6au3HUidoYBsiKreYTPJj.png)

+ ### 2. Data Cleaning
+ Based on the plots above, **we deleted "bad" images** that were:
+ * Too dark (average pixel intensity < 20)
+ * Too bright / washed out (average pixel intensity > 245)
+ * Extremely stretched or squashed (aspect ratio > 3.0)
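The cleaning rules above boil down to two per-image statistics and three thresholds. A minimal sketch of such a filter (illustrative only; the `keep_image` helper and PIL-based loading are assumptions, not the notebook's actual code):

```python
import numpy as np
from PIL import Image

DARK_THRESHOLD = 20      # mean pixel intensity below this -> too dark
BRIGHT_THRESHOLD = 245   # mean pixel intensity above this -> washed out
MAX_ASPECT_RATIO = 3.0   # long side / short side above this -> too stretched

def keep_image(img: Image.Image) -> bool:
    """Return True if the image passes the brightness and aspect-ratio checks."""
    gray = np.asarray(img.convert("L"), dtype=np.float32)  # grayscale intensities 0-255
    brightness = gray.mean()
    w, h = img.size
    aspect_ratio = max(w, h) / min(w, h)
    if brightness < DARK_THRESHOLD or brightness > BRIGHT_THRESHOLD:
        return False
    return aspect_ratio <= MAX_ASPECT_RATIO
```

Applied over the subset, images failing either check are dropped before embedding.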

  ---

+ ## ⚔️ Part 2: Model Comparison (CLIP vs. SigLIP vs. MetaCLIP)
+ To ensure the best search results, we ran a "Challenger" test between three leading multimodal models.
+
+ ### The Contestants:
+ 1. **Baseline:** OpenAI CLIP (`clip-vit-base-patch32`)
+ 2. **Challenger:** Google SigLIP (`siglip-base-patch16-224`)
+ 3. **Challenger:** Facebook MetaCLIP (`facebook/metaclip-b32-400m`)

+ ### The Evaluation:
+ We compared them using **Silhouette Scores** (measuring how distinct the food clusters are) and a visual "Taste Test" (checking nearest neighbors for specific dishes).

+ * **Metric:** Silhouette Score
+ * **Winner:** **Google SigLIP** (produced cleaner, more distinct clusters and better visual matches).
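The silhouette comparison can be sketched with scikit-learn (a sketch under the assumption that each model yields an `(n_images, dim)` embedding matrix; `cluster_quality` and the variable names are illustrative, not the notebook's code):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_quality(embeddings: np.ndarray, n_clusters: int = 101, seed: int = 0) -> float:
    """Cluster embeddings with K-Means and score how distinct the clusters are.

    Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters.
    """
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(embeddings)
    return silhouette_score(embeddings, labels)

# Hypothetical usage: score each candidate model's embeddings and keep the best.
# scores = {"clip": cluster_quality(clip_embs), "siglip": cluster_quality(siglip_embs)}
# winner = max(scores, key=scores.get)
```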

+ **Visual Comparison:**
+ We queried both models with the same image to see which returned more accurate similar foods.

+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/Yz1YyU-eGcH9806Kg6PiF.png)
+
  ---

+ ## 🧠 Part 3: Embeddings & Clustering
+ Using the winning model (**SigLIP**), we applied dimensionality reduction to visualize how the AI groups food concepts.
+
+ * **Algorithm:** K-Means Clustering (k=101 categories).
+ * **Visualization:**
+   * **PCA:** To see the global variance.
+   * **t-SNE:** To see local groupings (e.g., "Sushi" clusters separately from "Burgers").
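The clustering-and-projection pipeline above can be sketched as follows (assuming an `(n_samples, dim)` NumPy array of SigLIP embeddings; `reduce_for_plotting` is an illustrative helper, not the notebook's function):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def reduce_for_plotting(embeddings: np.ndarray, n_clusters: int = 101, seed: int = 0):
    """Assign K-Means cluster ids and project the embeddings to 2D two ways."""
    cluster_ids = KMeans(n_clusters=n_clusters, random_state=seed,
                         n_init=10).fit_predict(embeddings)
    # PCA: linear projection capturing the directions of greatest global variance.
    coords_pca = PCA(n_components=2, random_state=seed).fit_transform(embeddings)
    # t-SNE: nonlinear projection that preserves local neighborhoods (clusters).
    coords_tsne = TSNE(n_components=2, init="pca", perplexity=30,
                       random_state=seed).fit_transform(embeddings)
    return cluster_ids, coords_pca, coords_tsne
```

Coloring the 2D scatter plots by `cluster_ids` (or true labels) produces figures like the one below.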

+
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/i3qZepniP0HqGQ8m5H-7K.png)

  ---

+ ## 🚀 Part 4: The Application
+ The final product is a **Gradio** web application hosted on Hugging Face Spaces.
+
+ 1. **Image-to-Image:** Upload a photo (e.g., a burger) -> the app embeds it using SigLIP -> finds the nearest 3 visual matches.
+ 2. **Text-to-Image:** Type "Spicy Tacos" -> the app finds images matching that description.
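Both modes reduce to the same nearest-neighbor step: embed the query (with the image or the text encoder), then rank the database by cosine similarity. A minimal sketch (the `top_matches` helper and `top_k=3` default are illustrative):

```python
import numpy as np

def top_matches(query_vec: np.ndarray, database: np.ndarray, top_k: int = 3) -> np.ndarray:
    """Return indices of the top_k database rows most cosine-similar to the query.

    query_vec can come from the image encoder or the text encoder, as long as it
    lives in the same embedding space as the rows of `database`.
    """
    q = query_vec / np.linalg.norm(query_vec)                   # unit-length query
    db = database / np.linalg.norm(database, axis=1, keepdims=True)  # unit-length rows
    sims = db @ q                                               # cosine similarity per row
    return np.argsort(sims)[::-1][:top_k]                       # best matches first
```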
+
+ ## Note
+ The deployed app runs the CLIP model even though SigLIP won the comparison: SigLIP was too big to run on the Hugging Face Spaces free tier.
+
+ ### How to Run Locally
+ 1. **Clone the repository:**
+ ```bash
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/Food-Match
+ cd Food-Match
+ ```
+ 2. **Install dependencies:**
+ ```bash
+ pip install -r requirements.txt
+ ```
+ 3. **Run the app:**
+ ```bash
+ python app.py
+ ```

  ---

+ ## 📂 Repository Structure
+ * `app.py`: Main application logic (Gradio + SigLIP).
+ * `food_embeddings.parquet`: Pre-computed vector database.
+ * `requirements.txt`: Python dependencies (includes `sentencepiece`, `protobuf`).
+ * `README.md`: Project documentation.
+
  ---

+ ## ✍️ Authors
+ **Matan Kriel**
+ **Odeya Shmuel**
+ *Assignment #3: Embeddings, RecSys, and Spaces*
app.py CHANGED
@@ -85,7 +85,7 @@ with gr.Blocks(title="Food Matcher AI") as demo:
      gr.HTML("""
      <div style="display: flex; justify-content: center;">
      <iframe width="560" height="315"
-     src="https://www.youtube.com/watch?v=Al665qltkDg&t=4s"
+     src="https://www.youtube.com/embed/IXeIxYHi0Es"
      title="YouTube video player"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"