Matan Kriel committed
Commit cd514b7 · 1 Parent(s): 1d15dd1

changed code and read me

Files changed (3)
  1. .DS_Store +0 -0
  2. Assignment_3_Food_Match.ipynb +0 -0
  3. README.md +81 -60
.DS_Store ADDED
Binary file (6.15 kB).
 
Assignment_3_Food_Match.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
README.md CHANGED
@@ -1,118 +1,139 @@
  ---
- title: Food Match
- emoji: 🌍
- colorFrom: pink
- colorTo: green
  sdk: gradio
- sdk_version: 6.1.0
  app_file: app.py
  pinned: false
- license: mit
- short_description: Trained model to detect and recommend similar foods.
  ---

  # 🍔 Visual Dish Matcher AI

- **A computer vision app that suggests dishes based on visual/text similarity.**

  ## 🎯 Project Overview
- This project explores the power of **Vector Embeddings** in building recommendation systems. Unlike traditional filters (e.g., "Show me Italian food"), this app uses **OpenAI's CLIP model** to "see" the food. It converts images into mathematical vectors and finds matches based on visual content: texture, color, shape, and ingredients.

  **Live Demo:** [Click "App" tab above to view]

  ---

  ## 🛠️ Tech Stack
- * **Model:** OpenAI CLIP (`clip-vit-base-patch32`)
- * **Frameworks:** PyTorch, Transformers, Datasets (Hugging Face)
- * **Interface:** Gradio
  * **Data Storage:** Parquet (via Git LFS)
  * **Visualization:** Matplotlib, Seaborn, Scikit-Learn (t-SNE/PCA)

  ---

- ## Dataset: Food-101
-
- The Food-101 dataset is a popular benchmark for fine-grained image classification. Unlike "clean" studio datasets, Food-101 contains real-world images taken under varied lighting conditions, angles, and noise levels, making it highly representative of the photos users typically upload to social media or food apps.
-
- Key Features:
-
- * **101 Categories:** Covers a wide range of international dishes, including Sushi, Pizza, Hamburger, Pad Thai, Baklava, and Chocolate Mousse.
- * **"In the Wild" Data:** Images are not perfectly centered or lit; they contain background noise (plates, cutlery, restaurant tables), challenging the model to focus on the food itself.
- * **Project Subset:** To ensure computational efficiency for this assignment, a randomized stratified subset of 5,000 images was selected from the training split.
-
- Data Structure:
-
- * **Input:** RGB images (various aspect ratios, resized during processing).
- * **Labels:** 101 unique integer IDs mapped to human-readable class names.
-
- ---

- ## 📊 Part 1: Data Exploration (EDA)
- **Dataset:** [Food-101 (ETH Zurich)](https://huggingface.co/datasets/ethz/food101)
- To ensure computational efficiency for the assignment, I used a randomized subset of **5,000 images** spanning 101 categories.
-
- ### 1. Data Cleaning
- Before training, the dataset underwent rigorous cleaning:
- * **Format Correction:** Converted grayscale images to RGB to ensure compatibility with the CLIP model.
- * **Outlier Detection:** Analyzed image brightness and aspect ratios to identify and flag low-quality or distorted images (e.g., pitch-black photos or extreme panoramas).
-
- ### 2. Image Distribution
- We verified the class balance to ensure the model wasn't biased toward specific categories.

- ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/eyW5Q6CEQJyLLzAcMl_pi.png)
-
- ### 3. Dimensionality Analysis
- We analyzed the width vs. height of the dataset to verify that most images were standard sizes suitable for resizing.
-
- ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/OsJzdqlph74L5os7RuUPK.png)
-
- ### 4. Outlier Detection
-
- ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/JV5iLSjmnLgtVRWiApKYs.png)

  ---

- ## 🧠 Part 2: Embeddings & Clustering
- The core of the "Visual Matcher" is the embedding space. We generated 512-dimensional vectors for every image in the training set.
-
- ### Clustering Analysis
- Using **K-Means**, we grouped these vectors to see if the model could automatically discover food categories without being told the labels.
- * **Algorithm:** K-Means (k=50)
- * **Dimensionality Reduction:** t-SNE (to visualize 512-D vectors in 2-D)
-
- **Key Insight:** The model successfully grouped foods by visual properties. For example, "Red/Orange" foods (Pizza, Lasagna) formed distinct clusters separate from "Green" foods (Salads, Guacamole).
-
- ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/ZqA16HEHvfXKzNYsyTgdz.png)

  ---

- ## 🚀 Part 3: The Application
- The final product is a **Gradio** web application hosted on Hugging Face Spaces. It supports two modes of interaction:
-
- 1. **Image-to-Image:** The user uploads a photo (e.g., a burger). The app embeds the upload and calculates **Cosine Similarity** against the database to find the nearest visual neighbors.
- 2. **Text-to-Image:** The user types a description (e.g., "Spicy Tacos"). The app uses CLIP's text encoder to find images that match the semantic meaning of the text.

  ---

  ## 📂 Repository Structure
- * `app.py`: Main application logic and Gradio interface.
- * `food_embeddings.parquet`: Pre-computed vector database (stored via Git LFS).
- * `requirements.txt`: Python dependencies.
  * `README.md`: Project documentation.

  ---

- ## ✍️ Author
- **Matan Kriel**
- *Assignment #3: Embeddings, RecSys, and Spaces*
- ---

  ---
+ title: Food Matcher AI (SigLIP Edition)
+ emoji: 🍔
+ colorFrom: green
+ colorTo: yellow
  sdk: gradio
+ sdk_version: 5.0.0
  app_file: app.py
  pinned: false
  ---

  # 🍔 Visual Dish Matcher AI

+ **A computer vision app that suggests recipes and dishes based on visual similarity, using Google's SigLIP model.**

  ## 🎯 Project Overview
+ This project builds a **Visual Search Engine** for food. Instead of relying on text labels (which can be inaccurate or missing), we use **Vector Embeddings** to find dishes that look similar.
+
+ **Key Features:**
+ * **Multimodal Search:** Find food using an image *or* a text description.
+ * **Advanced Data Cleaning:** Automated detection of low-quality images.
+ * **Model Comparison:** A head-to-head comparison of **OpenAI CLIP**, **Google SigLIP**, and **Meta's MetaCLIP** to choose the best engine.

  **Live Demo:** [Click "App" tab above to view]

  ---

  ## 🛠️ Tech Stack
+ * **Model:** Google SigLIP (`google/siglip-base-patch16-224`)
+ * **Frameworks:** PyTorch, Transformers, Gradio, Datasets
+ * **Data Engineering:** OpenCV (feature extraction), NumPy
  * **Data Storage:** Parquet (via Git LFS)
  * **Visualization:** Matplotlib, Seaborn, Scikit-Learn (t-SNE/PCA)

  ---

+ ## 📊 Part 1: Data Analysis & Cleaning
+ **Dataset:** [Food-101 (ETH Zurich)](https://huggingface.co/datasets/ethz/food101) (subset of 5,000 images)
+
+ ### 1. Exploratory Data Analysis (EDA)
+ Before any modeling, we analyzed the raw data to ensure quality and balance.
+
+ * **Class Balance Check:** We verified that our random subset of 5,000 images maintained a healthy distribution across the 101 food categories (approx. 50 images per class).
+ * **Image Dimensions:** We visualized the width and height distributions to identify unusually small or large images.
+ * **Outlier Detection:** We plotted the distributions of **Aspect Ratio** and **Brightness**.
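The class-balance check above can be sketched in a few lines. The `labels` list below is a synthetic, perfectly stratified stand-in for the subset's real label column, used only to illustrate the counting step:

```python
from collections import Counter

# Synthetic stand-in for the 5,000-image subset's label column:
# a perfectly stratified draw over 101 classes (the real data is random).
labels = [i % 101 for i in range(5000)]

counts = Counter(labels)
# 5000 / 101 is not exact, so per-class counts land on 49 or 50
print(min(counts.values()), max(counts.values()))  # 49 50
```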

+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/qe5z9j81mj2ahlENA2_5l.png)
+
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/_lh9-4RGOXCb8yy11Jar4.png)
+
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/6au3HUidoYBsiKreYTPJj.png)
+
+ ### 2. Data Cleaning
+ Based on the plots above, **we removed "bad" images** that were:
+ * **Too dark:** average pixel intensity < 20
+ * **Too bright / washed out:** average pixel intensity > 245
+ * **Extremely stretched or squashed:** aspect ratio > 3.0
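A minimal sketch of this filter, assuming images are decoded to `uint8` NumPy arrays; the thresholds are the ones listed above, and `is_bad_image` is an illustrative helper rather than the app's actual code:

```python
import numpy as np

def is_bad_image(img, dark=20, bright=245, max_ar=3.0):
    """Flag images that are too dark, too washed out, or extremely stretched."""
    h, w = img.shape[:2]
    mean_intensity = img.mean()        # average pixel value, 0-255
    aspect_ratio = max(w / h, h / w)   # >= 1.0 regardless of orientation
    return mean_intensity < dark or mean_intensity > bright or aspect_ratio > max_ar

pitch_black = np.zeros((224, 224, 3), dtype=np.uint8)    # mean 0 -> flagged
typical = np.full((300, 450, 3), 120, dtype=np.uint8)    # mean 120, AR 1.5 -> kept
print(is_bad_image(pitch_black), is_bad_image(typical))  # True False
```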

+ ---
+
+ ## ⚔️ Part 2: Model Comparison (CLIP vs. SigLIP vs. MetaCLIP)
+ To ensure the best search results, we ran a "challenger" test between three leading multimodal models.
+
+ ### The Contestants
+ 1. **Baseline:** OpenAI CLIP (`clip-vit-base-patch32`)
+ 2. **Challenger:** Google SigLIP (`siglip-base-patch16-224`)
+ 3. **Challenger:** Meta MetaCLIP (`facebook/metaclip-b32-400m`)
+
+ ### The Evaluation
+ We compared them using **Silhouette Scores** (measuring how distinct the food clusters are) and a visual "taste test" (checking nearest neighbors for specific dishes).
+
+ * **Metric:** Silhouette Score
+ * **Winner:** **Google SigLIP** (produced cleaner, more distinct clusters and better visual matches)
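The scoring step can be sketched with scikit-learn. The synthetic blobs below stand in for the real per-model image embeddings; they only illustrate that tighter, better-separated clusters yield a higher silhouette score:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Two well-separated synthetic "embedding" blobs standing in for real model outputs
embeddings = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 8)),
    rng.normal(3.0, 0.3, size=(50, 8)),
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
score = silhouette_score(embeddings, labels)  # in [-1, 1]; higher is better
print(score > 0.5)  # True for clearly separated clusters
```

Running the same scoring on each model's embeddings of the same images gives a like-for-like comparison.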

+ **Visual Comparison:**
+ We queried both models with the same image to see which returned more accurate similar foods.
+
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/Yz1YyU-eGcH9806Kg6PiF.png)

  ---

+ ## 🧠 Part 3: Embeddings & Clustering
+ Using the winning model (**SigLIP**), we applied dimensionality reduction to visualize how the AI groups food concepts.
+
+ * **Algorithm:** K-Means Clustering (k=101 categories)
+ * **Visualization:**
+   * **PCA:** To see the global variance.
+   * **t-SNE:** To see local groupings (e.g., "Sushi" clusters separately from "Burgers").
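The reduction pipeline can be sketched as follows; random vectors stand in for the real SigLIP embeddings, and the component/perplexity values are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(1)
embeddings = rng.normal(size=(200, 64))  # stand-in for real image embeddings

# PCA first compresses the high-dimensional vectors (and denoises),
# then t-SNE maps the result to 2-D for plotting local neighborhoods.
reduced = PCA(n_components=20).fit_transform(embeddings)
coords = TSNE(n_components=2, perplexity=30.0, random_state=1).fit_transform(reduced)
print(coords.shape)  # (200, 2)
```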

+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/i3qZepniP0HqGQ8m5H-7K.png)

  ---

+ ## 🚀 Part 4: The Application
+ The final product is a **Gradio** web application hosted on Hugging Face Spaces.
+
+ 1. **Image-to-Image:** Upload a photo (e.g., a burger) -> the app embeds it using SigLIP -> finds the nearest 3 visual matches.
+ 2. **Text-to-Image:** Type "Spicy Tacos" -> the app finds images matching that description.
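Both modes boil down to a cosine-similarity search over the precomputed embedding table. A minimal NumPy sketch, with random vectors standing in for real CLIP/SigLIP embeddings and `top_k` as a hypothetical helper:

```python
import numpy as np

def top_k(query, database, k=3):
    """Indices of the k database rows most cosine-similar to the query vector."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q                   # cosine similarity per row
    return np.argsort(-scores)[:k]    # highest similarity first

rng = np.random.default_rng(2)
database = rng.normal(size=(100, 512))          # stand-in embedding database
query = database[7] + rng.normal(0, 0.01, 512)  # near-duplicate of item 7
print(top_k(query, database))                   # index 7 should rank first
```

The same function serves text queries too: embed the text with the model's text encoder and search the same image-embedding table.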

+ ## Note
+ The application currently runs the CLIP model even though SigLIP won the comparison: SigLIP was too big to run on the Hugging Face Spaces free tier.
+
+ ### How to Run Locally
+ 1. **Clone the repository:**
+ ```bash
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/Food-Match
+ cd Food-Match
+ ```
+ 2. **Install dependencies:**
+ ```bash
+ pip install -r requirements.txt
+ ```
+ 3. **Run the app:**
+ ```bash
+ python app.py
+ ```

  ---

  ## 📂 Repository Structure
+ * `app.py`: Main application logic (Gradio + SigLIP).
+ * `food_embeddings_siglip.parquet`: Pre-computed SigLIP vector database.
+ * `requirements.txt`: Python dependencies (includes `sentencepiece`, `protobuf`).
  * `README.md`: Project documentation.

  ---

+ ## ✍️ Authors
+ **Matan Kriel**
+ **Odeya Shmuel**
+ *Assignment #3: Embeddings, RecSys, and Spaces*