---
title: Food Matcher AI (SigLIP Edition)
emoji: πŸ”
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
---

# πŸ” Visual Dish Matcher AI

**A computer vision app that suggests recipes and dishes based on visual similarity using Google's SigLIP model.**

## 🎯 Project Overview
This project builds a **Visual Search Engine** for food. Instead of relying on text labels (which can be inaccurate or missing), we use **Vector Embeddings** to find dishes that look similar.

**Key Features:**
* **Multimodal Search:** Find food using an image *or* a text description.
* **Advanced Data Cleaning:** Automated detection of blurry or low-quality images.
* **Model Comparison:** A scientific comparison between **OpenAI CLIP** and **Google SigLIP** to choose the best engine.

**Live Demo:** [Click "App" tab above to view]

---

## πŸ› οΈ Tech Stack
* **Model:** Google SigLIP (`google/siglip-base-patch16-224`)
* **Frameworks:** PyTorch, Transformers, Gradio, Datasets
* **Data Engineering:** OpenCV (Feature Extraction), NumPy
* **Data Storage:** Parquet (via Git LFS)
* **Visualization:** Matplotlib, Seaborn, Scikit-Learn (t-SNE/PCA)

---

## πŸ“Š Part 1: Data Analysis & Cleaning
**Dataset:** [Food-101 (ETH Zurich)](https://huggingface.co/datasets/ethz/food101) (Subset of 5,000 images).

### 1. Exploratory Data Analysis (EDA)
Before any modeling, we analyzed the raw data to ensure quality and balance.

* **Class Balance Check:** We verified that our random subset of 5,000 images maintained a healthy distribution across the 101 food categories (approx. 50 images per class).
* **Image Dimensions:** We visualized the width and height distribution to identify unusually small or large images.
* **Outlier Detection:** We plotted the distribution of **Aspect Ratios** and **Brightness Levels**.


![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/qe5z9j81mj2ahlENA2_5l.png)


![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/_lh9-4RGOXCb8yy11Jar4.png)


![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/6au3HUidoYBsiKreYTPJj.png)

### 2. Data Cleaning
Based on the plots above, **we deleted "bad" images** that were:
* Too Dark (Avg Pixel Intensity < 20)
* Too Bright/Washed out (Avg Pixel Intensity > 245)
* Extreme Aspect Ratios (Too stretched or squashed, AR > 3.0)
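The thresholds above translate into a simple pass/fail filter. A minimal sketch (function and constant names are ours, assuming images arrive as `uint8` NumPy arrays in the 0–255 range):

```python
import numpy as np

# Hypothetical names; thresholds are the ones listed above.
MIN_BRIGHTNESS = 20     # too dark
MAX_BRIGHTNESS = 245    # too bright / washed out
MAX_ASPECT_RATIO = 3.0  # too stretched or squashed

def is_bad_image(img: np.ndarray) -> bool:
    """Return True if the image fails any of the quality checks."""
    h, w = img.shape[:2]
    brightness = img.mean()           # average pixel intensity (0-255)
    aspect_ratio = max(w / h, h / w)  # >= 1.0 regardless of orientation
    return bool(
        brightness < MIN_BRIGHTNESS
        or brightness > MAX_BRIGHTNESS
        or aspect_ratio > MAX_ASPECT_RATIO
    )

# Example: a nearly black image is flagged, a mid-grey square is not.
dark = np.zeros((100, 100, 3), dtype=np.uint8)
print(is_bad_image(dark))  # True
```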


---

## βš”οΈ Part 2: Model Comparison (CLIP vs. SigLIP vs metaclip)
To ensure the best search results, we ran a "Challenger" test between three leading multimodal models.

### The Contestants:
1.  **Baseline:** OpenAI CLIP (`clip-vit-base-patch32`)
2.  **Challenger:** Google SigLIP (`siglip-base-patch16-224`)
3.  **Challenger:** Facebook MetaCLIP (`facebook/metaclip-b32-400m`)

### The Evaluation:
We compared them using **Silhouette Scores** (measuring how distinct the food clusters are) and a visual "Taste Test" (checking nearest neighbors for specific dishes).

* **Metric:** Silhouette Score
* **Winner:** **Google SigLIP** (Produced cleaner, more distinct clusters and better visual matches).
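The silhouette comparison boils down to scoring each model's embedding matrix against the ground-truth class labels: the model with the higher score produces tighter, better-separated clusters. A sketch with scikit-learn on toy data (the helper name and the stand-in blobs are ours):

```python
import numpy as np
from sklearn.metrics import silhouette_score

def model_score(embeddings: np.ndarray, labels: np.ndarray) -> float:
    """Higher is better: clusters are dense and well separated."""
    return silhouette_score(embeddings, labels)

# Toy stand-in for real embeddings: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 0.1, (50, 8)), rng.normal(5, 0.1, (50, 8))])
labels = np.array([0] * 50 + [1] * 50)
print(model_score(emb, labels))  # close to 1.0 for such clean clusters
```

In the real notebook the same call is made once per model, with the Food-101 class labels as `labels`.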

**Visual Comparison:**
We queried both models with the same image to see which returned more accurate similar foods.



![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/Yz1YyU-eGcH9806Kg6PiF.png)

---

## 🧠 Part 3: Embeddings & Clustering
Using the winning model (**SigLIP**), we applied dimensionality reduction to visualize how the AI groups food concepts.

* **Algorithm:** K-Means Clustering (k=101 categories).
* **Visualization:**
    * **PCA:** To see the global variance.
    * **t-SNE:** To see local groupings (e.g., "Sushi" clusters separately from "Burgers").
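The clustering-plus-projection step can be sketched as follows (toy data and k=3 stand in for the real 101-class embedding matrix; variable names are ours):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

# Toy embeddings: three well-separated blobs of 30 points in 16-D.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(c, 0.2, (30, 16)) for c in (0, 4, 8)])

# Cluster in the original embedding space (k=101 in the real notebook),
# then project to 2-D with t-SNE purely for plotting.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(emb)
xy = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(emb)

print(kmeans.labels_.shape, xy.shape)  # (90,) (90, 2)
```

Note that t-SNE is only used for the picture; the cluster assignments come from K-Means in the full-dimensional space.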



![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/i3qZepniP0HqGQ8m5H-7K.png)

---

## πŸš€ Part 4: The Application
The final product is a **Gradio** web application hosted on Hugging Face Spaces.

1.  **Image-to-Image:** Upload a photo (e.g., a burger) -> The app embeds it using SigLIP -> Finds the nearest 3 visual matches.
2.  **Text-to-Image:** Type "Spicy Tacos" -> The app finds images matching that description.
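With pre-computed, L2-normalised embeddings, the nearest-match lookup reduces to a dot product. A minimal sketch of that step (names are ours; the real app loads the database matrix from the parquet file):

```python
import numpy as np

def top_k_matches(query_vec: np.ndarray, db: np.ndarray, k: int = 3):
    """Return indices and cosine similarities of the k best matches.

    Assumes the rows of `db` are already L2-normalised, so cosine
    similarity is just a matrix-vector product.
    """
    q = query_vec / np.linalg.norm(query_vec)
    sims = db @ q
    idx = np.argsort(-sims)[:k]  # sort descending by similarity
    return idx, sims[idx]

# Toy database of 5 unit vectors in 4-D.
rng = np.random.default_rng(0)
db = rng.normal(size=(5, 4))
db /= np.linalg.norm(db, axis=1, keepdims=True)

idx, sims = top_k_matches(db[2], db, k=3)
print(idx[0])  # 2 -- the query's own row is its best match
```

The same function serves both modes: for image-to-image the query vector comes from the image encoder, for text-to-image from the text encoder.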

## Note
The hosted application runs the **CLIP** model even though SigLIP won the comparison: SigLIP was too big to run on the Hugging Face Spaces free tier.

### How to Run Locally
1.  **Clone the repository:**
    ```bash
    git clone https://huggingface.co/spaces/YOUR_USERNAME/Food-Match
    cd Food-Match
    ```
2.  **Install dependencies:**
    ```bash
    pip install -r requirements.txt
    ```
3.  **Run the app:**
    ```bash
    python app.py
    ```

---

## πŸ“‚ Repository Structure
* `app.py`: Main application logic (Gradio; the hosted Space runs CLIP, see Note above).
* `food_embeddings_siglip.parquet`: Pre-computed SigLIP vector database.
* `requirements.txt`: Python dependencies (includes `sentencepiece`, `protobuf`).
* `README.md`: Project documentation.

---

## ✍️ Authors
**Matan Kriel**
**Odeya Shmuel**
*Assignment #3: Embeddings, RecSys, and Spaces*