Matan Kriel committed on
Commit b2aba87 · 1 Parent(s): 9b48fc7

added files
Assignment_3 SigLIP.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
README.md CHANGED
@@ -1,12 +1,145 @@
  ---
- title: Food Recommender
- emoji: 😻
- colorFrom: purple
- colorTo: indigo
  sdk: gradio
- sdk_version: 6.2.0
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
---
title: Food Matcher AI (SigLIP Edition)
emoji: 🍔
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
---

# 🍔 Visual Dish Matcher AI

**A computer vision app that suggests recipes and dishes based on visual similarity, using Google's SigLIP model.**

## 🎯 Project Overview
This project builds a **Visual Search Engine** for food. Instead of relying on text labels (which can be inaccurate or missing), we use **vector embeddings** to find dishes that look similar.

**Key Features:**
* **Multimodal Search:** Find food using an image *or* a text description.
* **Advanced Data Cleaning:** Automated detection of blurry or low-quality images.
* **Model Comparison:** A head-to-head comparison between **OpenAI CLIP** and **Google SigLIP** to choose the better engine.

**Live Demo:** [Click the "App" tab above to view]

---

## 🛠️ Tech Stack
* **Model:** Google SigLIP (`google/siglip-base-patch16-224`)
* **Frameworks:** PyTorch, Transformers, Gradio, Datasets
* **Data Engineering:** OpenCV (feature extraction), NumPy
* **Data Storage:** Parquet (via Git LFS)
* **Visualization:** Matplotlib, Seaborn, scikit-learn (t-SNE/PCA)

---

## 📊 Part 1: Data Analysis & Cleaning
**Dataset:** [Food-101 (ETH Zurich)](https://huggingface.co/datasets/ethz/food101) (subset of 5,000 images).

### 1. Exploratory Data Analysis (EDA)
Before any modeling, we analyzed the raw data to ensure quality and balance.

* **Class Balance Check:** We verified that our random subset of 5,000 images maintained a healthy distribution across the 101 food categories (approx. 50 images per class).
* **Image Dimensions:** We visualized the width and height distributions to identify unusually small or large images.
* **Outlier Detection:** We plotted the distributions of **aspect ratios** and **brightness levels**.

![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/qe5z9j81mj2ahlENA2_5l.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/_lh9-4RGOXCb8yy11Jar4.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/6au3HUidoYBsiKreYTPJj.png)

### 2. Data Cleaning
Based on the plots above, **we deleted "bad" images** that were:
* Too dark (average pixel intensity < 20)
* Too bright / washed out (average pixel intensity > 245)
* Extremely stretched or squashed (aspect ratio > 3.0)

### 3. Advanced Feature Engineering
After removing the garbage data, we engineered deeper visual features to assess image content:

* **Sharpness Score:** Used Laplacian variance to find blurry photos.
* **Dominant Color (Hue):** Analyzed color clusters (e.g., green for salads vs. red for pizza).
* **Texture Complexity:** Calculated the pixel standard deviation to distinguish smooth from complex foods.

![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/0QMOkOCATUfePwu_-nm0z.png)

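The sharpness metric is the variance of the image Laplacian (with OpenCV this is `cv2.Laplacian(gray, cv2.CV_64F).var()`). A dependency-free NumPy sketch of the same idea, using the 4-neighbour Laplacian kernel:

```python
import numpy as np

def sharpness_score(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian; low = few edges = likely blurry."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

# A checkerboard is full of edges; a flat grey patch has none.
checker = ((np.indices((64, 64)).sum(axis=0) % 2) * 255).astype(np.uint8)
flat = np.full((64, 64), 128, dtype=np.uint8)
print(sharpness_score(flat), sharpness_score(checker) > 0.0)  # 0.0 True
```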
---

## ⚔️ Part 2: Model Comparison (CLIP vs. SigLIP)
To ensure the best search results, we ran a "challenger" test between two leading multimodal models.

### The Contestants:
1. **Baseline:** OpenAI CLIP (`clip-vit-base-patch32`)
2. **Challenger:** Google SigLIP (`siglip-base-patch16-224`)

### The Evaluation:
We compared them using **silhouette scores** (measuring how distinct the food clusters are) and a visual "taste test" (checking the nearest neighbors for specific dishes).

* **Metric:** Silhouette score
* **Winner:** **Google SigLIP** (produced cleaner, more distinct clusters and better visual matches)

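As a reminder of what the metric measures: silhouette scores live in [-1, 1], and higher means tighter, better-separated clusters. A small sketch with scikit-learn, where two synthetic blobs stand in for the real CLIP/SigLIP embeddings labelled by food category:

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Two well-separated synthetic "classes" in 8 dimensions.
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(50, 8))
blob_b = rng.normal(loc=5.0, scale=0.1, size=(50, 8))
embeddings = np.vstack([blob_a, blob_b])
labels = np.array([0] * 50 + [1] * 50)

score = silhouette_score(embeddings, labels)
print(score > 0.9)  # True: the blobs are cleanly separated
```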
**Visual Comparison:**
We queried both models with the same image to see which returned more accurate similar foods.

![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/R4biFno1FUizlVRLRVCqM.png)

---

## 🧠 Part 3: Embeddings & Clustering
Using the winning model (**SigLIP**), we generated 768-dimensional vectors for the entire dataset, then applied dimensionality reduction to visualize how the model groups food concepts.

* **Algorithm:** K-Means clustering (k = 101 categories).
* **Visualization:**
  * **PCA:** to see the global variance.
  * **t-SNE:** to see local groupings (e.g., "Sushi" clusters separately from "Burgers").

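The clustering and projection steps above can be sketched with scikit-learn. Here 200 random 768-d vectors stand in for the real SigLIP embeddings, and k is reduced from 101 to 5 so the toy data can support it:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

# Stand-in for the (5000, 768) SigLIP embedding matrix.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((200, 768)).astype(np.float32)

# K-Means assigns each embedding to one of k clusters.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(embeddings)

# t-SNE projects to 2-D for plotting (perplexity must stay below n_samples).
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
print(cluster_ids.shape, coords.shape)  # (200,) (200, 2)
```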
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/MT92BvvwToLxk83X0Yd12.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/KyMPVz6VUsIGq2IMEZRYl.png)

---

## 🚀 Part 4: The Application
The final product is a **Gradio** web application hosted on Hugging Face Spaces.

1. **Image-to-Image:** Upload a photo (e.g., a burger) -> the app embeds it with SigLIP -> finds the 3 nearest visual matches.
2. **Text-to-Image:** Type "Spicy Tacos" -> the app finds images matching that description.

### How to Run Locally
1. **Clone the repository:**
   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/Food-Match
   cd Food-Match
   ```
2. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```
3. **Run the app:**
   ```bash
   python app.py
   ```

---

## 📂 Repository Structure
* `app.py`: Main application logic (Gradio + SigLIP).
* `food_embeddings_siglip.parquet`: Pre-computed SigLIP vector database.
* `requirements.txt`: Python dependencies (includes `sentencepiece`, `protobuf`).
* `README.md`: Project documentation.

---

## ✍️ Authors
**Matan Kriel**
**Odeya Shmuel**
*Assignment #3: Embeddings, RecSys, and Spaces*
app.py ADDED
@@ -0,0 +1,93 @@
```python
import gradio as gr
import torch
import pandas as pd
import numpy as np
from PIL import Image
from transformers import AutoProcessor, AutoModel
from datasets import load_dataset
from torch.nn import functional as F

# --- 1. SETUP & CONFIG ---
MODEL_ID = "google/siglip-base-patch16-224"
DATA_FILE = "food_embeddings_siglip.parquet"

print(f"⏳ Starting App... Loading Model: {MODEL_ID}...")
try:
    model = AutoModel.from_pretrained(MODEL_ID)
    processor = AutoProcessor.from_pretrained(MODEL_ID)
except Exception as e:
    print(f"❌ Model Error: {e}")
    raise  # the app cannot run without the model

# --- 2. LOAD DATA ---
print("⏳ Loading Dataset...")
# Load the exact 5k subset used when the embeddings were computed
dataset = load_dataset("ethz/food101", split="train").shuffle(seed=42).select(range(5000))

# --- 3. LOAD EMBEDDINGS ---
print(f"⏳ Loading Embeddings from {DATA_FILE}...")
try:
    df = pd.read_parquet(DATA_FILE)
    db_features = torch.tensor(np.stack(df['embedding'].to_numpy()))
    db_features = F.normalize(db_features, p=2, dim=1)
    print("✅ System Ready!")
except Exception as e:
    print(f"❌ Error loading parquet file: {e}")
    print("⚠️ Please ensure 'food_embeddings_siglip.parquet' is uploaded to the Files tab.")
    db_features = None

# --- 4. CORE SEARCH LOGIC ---
def find_best_matches(query_features, top_k=3):
    if db_features is None:
        return []  # empty gallery if the DB failed to load

    # Normalize the query so the dot product equals cosine similarity
    query_features = F.normalize(query_features, p=2, dim=1)

    # Similarity search: (1, d) @ (d, n) -> (1, n)
    similarity = torch.mm(query_features, db_features.T)
    scores, indices = torch.topk(similarity, k=top_k)

    results = []
    for idx, score in zip(indices[0], scores[0]):
        idx = idx.item()
        img = dataset[idx]['image']
        label = df.iloc[idx]['label_name']
        results.append((img, f"{label} ({score.item():.2f})"))
    return results

# --- 5. GRADIO FUNCTIONS ---
def search_by_image(input_image):
    if input_image is None:
        return []
    inputs = processor(images=input_image, return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)
    return find_best_matches(features)

def search_by_text(input_text):
    if not input_text:
        return []
    # SigLIP expects max-length padding for text inputs
    inputs = processor(text=[input_text], return_tensors="pt", padding="max_length")
    with torch.no_grad():
        features = model.get_text_features(**inputs)
    return find_best_matches(features)

# --- 6. BUILD UI ---
with gr.Blocks(title="Food Matcher AI") as demo:
    gr.Markdown("# 🍔 Visual Dish Matcher")
    gr.Markdown("Upload a photo of food (or describe it) to find similar dishes in our database.")

    with gr.Tab("Image Search"):
        with gr.Row():
            img_input = gr.Image(type="pil", label="Upload Food Image")
            img_gallery = gr.Gallery(label="Top Matches")
        btn_img = gr.Button("Find Similar Dishes")
        btn_img.click(search_by_image, inputs=img_input, outputs=img_gallery)

    with gr.Tab("Text Search"):
        with gr.Row():
            txt_input = gr.Textbox(label="Describe the food (e.g., 'Spicy Tacos')")
            txt_gallery = gr.Gallery(label="Top Matches")
        btn_txt = gr.Button("Search by Description")
        btn_txt.click(search_by_text, inputs=txt_input, outputs=txt_gallery)

# Launch
demo.launch()
```
food_embeddings_siglip.parquet ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a6f7a90c748628bffd1d2b4b08afa9e70707c593e5d7c98cfcdf9773d658af4e
size 12925008
requirements.txt ADDED
@@ -0,0 +1,11 @@
gradio
torch
transformers
pandas
numpy
datasets
pyarrow
scikit-learn
sentencepiece
protobuf
pillow