Matan Kriel committed
Commit cd514b7 · 1 Parent(s): 1d15dd1

changed code and read me

Files changed (3)
  1. .DS_Store +0 -0
  2. Assignment_3_Food_Match.ipynb +0 -0
  3. README.md +81 -60
.DS_Store ADDED
Binary file (6.15 kB).
 
Assignment_3_Food_Match.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
README.md CHANGED
@@ -1,118 +1,139 @@
  ---
- title: Food Match
- emoji: 🌍
- colorFrom: pink
- colorTo: green
  sdk: gradio
- sdk_version: 6.1.0
  app_file: app.py
  pinned: false
- license: mit
- short_description: Trained model to detect and recommend similar foods.
  ---

  # 🍔 Visual Dish Matcher AI

- **A computer vision app that suggests dishes based on visual/text similarity.**

  ## 🎯 Project Overview
- This project explores the power of **Vector Embeddings** in building recommendation systems. Unlike traditional filters (e.g., "Show me Italian food"), this app uses **OpenAI's CLIP model** to "see" the food. It converts images into mathematical vectors and finds matches based on visual content: texture, color, shape, and ingredients.

  **Live Demo:** [Click "App" tab above to view]

  ---

  ## 🛠️ Tech Stack
- * **Model:** OpenAI CLIP (`clip-vit-base-patch32`)
- * **Frameworks:** PyTorch, Transformers, Datasets (Hugging Face)
- * **Interface:** Gradio
  * **Data Storage:** Parquet (via Git LFS)
  * **Visualization:** Matplotlib, Seaborn, Scikit-Learn (t-SNE/PCA)

  ---

- ## Dataset: Food-101
-
- The Food-101 dataset is a popular benchmark for fine-grained image classification. Unlike "clean" studio datasets, Food-101 contains real-world images taken under varied lighting conditions, angles, and noise levels, making it highly representative of the photos users typically upload to social media or food apps.
-
- Key Features:
-
- * **101 Categories:** Covers a wide range of international dishes, including Sushi, Pizza, Hamburger, Pad Thai, Baklava, and Chocolate Mousse.
- * **"In the Wild" Data:** Images are not perfectly centered or lit; they contain background noise (plates, cutlery, restaurant tables), challenging the model to focus on the food itself.
- * **Project Subset:** To ensure computational efficiency for this assignment, a randomized stratified subset of 5,000 images was selected from the training split.
-
- Data Structure:
-
- * **Input:** RGB images (various aspect ratios, resized during processing).
- * **Labels:** 101 unique integer IDs mapped to human-readable class names.
-
- ---

- ## 📊 Part 1: Data Exploration (EDA)
- **Dataset:** [Food-101 (ETH Zurich)](https://huggingface.co/datasets/ethz/food101)
- To ensure computational efficiency for the assignment, I used a randomized subset of **5,000 images** spanning 101 categories.
-
- ### 1. Data Cleaning
- Before training, the dataset underwent rigorous cleaning:
- * **Format Correction:** Converted grayscale images to RGB to ensure compatibility with the CLIP model.
- * **Outlier Detection:** Analyzed image brightness and aspect ratios to identify and flag low-quality or distorted images (e.g., pitch-black photos or extreme panoramas).
-
- ### 2. Image Distribution
- We verified the class balance to ensure the model wasn't biased toward specific categories.

- ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/eyW5Q6CEQJyLLzAcMl_pi.png)
-
- ### 3. Dimensionality Analysis
- We analyzed the width vs. height of the dataset to verify that most images were standard sizes suitable for resizing.
-
- ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/OsJzdqlph74L5os7RuUPK.png)
-
- ### 4. Outlier Detection
-
- ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/JV5iLSjmnLgtVRWiApKYs.png)

  ---

- ## 🧠 Part 2: Embeddings & Clustering
- The core of the "Visual Matcher" is the embedding space. We generated 512-dimensional vectors for every image in the training set.
-
- ### Clustering Analysis
- Using **K-Means**, we grouped these vectors to see if the model could automatically discover food categories without being told the labels.
- * **Algorithm:** K-Means (k=50)
- * **Dimensionality Reduction:** t-SNE (to visualize 512-D vectors in 2-D)
-
- **Key Insight:** The model successfully grouped foods by visual properties. For example, "Red/Orange" foods (Pizza, Lasagna) formed distinct clusters separate from "Green" foods (Salads, Guacamole).
-
- ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/ZqA16HEHvfXKzNYsyTgdz.png)

  ---

- ## 🚀 Part 3: The Application
- The final product is a **Gradio** web application hosted on Hugging Face Spaces. It supports two modes of interaction:
-
- 1. **Image-to-Image:** The user uploads a photo (e.g., a burger). The app embeds the upload and calculates **Cosine Similarity** against the database to find the nearest visual neighbors.
- 2. **Text-to-Image:** The user types a description (e.g., "Spicy Tacos"). The app uses CLIP's text encoder to find images that match the semantic meaning of the text.

  ---

  ## 📂 Repository Structure
- * `app.py`: Main application logic and Gradio interface.
- * `food_embeddings.parquet`: Pre-computed vector database (stored via Git LFS).
- * `requirements.txt`: Python dependencies.
  * `README.md`: Project documentation.

  ---

- ## ✍️ Author
- **Matan Kriel**
- *Assignment #3: Embeddings, RecSys, and Spaces*
- ---

  ---
+ title: Food Matcher AI (SigLIP Edition)
+ emoji: 🍔
+ colorFrom: green
+ colorTo: yellow
  sdk: gradio
+ sdk_version: 5.0.0
  app_file: app.py
  pinned: false
  ---

  # 🍔 Visual Dish Matcher AI

+ **A computer vision app that suggests recipes and dishes based on visual similarity, using Google's SigLIP model.**

  ## 🎯 Project Overview
+ This project builds a **Visual Search Engine** for food. Instead of relying on text labels (which can be inaccurate or missing), we use **Vector Embeddings** to find dishes that look similar.
+
+ **Key Features:**
+ * **Multimodal Search:** Find food using an image *or* a text description.
+ * **Advanced Data Cleaning:** Automated detection of low-quality images.
+ * **Model Comparison:** A head-to-head comparison of **OpenAI CLIP**, **Google SigLIP**, and **Meta's MetaCLIP** to choose the best engine.

  **Live Demo:** [Click "App" tab above to view]

  ---

  ## 🛠️ Tech Stack
+ * **Model:** Google SigLIP (`google/siglip-base-patch16-224`)
+ * **Frameworks:** PyTorch, Transformers, Gradio, Datasets
+ * **Data Engineering:** OpenCV (feature extraction), NumPy
  * **Data Storage:** Parquet (via Git LFS)
  * **Visualization:** Matplotlib, Seaborn, Scikit-Learn (t-SNE/PCA)

  ---

+ ## 📊 Part 1: Data Analysis & Cleaning
+ **Dataset:** [Food-101 (ETH Zurich)](https://huggingface.co/datasets/ethz/food101) (subset of 5,000 images)
+
+ ### 1. Exploratory Data Analysis (EDA)
+ Before any modeling, we analyzed the raw data to ensure quality and balance.
+
+ * **Class Balance Check:** We verified that our random subset of 5,000 images maintained a healthy distribution across the 101 food categories (approx. 50 images per class).
+ * **Image Dimensions:** We visualized the width and height distributions to identify unusually small or large images.
+ * **Outlier Detection:** We plotted the distributions of **Aspect Ratio** and **Brightness**.
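The class-balance check above can be sketched in a few lines. The `labels` list below is a synthetic, perfectly stratified stand-in for the subset's real label column, used only to illustrate the counting step:

```python
from collections import Counter

# Synthetic stand-in for the 5,000-image subset's label column:
# a perfectly stratified draw over 101 classes (the real data is random).
labels = [i % 101 for i in range(5000)]

counts = Counter(labels)
# 5000 / 101 is not exact, so per-class counts land on 49 or 50
print(min(counts.values()), max(counts.values()))  # 49 50
```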

+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/qe5z9j81mj2ahlENA2_5l.png)
+
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/_lh9-4RGOXCb8yy11Jar4.png)
+
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/6au3HUidoYBsiKreYTPJj.png)
+
+ ### 2. Data Cleaning
+ Based on the plots above, **we removed "bad" images** that were:
+ * **Too dark:** average pixel intensity < 20
+ * **Too bright / washed out:** average pixel intensity > 245
+ * **Extremely stretched or squashed:** aspect ratio > 3.0
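A minimal sketch of this filter, assuming images are decoded to `uint8` NumPy arrays; the thresholds are the ones listed above, and `is_bad_image` is an illustrative helper rather than the app's actual code:

```python
import numpy as np

def is_bad_image(img, dark=20, bright=245, max_ar=3.0):
    """Flag images that are too dark, too washed out, or extremely stretched."""
    h, w = img.shape[:2]
    mean_intensity = img.mean()        # average pixel value, 0-255
    aspect_ratio = max(w / h, h / w)   # >= 1.0 regardless of orientation
    return mean_intensity < dark or mean_intensity > bright or aspect_ratio > max_ar

pitch_black = np.zeros((224, 224, 3), dtype=np.uint8)    # mean 0 -> flagged
typical = np.full((300, 450, 3), 120, dtype=np.uint8)    # mean 120, AR 1.5 -> kept
print(is_bad_image(pitch_black), is_bad_image(typical))  # True False
```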

+ ---
+
+ ## ⚔️ Part 2: Model Comparison (CLIP vs. SigLIP vs. MetaCLIP)
+ To ensure the best search results, we ran a "challenger" test between three leading multimodal models.
+
+ ### The Contestants
+ 1. **Baseline:** OpenAI CLIP (`clip-vit-base-patch32`)
+ 2. **Challenger:** Google SigLIP (`siglip-base-patch16-224`)
+ 3. **Challenger:** Meta MetaCLIP (`facebook/metaclip-b32-400m`)
+
+ ### The Evaluation
+ We compared them using **Silhouette Scores** (measuring how distinct the food clusters are) and a visual "taste test" (checking nearest neighbors for specific dishes).
+
+ * **Metric:** Silhouette Score
+ * **Winner:** **Google SigLIP** (produced cleaner, more distinct clusters and better visual matches)
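The scoring step can be sketched with scikit-learn. The synthetic blobs below stand in for the real per-model image embeddings; they only illustrate that tighter, better-separated clusters yield a higher silhouette score:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Two well-separated synthetic "embedding" blobs standing in for real model outputs
embeddings = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 8)),
    rng.normal(3.0, 0.3, size=(50, 8)),
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
score = silhouette_score(embeddings, labels)  # in [-1, 1]; higher is better
print(score > 0.5)  # True for clearly separated clusters
```

Running the same scoring on each model's embeddings of the same images gives a like-for-like comparison.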

+ **Visual Comparison:**
+ We queried both models with the same image to see which returned more accurate similar foods.
+
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/Yz1YyU-eGcH9806Kg6PiF.png)

  ---

+ ## 🧠 Part 3: Embeddings & Clustering
+ Using the winning model (**SigLIP**), we applied dimensionality reduction to visualize how the AI groups food concepts.
+
+ * **Algorithm:** K-Means Clustering (k=101 categories)
+ * **Visualization:**
+   * **PCA:** To see the global variance.
+   * **t-SNE:** To see local groupings (e.g., "Sushi" clusters separately from "Burgers").
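The reduction pipeline can be sketched as follows; random vectors stand in for the real SigLIP embeddings, and the component/perplexity values are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(1)
embeddings = rng.normal(size=(200, 64))  # stand-in for real image embeddings

# PCA first compresses the high-dimensional vectors (and denoises),
# then t-SNE maps the result to 2-D for plotting local neighborhoods.
reduced = PCA(n_components=20).fit_transform(embeddings)
coords = TSNE(n_components=2, perplexity=30.0, random_state=1).fit_transform(reduced)
print(coords.shape)  # (200, 2)
```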

+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/i3qZepniP0HqGQ8m5H-7K.png)

  ---

+ ## 🚀 Part 4: The Application
+ The final product is a **Gradio** web application hosted on Hugging Face Spaces.
+
+ 1. **Image-to-Image:** Upload a photo (e.g., a burger) -> the app embeds it using SigLIP -> finds the nearest 3 visual matches.
+ 2. **Text-to-Image:** Type "Spicy Tacos" -> the app finds images matching that description.
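Both modes boil down to a cosine-similarity search over the precomputed embedding table. A minimal NumPy sketch, with random vectors standing in for real CLIP/SigLIP embeddings and `top_k` as a hypothetical helper:

```python
import numpy as np

def top_k(query, database, k=3):
    """Indices of the k database rows most cosine-similar to the query vector."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q                   # cosine similarity per row
    return np.argsort(-scores)[:k]    # highest similarity first

rng = np.random.default_rng(2)
database = rng.normal(size=(100, 512))          # stand-in embedding database
query = database[7] + rng.normal(0, 0.01, 512)  # near-duplicate of item 7
print(top_k(query, database))                   # index 7 should rank first
```

The same function serves text queries too: embed the text with the model's text encoder and search the same image-embedding table.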

+ ## Note
+ The application currently runs the CLIP model even though SigLIP won the comparison: SigLIP was too big to run on the Hugging Face Spaces free tier.
+
+ ### How to Run Locally
+ 1. **Clone the repository:**
+ ```bash
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/Food-Match
+ cd Food-Match
+ ```
+ 2. **Install dependencies:**
+ ```bash
+ pip install -r requirements.txt
+ ```
+ 3. **Run the app:**
+ ```bash
+ python app.py
+ ```

  ---

  ## 📂 Repository Structure
+ * `app.py`: Main application logic (Gradio + SigLIP).
+ * `food_embeddings_siglip.parquet`: Pre-computed SigLIP vector database.
+ * `requirements.txt`: Python dependencies (includes `sentencepiece`, `protobuf`).
  * `README.md`: Project documentation.

  ---

+ ## ✍️ Authors
+ **Matan Kriel**
+ **Odeya Shmuel**
+ *Assignment #3: Embeddings, RecSys, and Spaces*