Matan Kriel committed on
Commit
980e0ef
·
1 Parent(s): b6d47fe

changed all files to the working files from working repo

Files changed (3)
  1. Assignment_3_Food_Match.ipynb +0 -0
  2. README.md +99 -56
  3. app.py +1 -1
Assignment_3_Food_Match.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
README.md CHANGED
@@ -1,35 +1,22 @@
  ---
- title: Food Match
- emoji: 🌍
- colorFrom: pink
- colorTo: green
  sdk: gradio
- sdk_version: 6.1.0
  app_file: app.py
  pinned: false
- license: mit
- short_description: Trained model to detect and recommend similar foods.
  ---

  # 🍔 Visual Dish Matcher AI

- **A computer vision app that suggests dishes based on visual/text similarity.**

  ## 🎯 Project Overview
- This project explores the power of **Vector Embeddings** in building recommendation systems. Unlike traditional filters (e.g., "Show me Italian food"), this app uses **OpenAI's CLIP model** to "see" the food. It converts images into mathematical vectors and finds matches based on visual content: texture, color, shape, and ingredients.

- **Live Demo:** [Click "App" tab above to view]
-
- ---
-
- ## 🛠️ Tech Stack
- * **Model:** OpenAI CLIP (`clip-vit-base-patch32`)
- * **Frameworks:** PyTorch, Transformers, Datasets (Hugging Face)
- * **Interface:** Gradio
- * **Data Storage:** Parquet (via Git LFS)
- * **Visualization:** Matplotlib, Seaborn, Scikit-Learn (t-SNE/PCA)
-
- ---

  ## DataSet - Food101

@@ -51,68 +38,124 @@ Labels: 101 unique Integer IDs mapped to human-readable Class Names.

  ---

- ## 📊 Part 1: Data Exploration (EDA)
- **Dataset:** [Food-101 (ETH Zurich)](https://huggingface.co/datasets/ethz/food101)
- To ensure computational efficiency for the assignment, I utilized a randomized subset of **5,000 images** spanning 101 categories.

- ### 1. Data Cleaning
- Before training, the dataset underwent rigorous cleaning:
- * **Format Correction:** Converted distinct Grayscale images to RGB to ensure compatibility with the CLIP model.
- * **Outlier Detection:** Analyzed image brightness and aspect ratios to identify and flag low-quality or distorted images (e.g., pitch-black photos or extreme panoramas).

- ### 2. Image Distribution
- We verified the class balance to ensure the model wasn't biased toward specific categories.

- ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/eyW5Q6CEQJyLLzAcMl_pi.png)

- ### 3. Dimensionality Analysis
- We analyzed the width vs. height of the dataset to verify that most images were standard sizes suitable for resizing.

- ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/OsJzdqlph74L5os7RuUPK.png)

- ### Outlier Detection

- ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/JV5iLSjmnLgtVRWiApKYs.png)

  ---

- ## 🧠 Part 2: Embeddings & Clustering
- The core of the "Visual Matcher" is the embedding space. We generated 512-dimensional vectors for every image in the training set.

- ### Clustering Analysis
- Using **K-Means**, we grouped these vectors to see if the model could automatically discover food categories without being told the labels.
- * **Algorithm:** K-Means (k=50)
- * **Dimensionality Reduction:** t-SNE (to visualize 512D vectors in 2D)

- **Key Insight:** The model successfully grouped foods by visual properties. For example, "Red/Orange" foods (Pizza, Lasagna) formed distinct clusters separate from "Green" foods (Salads, Guacamole).

- ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/ZqA16HEHvfXKzNYsyTgdz.png)

  ---

- ## 🚀 Part 3: The Application
- The final product is a **Gradio** web application hosted on Hugging Face Spaces. It supports two modes of interaction:

- 1. **Image-to-Image:** The user uploads a photo (e.g., a burger). The app embeds the upload and calculates **Cosine Similarity** against the database to find the nearest visual neighbors.
- 2. **Text-to-Image:** The user types a description (e.g., "Spicy Tacos"). The app uses CLIP's text encoder to find images that match the semantic meaning of the text.

  ---

- ## 📂 Repository Structure
- * `app.py`: Main application logic and Gradio interface.
- * `food_embeddings.parquet`: Pre-computed vector database (stored via Git LFS).
- * `requirements.txt`: Python dependencies.
- * `README.md`: Project documentation.

  ---

- ## ✍️ Author
- **[Matan Kriel]**
- *Assignment #3: Embeddings, RecSys, and Spaces*

  ---

  ---
+ title: Food Matcher AI (SigLIP Edition)
+ emoji: 🍔
+ colorFrom: green
+ colorTo: yellow
  sdk: gradio
+ sdk_version: 5.0.0
  app_file: app.py
  pinned: false
  ---

  # 🍔 Visual Dish Matcher AI

+ **A computer vision app that suggests recipes and dishes based on visual similarity using Google's SigLIP model.**

  ## 🎯 Project Overview
+ This project builds a **Visual Search Engine** for food. Instead of relying on text labels (which can be inaccurate or missing), we use **Vector Embeddings** to find dishes that look similar.

+ ---

  ## DataSet - Food101

  ---

+ **Key Features:**
+ * **Multimodal Search:** Find food using an image *or* a text description.
+ * **Advanced Data Cleaning:** Automated detection of blurry or low-quality images.
+ * **Model Comparison:** A scientific comparison between **OpenAI CLIP** and **Google SigLIP** to choose the best engine.
+
+ **Live Demo:** [Click "App" tab above to view]
+
+ ---
+
+ ## 🛠️ Tech Stack
+ * **Model:** Google SigLIP (`google/siglip-base-patch16-224`)
+ * **Frameworks:** PyTorch, Transformers, Gradio, Datasets
+ * **Data Engineering:** OpenCV (Feature Extraction), NumPy
+ * **Data Storage:** Parquet (via Git LFS)
+ * **Visualization:** Matplotlib, Seaborn, Scikit-Learn (t-SNE/PCA)
+
+ ---
+
+ ## 📊 Part 1: Data Analysis & Cleaning
+ **Dataset:** [Food-101 (ETH Zurich)](https://huggingface.co/datasets/ethz/food101) (subset of 5,000 images).

+ ### 1. Exploratory Data Analysis (EDA)
+ Before any modeling, we analyzed the raw data to ensure quality and balance.

+ * **Class Balance Check:** We verified that our random subset of 5,000 images maintained a healthy distribution across the 101 food categories (approx. 50 images per class).
+ * **Image Dimensions:** We visualized the width and height distributions to identify unusually small or large images.
+ * **Outlier Detection:** We plotted the distributions of **Aspect Ratios** and **Brightness Levels**.

+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/qe5z9j81mj2ahlENA2_5l.png)

+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/_lh9-4RGOXCb8yy11Jar4.png)

+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/6au3HUidoYBsiKreYTPJj.png)

+ ### 2. Data Cleaning
+ Based on the plots above, **we deleted "bad" images** that were:
+ * Too dark (average pixel intensity < 20)
+ * Too bright / washed out (average pixel intensity > 245)
+ * Extremely stretched or squashed (aspect ratio > 3.0)
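The cleaning rules above boil down to two per-image statistics and three thresholds. A minimal sketch of such a filter (illustrative only; the `keep_image` helper and PIL-based loading are assumptions, not the notebook's actual code):

```python
import numpy as np
from PIL import Image

DARK_THRESHOLD = 20      # mean pixel intensity below this -> too dark
BRIGHT_THRESHOLD = 245   # mean pixel intensity above this -> washed out
MAX_ASPECT_RATIO = 3.0   # long side / short side above this -> too stretched

def keep_image(img: Image.Image) -> bool:
    """Return True if the image passes the brightness and aspect-ratio checks."""
    gray = np.asarray(img.convert("L"), dtype=np.float32)  # grayscale intensities 0-255
    brightness = gray.mean()
    w, h = img.size
    aspect_ratio = max(w, h) / min(w, h)
    if brightness < DARK_THRESHOLD or brightness > BRIGHT_THRESHOLD:
        return False
    return aspect_ratio <= MAX_ASPECT_RATIO
```

Applied over the subset, images failing either check are dropped before embedding.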

  ---

+ ## ⚔️ Part 2: Model Comparison (CLIP vs. SigLIP vs. MetaCLIP)
+ To ensure the best search results, we ran a "Challenger" test between three leading multimodal models.
+
+ ### The Contestants:
+ 1. **Baseline:** OpenAI CLIP (`clip-vit-base-patch32`)
+ 2. **Challenger:** Google SigLIP (`siglip-base-patch16-224`)
+ 3. **Challenger:** Facebook MetaCLIP (`facebook/metaclip-b32-400m`)

+ ### The Evaluation:
+ We compared them using **Silhouette Scores** (measuring how distinct the food clusters are) and a visual "Taste Test" (checking nearest neighbors for specific dishes).

+ * **Metric:** Silhouette Score
+ * **Winner:** **Google SigLIP** (produced cleaner, more distinct clusters and better visual matches).
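The silhouette comparison can be sketched with scikit-learn (a sketch under the assumption that each model yields an `(n_images, dim)` embedding matrix; `cluster_quality` and the variable names are illustrative, not the notebook's code):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_quality(embeddings: np.ndarray, n_clusters: int = 101, seed: int = 0) -> float:
    """Cluster embeddings with K-Means and score how distinct the clusters are.

    Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters.
    """
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(embeddings)
    return silhouette_score(embeddings, labels)

# Hypothetical usage: score each candidate model's embeddings and keep the best.
# scores = {"clip": cluster_quality(clip_embs), "siglip": cluster_quality(siglip_embs)}
# winner = max(scores, key=scores.get)
```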

+ **Visual Comparison:**
+ We queried both models with the same image to see which returned more accurate similar foods.

+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/Yz1YyU-eGcH9806Kg6PiF.png)
+
  ---

+ ## 🧠 Part 3: Embeddings & Clustering
+ Using the winning model (**SigLIP**), we applied dimensionality reduction to visualize how the AI groups food concepts.
+
+ * **Algorithm:** K-Means Clustering (k=101 categories).
+ * **Visualization:**
+   * **PCA:** To see the global variance.
+   * **t-SNE:** To see local groupings (e.g., "Sushi" clusters separately from "Burgers").
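The clustering-and-projection pipeline above can be sketched as follows (assuming an `(n_samples, dim)` NumPy array of SigLIP embeddings; `reduce_for_plotting` is an illustrative helper, not the notebook's function):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def reduce_for_plotting(embeddings: np.ndarray, n_clusters: int = 101, seed: int = 0):
    """Assign K-Means cluster ids and project the embeddings to 2D two ways."""
    cluster_ids = KMeans(n_clusters=n_clusters, random_state=seed,
                         n_init=10).fit_predict(embeddings)
    # PCA: linear projection capturing the directions of greatest global variance.
    coords_pca = PCA(n_components=2, random_state=seed).fit_transform(embeddings)
    # t-SNE: nonlinear projection that preserves local neighborhoods (clusters).
    coords_tsne = TSNE(n_components=2, init="pca", perplexity=30,
                       random_state=seed).fit_transform(embeddings)
    return cluster_ids, coords_pca, coords_tsne
```

Coloring the 2D scatter plots by `cluster_ids` (or true labels) produces figures like the one below.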

+
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/i3qZepniP0HqGQ8m5H-7K.png)

  ---

+ ## 🚀 Part 4: The Application
+ The final product is a **Gradio** web application hosted on Hugging Face Spaces.
+
+ 1. **Image-to-Image:** Upload a photo (e.g., a burger) -> the app embeds it using SigLIP -> finds the nearest 3 visual matches.
+ 2. **Text-to-Image:** Type "Spicy Tacos" -> the app finds images matching that description.
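Both modes reduce to the same nearest-neighbor step: embed the query (with the image or the text encoder), then rank the database by cosine similarity. A minimal sketch (the `top_matches` helper and `top_k=3` default are illustrative):

```python
import numpy as np

def top_matches(query_vec: np.ndarray, database: np.ndarray, top_k: int = 3) -> np.ndarray:
    """Return indices of the top_k database rows most cosine-similar to the query.

    query_vec can come from the image encoder or the text encoder, as long as it
    lives in the same embedding space as the rows of `database`.
    """
    q = query_vec / np.linalg.norm(query_vec)                   # unit-length query
    db = database / np.linalg.norm(database, axis=1, keepdims=True)  # unit-length rows
    sims = db @ q                                               # cosine similarity per row
    return np.argsort(sims)[::-1][:top_k]                       # best matches first
```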
+
+ ## Note
+ The deployed app runs the CLIP model even though SigLIP won the comparison: SigLIP was too big to run on the Hugging Face Spaces free tier.
+
+ ### How to Run Locally
+ 1. **Clone the repository:**
+ ```bash
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/Food-Match
+ cd Food-Match
+ ```
+ 2. **Install dependencies:**
+ ```bash
+ pip install -r requirements.txt
+ ```
+ 3. **Run the app:**
+ ```bash
+ python app.py
+ ```

  ---

+ ## 📂 Repository Structure
+ * `app.py`: Main application logic (Gradio + SigLIP).
+ * `food_embeddings.parquet`: Pre-computed vector database.
+ * `requirements.txt`: Python dependencies (includes `sentencepiece`, `protobuf`).
+ * `README.md`: Project documentation.
+
  ---

+ ## ✍️ Authors
+ **Matan Kriel**
+ **Odeya Shmuel**
+ *Assignment #3: Embeddings, RecSys, and Spaces*
app.py CHANGED
@@ -85,7 +85,7 @@ with gr.Blocks(title="Food Matcher AI") as demo:
      gr.HTML("""
      <div style="display: flex; justify-content: center;">
      <iframe width="560" height="315"
-     src="https://www.youtube.com/watch?v=Al665qltkDg&t=4s"
+     src="https://www.youtube.com/embed/IXeIxYHi0Es"
      title="YouTube video player"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"