---
title: Food Matcher AI (SigLIP Edition)
emoji: 🍔
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
---

# 🍔 Visual Dish Matcher AI

**A computer vision app that suggests recipes and dishes based on visual similarity using Google's SigLIP model.**

## 🎯 Project Overview

This project builds a **Visual Search Engine** for food. Instead of relying on text labels (which can be inaccurate or missing), we use **Vector Embeddings** to find dishes that look similar.

**Key Features:**
* **Multimodal Search:** Find food using an image *or* a text description.
* **Advanced Data Cleaning:** Automated detection of blurry or low-quality images.
* **Model Comparison:** A scientific comparison between **OpenAI CLIP** and **Google SigLIP** to choose the best engine.

**Live Demo:** [Click the "App" tab above to view]

---

## 🛠️ Tech Stack

* **Model:** Google SigLIP (`google/siglip-base-patch16-224`)
* **Frameworks:** PyTorch, Transformers, Gradio, Datasets
* **Data Engineering:** OpenCV (Feature Extraction), NumPy
* **Data Storage:** Parquet (via Git LFS)
* **Visualization:** Matplotlib, Seaborn, Scikit-Learn (t-SNE/PCA)

---

## 📊 Part 1: Data Analysis & Cleaning

**Dataset:** [Food-101 (ETH Zurich)](https://huggingface.co/datasets/ethz/food101) (subset of 5,000 images).

### 1. Exploratory Data Analysis (EDA)

Before any modeling, we analyzed the raw data to ensure quality and balance.

* **Class Balance Check:** We verified that our random subset of 5,000 images maintained a healthy distribution across the 101 food categories (approx. 50 images per class).
* **Image Dimensions:** We visualized the width and height distributions to identify unusually small or large images.
* **Outlier Detection:** We plotted the distributions of **Aspect Ratios** and **Brightness Levels**.
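As a rough illustration, the per-image statistics behind these plots can be computed with NumPy alone (the project lists OpenCV for feature extraction; this sketch skips file loading and operates on a decoded `HxWx3` array). The function names and the dict layout are illustrative, not the actual project code; the thresholds mirror the cleaning rules described in the next section.

```python
import numpy as np

# Outlier thresholds used in the cleaning step
MIN_BRIGHTNESS = 20     # too dark
MAX_BRIGHTNESS = 245    # too bright / washed out
MAX_ASPECT_RATIO = 3.0  # too stretched or squashed

def image_stats(img: np.ndarray) -> dict:
    """Per-image statistics for the EDA plots (expects an HxWx3 uint8 array)."""
    h, w = img.shape[:2]
    return {
        "width": w,
        "height": h,
        "aspect_ratio": max(w, h) / min(w, h),  # >= 1.0 regardless of orientation
        "brightness": float(img.mean()),        # average pixel intensity, 0-255
    }

def is_clean(stats: dict) -> bool:
    """Keep an image only if it passes the brightness and aspect-ratio checks."""
    return (
        MIN_BRIGHTNESS <= stats["brightness"] <= MAX_BRIGHTNESS
        and stats["aspect_ratio"] <= MAX_ASPECT_RATIO
    )
```

Running `image_stats` over all 5,000 images yields exactly the columns (width, height, aspect ratio, brightness) whose distributions are shown in the plots below.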
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/qe5z9j81mj2ahlENA2_5l.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/_lh9-4RGOXCb8yy11Jar4.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/6au3HUidoYBsiKreYTPJj.png)

### 2. Data Cleaning

Based on the plots above, **we deleted "bad" images** that were:
* Too Dark (Avg Pixel Intensity < 20)
* Too Bright/Washed Out (Avg Pixel Intensity > 245)
* Extreme Aspect Ratios (too stretched or squashed, AR > 3.0)

---

## ⚔️ Part 2: Model Comparison (CLIP vs. SigLIP vs. MetaCLIP)

To ensure the best search results, we ran a "Challenger" test between three leading multimodal models.

### The Contestants:
1. **Baseline:** OpenAI CLIP (`clip-vit-base-patch32`)
2. **Challenger:** Google SigLIP (`siglip-base-patch16-224`)
3. **Challenger:** Facebook MetaCLIP (`facebook/metaclip-b32-400m`)

### The Evaluation:
We compared them using **Silhouette Scores** (measuring how distinct the food clusters are) and a visual "Taste Test" (checking nearest neighbors for specific dishes).

* **Metric:** Silhouette Score
* **Winner:** **Google SigLIP** (produced cleaner, more distinct clusters and better visual matches).

**Visual Comparison:** We queried both models with the same image to see which returned more accurate similar foods.

![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/Yz1YyU-eGcH9806Kg6PiF.png)

---

## 🧠 Part 3: Embeddings & Clustering

Using the winning model (**SigLIP**), we applied dimensionality reduction to visualize how the AI groups food concepts.

* **Algorithm:** K-Means Clustering (k=101 categories).
* **Visualization:**
  * **PCA:** To see the global variance.
  * **t-SNE:** To see local groupings (e.g., "Sushi" clusters separately from "Burgers").
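The K-Means + Silhouette evaluation above can be sketched with Scikit-Learn (already in the tech stack). This is a minimal illustration, not the project's actual evaluation script: it assumes each model's embeddings arrive as a 2-D NumPy array, and L2-normalises them so Euclidean K-Means roughly tracks cosine similarity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_quality(embeddings: np.ndarray, k: int, seed: int = 0) -> float:
    """Fit K-Means with k clusters and return the silhouette score.

    Higher score = more distinct clusters. Embeddings are L2-normalised
    first so Euclidean distance behaves like cosine distance."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(emb)
    return silhouette_score(emb, labels)
```

Comparing `cluster_quality(clip_emb, k=101)` against `cluster_quality(siglip_emb, k=101)` is the shape of the test that crowned SigLIP the winner.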
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/i3qZepniP0HqGQ8m5H-7K.png)

---

## 🚀 Part 4: The Application

The final product is a **Gradio** web application hosted on Hugging Face Spaces.

1. **Image-to-Image:** Upload a photo (e.g., a burger) -> The app embeds it using SigLIP -> Finds the nearest 3 visual matches.
2. **Text-to-Image:** Type "Spicy Tacos" -> The app finds images matching that description.

## Note

Although SigLIP won the comparison, the deployed application runs the CLIP model: SigLIP was too big to run on the Hugging Face Spaces free tier.

### How to Run Locally

1. **Clone the repository:**
   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/Food-Match
   cd Food-Match
   ```
2. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```
3. **Run the app:**
   ```bash
   python app.py
   ```

---

## 📂 Repository Structure

* `app.py`: Main application logic (Gradio + SigLIP).
* `food_embeddings_siglip.parquet`: Pre-computed SigLIP vector database.
* `requirements.txt`: Python dependencies (includes `sentencepiece`, `protobuf`).
* `README.md`: Project documentation.

---

## ✍️ Authors

**Matan Kriel**
**Odeya Shmuel**

*Assignment #3: Embeddings, RecSys, and Spaces*
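---

**Appendix: nearest-neighbor lookup.** For reference, the "find the nearest 3 matches" step from Part 4 could look roughly like the sketch below. It is not the actual `app.py` code: it assumes the Parquet database has an `embedding` column holding one fixed-length float vector per row (the column name is an assumption), and ranks rows by cosine similarity to a query embedding.

```python
import numpy as np
import pandas as pd

def top_matches(query_emb: np.ndarray, db: pd.DataFrame, k: int = 3) -> pd.DataFrame:
    """Return the k rows of the embedding database most similar to the query.

    Assumes `db` has an 'embedding' column of equal-length float vectors,
    e.g. loaded via pd.read_parquet("food_embeddings_siglip.parquet")."""
    mat = np.stack(db["embedding"].to_list())            # (n_rows, dim)
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    sims = mat @ q                                       # cosine similarity per row
    return db.iloc[np.argsort(sims)[::-1][:k]]
```

Because both image and text embeddings live in the same SigLIP/CLIP space, the same function serves the Image-to-Image and Text-to-Image tabs.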