---
title: Food Matcher AI (SigLIP Edition)
emoji: πŸ”
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
---
# πŸ” Visual Dish Matcher AI
**A computer vision app that suggests recipes and dishes based on visual similarity using Google's SigLIP model.**
## 🎯 Project Overview
This project builds a **Visual Search Engine** for food. Instead of relying on text labels (which can be inaccurate or missing), we use **Vector Embeddings** to find dishes that look similar.
**Key Features:**
* **Multimodal Search:** Find food using an image *or* a text description.
* **Advanced Data Cleaning:** Automated detection of blurry or low-quality images.
* **Model Comparison:** A scientific comparison between **OpenAI CLIP** and **Google SigLIP** to choose the best engine.
**Live Demo:** [Click "App" tab above to view]
---
## πŸ› οΈ Tech Stack
* **Model:** Google SigLIP (`google/siglip-base-patch16-224`)
* **Frameworks:** PyTorch, Transformers, Gradio, Datasets
* **Data Engineering:** OpenCV (Feature Extraction), NumPy
* **Data Storage:** Parquet (via Git LFS)
* **Visualization:** Matplotlib, Seaborn, Scikit-Learn (t-SNE/PCA)
---
## πŸ“Š Part 1: Data Analysis & Cleaning
**Dataset:** [Food-101 (ETH Zurich)](https://huggingface.co/datasets/ethz/food101) (Subset of 5,000 images).
### 1. Exploratory Data Analysis (EDA)
Before any modeling, we analyzed the raw data to ensure quality and balance.
* **Class Balance Check:** We verified that our random subset of 5,000 images maintained a healthy distribution across the 101 food categories (approx. 50 images per class).
* **Image Dimensions:** We visualized the width and height distribution to identify unusually small or large images.
* **Outlier Detection:** We plotted the distribution of **Aspect Ratios** and **Brightness Levels**.
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/qe5z9j81mj2ahlENA2_5l.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/_lh9-4RGOXCb8yy11Jar4.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/6au3HUidoYBsiKreYTPJj.png)
### 2. Data Cleaning
Based on the plots above, **we deleted "bad" images** that were:
* Too Dark (Avg Pixel Intensity < 20)
* Too Bright/Washed out (Avg Pixel Intensity > 245)
* Extreme Aspect Ratios (Too stretched or squashed, AR > 3.0)
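The filtering rules above can be sketched as a small helper. This is an illustrative version (the function names `image_stats` and `is_clean` are not from the repo); the thresholds are the ones listed above, applied to images as NumPy arrays.

```python
import numpy as np

def image_stats(img: np.ndarray) -> tuple[float, float]:
    """Return (mean pixel intensity, aspect ratio >= 1) for an image array."""
    h, w = img.shape[:2]
    return float(img.mean()), max(w / h, h / w)

def is_clean(img: np.ndarray,
             min_brightness: float = 20.0,
             max_brightness: float = 245.0,
             max_aspect_ratio: float = 3.0) -> bool:
    """Apply the brightness and aspect-ratio thresholds from the cleaning step."""
    brightness, ar = image_stats(img)
    return min_brightness <= brightness <= max_brightness and ar <= max_aspect_ratio

# Toy examples: a near-black frame fails, a mid-gray frame passes,
# and a heavily stretched frame (AR = 8.0) fails.
dark = np.zeros((224, 224, 3), dtype=np.uint8)
gray = np.full((224, 224, 3), 128, dtype=np.uint8)
stretched = np.full((50, 400, 3), 128, dtype=np.uint8)
```

In the actual pipeline the same statistics can be computed with OpenCV after `cv2.imread`; only the thresholds matter.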
---
## βš”οΈ Part 2: Model Comparison (CLIP vs. SigLIP vs metaclip)
To ensure the best search results, we ran a "Challenger" test between three leading multimodal models.
### The Contestants:
1. **Baseline:** OpenAI CLIP (`clip-vit-base-patch32`)
2. **Challenger:** Google SigLIP (`siglip-base-patch16-224`)
3. **Challenger:** Facebook MetaCLIP (`facebook/metaclip-b32-400m`)
### The Evaluation:
We compared them using **Silhouette Scores** (measuring how distinct the food clusters are) and a visual "Taste Test" (checking nearest neighbors for specific dishes).
* **Metric:** Silhouette Score
* **Winner:** **Google SigLIP** (Produced cleaner, more distinct clusters and better visual matches).
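The silhouette comparison can be reproduced with scikit-learn's `silhouette_score`. The sketch below uses synthetic stand-in embeddings (not the real Food-101 vectors) to show the mechanics: a model whose embeddings form tight, well-separated class clusters scores higher.

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

def toy_embeddings(spread: float, n_classes: int = 5, per_class: int = 40, dim: int = 8):
    """Generate labeled points around random class centers; `spread` controls overlap."""
    centers = rng.normal(scale=10.0, size=(n_classes, dim))
    labels = np.repeat(np.arange(n_classes), per_class)
    points = centers[labels] + rng.normal(scale=spread, size=(n_classes * per_class, dim))
    return points, labels

# Tighter clusters (lower spread) stand in for the stronger embedding model.
tight_points, tight_labels = toy_embeddings(spread=0.5)
loose_points, loose_labels = toy_embeddings(spread=5.0)

score_tight = silhouette_score(tight_points, tight_labels)
score_loose = silhouette_score(loose_points, loose_labels)
```

With the real data, the same call is made once per model on that model's embeddings, using the Food-101 class labels.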
**Visual Comparison:**
We queried both models with the same image to see which returned more accurate similar foods.
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/Yz1YyU-eGcH9806Kg6PiF.png)
---
## 🧠 Part 3: Embeddings & Clustering
Using the winning model (**SigLIP**), we applied dimensionality reduction to visualize how the AI groups food concepts.
* **Algorithm:** K-Means Clustering (k=101 categories).
* **Visualization:**
* **PCA:** To see the global variance.
* **t-SNE:** To see local groupings (e.g., "Sushi" clusters separately from "Burgers").
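The clustering and projection steps above follow a standard scikit-learn pattern. This sketch uses random stand-in vectors and a small `k` for speed; the real run clusters the SigLIP embeddings with k=101.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(42)
embeddings = rng.normal(size=(300, 64))  # stand-in for SigLIP image embeddings

# K-Means assigns each embedding to one of k clusters (k=101 in the real run).
kmeans = KMeans(n_clusters=10, n_init=10, random_state=42).fit(embeddings)

# PCA shows global variance; t-SNE preserves local neighborhoods for plotting.
coords_pca = PCA(n_components=2).fit_transform(embeddings)
coords_tsne = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(embeddings)
```

The 2-D coordinates are then scatter-plotted with Matplotlib/Seaborn, colored by cluster or by true food category.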
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/i3qZepniP0HqGQ8m5H-7K.png)
---
## πŸš€ Part 4: The Application
The final product is a **Gradio** web application hosted on Hugging Face Spaces.
1. **Image-to-Image:** Upload a photo (e.g., a burger) -> The app embeds it using SigLIP -> Finds the nearest 3 visual matches.
2. **Text-to-Image:** Type "Spicy Tacos" -> The app finds images matching that description.
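Both search modes reduce to the same retrieval step: embed the query (image or text), then rank the pre-computed database vectors by cosine similarity. A minimal sketch of that step (the function name `top_k_matches` is illustrative, not the app's actual API):

```python
import numpy as np

def top_k_matches(query: np.ndarray, database: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k database vectors most similar to `query` (cosine)."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q                      # cosine similarity against every row
    return np.argsort(-sims)[:k]       # indices of the k highest similarities

# Toy database of 5 orthogonal "embeddings"; the query points mostly at index 2.
db = np.eye(5)
query = np.array([0.1, 0.0, 0.9, 0.0, 0.0])
best = top_k_matches(query, db, k=1)   # index 2 is the nearest match
```

In the app, `query` comes from the model's image or text encoder and `database` is loaded from the Parquet embedding file.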
## Note
The deployed application runs the CLIP model even though SigLIP won the comparison: SigLIP was too large to run on the Hugging Face Spaces free tier.
### How to Run Locally
1. **Clone the repository:**
```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/Food_Recommender
cd Food_Recommender
```
2. **Install dependencies:**
```bash
pip install -r requirements.txt
```
3. **Run the app:**
```bash
python app.py
```
---
## πŸ“‚ Repository Structure
* `app.py`: Main application logic (Gradio + SigLIP).
* `food_embeddings_siglip.parquet`: Pre-computed SigLIP vector database.
* `requirements.txt`: Python dependencies (includes `sentencepiece`, `protobuf`).
* `README.md`: Project documentation.
---
## ✍️ Authors
**Matan Kriel**
**Odeya Shmuel**
*Assignment #3: Embeddings, RecSys, and Spaces*