---
title: Food Matcher AI (SigLIP Edition)
emoji: 🍕
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
---
# 🍕 Visual Dish Matcher AI
**A computer vision app that suggests recipes and dishes based on visual similarity using Google's SigLIP model.**
## 🎯 Project Overview
This project builds a **Visual Search Engine** for food. Instead of relying on text labels (which can be inaccurate or missing), we use **Vector Embeddings** to find dishes that look similar.
**Key Features:**
* **Multimodal Search:** Find food using an image *or* a text description.
* **Advanced Data Cleaning:** Automated detection of blurry or low-quality images.
* **Model Comparison:** A scientific comparison between **OpenAI CLIP** and **Google SigLIP** to choose the best engine.
**Live Demo:** [Click "App" tab above to view]
---
## 🛠️ Tech Stack
* **Model:** Google SigLIP (`google/siglip-base-patch16-224`)
* **Frameworks:** PyTorch, Transformers, Gradio, Datasets
* **Data Engineering:** OpenCV (Feature Extraction), NumPy
* **Data Storage:** Parquet (via Git LFS)
* **Visualization:** Matplotlib, Seaborn, Scikit-Learn (t-SNE/PCA)
---
## 📊 Part 1: Data Analysis & Cleaning
**Dataset:** [Food-101 (ETH Zurich)](https://huggingface.co/datasets/ethz/food101) (Subset of 5,000 images).
### 1. Exploratory Data Analysis (EDA)
Before any modeling, we analyzed the raw data to ensure quality and balance.
* **Class Balance Check:** We verified that our random subset of 5,000 images maintained a healthy distribution across the 101 food categories (approx. 50 images per class).
* **Image Dimensions:** We visualized the width and height distribution to identify unusually small or large images.
* **Outlier Detection:** We plotted the distribution of **Aspect Ratios** and **Brightness Levels**.



### 2. Data Cleaning
Based on the plots above, **we deleted "bad" images** that were:
* Too Dark (Avg Pixel Intensity < 20)
* Too Bright/Washed out (Avg Pixel Intensity > 245)
* Extreme Aspect Ratios (Too stretched or squashed, AR > 3.0)
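The cleaning filter can be sketched as a small predicate over the image array. This is a minimal sketch mirroring the thresholds listed above, not the project's actual code: the function name is an illustrative assumption, and intensity is approximated with a plain channel mean (the project lists OpenCV for the real feature extraction).

```python
import numpy as np

# Thresholds taken from the analysis above.
MIN_BRIGHTNESS = 20     # too dark
MAX_BRIGHTNESS = 245    # too bright / washed out
MAX_ASPECT_RATIO = 3.0  # too stretched or squashed

def is_good_image(img: np.ndarray) -> bool:
    """Return True if an H x W x 3 uint8 image (e.g. from cv2.imread)
    passes the brightness and aspect-ratio checks."""
    brightness = img.mean()                 # average pixel intensity, 0-255
    h, w = img.shape[:2]
    aspect_ratio = max(h, w) / min(h, w)    # always >= 1.0
    return (MIN_BRIGHTNESS <= brightness <= MAX_BRIGHTNESS
            and aspect_ratio <= MAX_ASPECT_RATIO)
```

Images failing any check are dropped before embedding, so the vector database never contains near-black, blown-out, or extreme-panorama samples.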
---
## ⚔️ Part 2: Model Comparison (CLIP vs. SigLIP vs. MetaCLIP)
To ensure the best search results, we ran a "Challenger" test between three leading multimodal models.
### The Contestants:
1. **Baseline:** OpenAI CLIP (`clip-vit-base-patch32`)
2. **Challenger:** Google SigLIP (`siglip-base-patch16-224`)
3. **Challenger:** Facebook MetaCLIP (`facebook/metaclip-b32-400m`)
### The Evaluation:
We compared them using **Silhouette Scores** (measuring how distinct the food clusters are) and a visual "Taste Test" (checking nearest neighbors for specific dishes).
* **Metric:** Silhouette Score
* **Winner:** **Google SigLIP** (Produced cleaner, more distinct clusters and better visual matches).
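The scoring step can be run with scikit-learn's `silhouette_score` (already in the tech stack). A sketch, assuming each model's image embeddings are stacked into one NumPy array aligned with the Food-101 class labels (the helper name is an illustrative assumption):

```python
import numpy as np
from sklearn.metrics import silhouette_score

def compare_models(embeddings_by_model, labels):
    """Score each candidate model by how cleanly the food classes separate
    in its embedding space (higher silhouette = tighter, more distinct
    clusters). Cosine distance matches how the search itself works."""
    return {name: silhouette_score(emb, labels, metric="cosine")
            for name, emb in embeddings_by_model.items()}
```

The model with the highest score (here, SigLIP) becomes the engine behind the search index.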
**Visual Comparison:**
We queried both models with the same image to see which returned more accurate similar foods.

---
## 🧠 Part 3: Embeddings & Clustering
Using the winning model (**SigLIP**), we applied dimensionality reduction to visualize how the AI groups food concepts.
* **Algorithm:** K-Means Clustering (k=101 categories).
* **Visualization:**
* **PCA:** To see the global variance.
* **t-SNE:** To see local groupings (e.g., "Sushi" clusters separately from "Burgers").
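The clustering-plus-projection pipeline above can be sketched with scikit-learn as follows. This is a hedged sketch, not the notebook's exact code: the function name and hyperparameters (`n_init`, `random_state`) are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def cluster_and_project(embeddings, k=101):
    """Cluster SigLIP embeddings with K-Means (k = number of Food-101
    categories), then project to 2-D two ways: PCA for global variance,
    t-SNE for local neighbourhood structure."""
    cluster_ids = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(embeddings)
    pca_2d = PCA(n_components=2).fit_transform(embeddings)
    tsne_2d = TSNE(n_components=2, random_state=42).fit_transform(embeddings)
    return cluster_ids, pca_2d, tsne_2d
```

The 2-D coordinates are what get scatter-plotted (colored by cluster) in the figures above.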

---
## 🚀 Part 4: The Application
The final product is a **Gradio** web application hosted on Hugging Face Spaces.
1. **Image-to-Image:** Upload a photo (e.g., a burger) -> The app embeds it using SigLIP -> Finds the nearest 3 visual matches.
2. **Text-to-Image:** Type "Spicy Tacos" -> The app finds images matching that description.
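Both modes reduce to the same retrieval step once the query (image or text) has been encoded into the shared embedding space. A minimal sketch of that step over the pre-computed vector database (the function name and the plain-NumPy implementation are illustrative assumptions, not the app's actual code):

```python
import numpy as np

def top_k_matches(query_vec, db_vecs, k=3):
    """Return indices of the k database embeddings most similar to the
    query, by cosine similarity. `db_vecs` is an (N, D) matrix, e.g.
    loaded from the pre-computed parquet file; `query_vec` is a (D,)
    embedding of the uploaded image or the typed description."""
    q = query_vec / np.linalg.norm(query_vec)
    db = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = db @ q                      # cosine similarity per row
    return np.argsort(sims)[::-1][:k]  # highest similarity first
```

Because image and text queries land in the same space, one search function serves both tabs of the app.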
## Note
The deployed application runs the CLIP model even though SigLIP won the comparison: SigLIP was too large to run on the Hugging Face Spaces free tier.
### How to Run Locally
1. **Clone the repository:**
```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/Food-Match
cd Food-Match
```
2. **Install dependencies:**
```bash
pip install -r requirements.txt
```
3. **Run the app:**
```bash
python app.py
```
---
## 📂 Repository Structure
* `app.py`: Main application logic (Gradio + SigLIP).
* `food_embeddings_siglip.parquet`: Pre-computed SigLIP vector database.
* `requirements.txt`: Python dependencies (includes `sentencepiece`, `protobuf`).
* `README.md`: Project documentation.
---
## ✍️ Authors
**Matan Kriel**
**Odeya Shmuel**
*Assignment #3: Embeddings, RecSys, and Spaces*