---
title: Food Matcher AI (SigLIP Edition)
emoji: πŸ”
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
---
# πŸ” Visual Dish Matcher AI
**A computer vision app that suggests recipes and dishes based on visual similarity using Google's SigLIP model.**
## 🎯 Project Overview
This project builds a **Visual Search Engine** for food. Instead of relying on text labels (which can be inaccurate or missing), we use **Vector Embeddings** to find dishes that look similar.
**Key Features:**
* **Multimodal Search:** Find food using an image *or* a text description.
* **Advanced Data Cleaning:** Automated detection of blurry or low-quality images.
* **Model Comparison:** A scientific comparison between **OpenAI CLIP** and **Google SigLIP** to choose the best engine.
**Live Demo:** [Click "App" tab above to view]
---
## πŸ› οΈ Tech Stack
* **Model:** Google SigLIP (`google/siglip-base-patch16-224`)
* **Frameworks:** PyTorch, Transformers, Gradio, Datasets
* **Data Engineering:** OpenCV (Feature Extraction), NumPy
* **Data Storage:** Parquet (via Git LFS)
* **Visualization:** Matplotlib, Seaborn, Scikit-Learn (t-SNE/PCA)
---
## πŸ“Š Part 1: Data Analysis & Cleaning
**Dataset:** [Food-101 (ETH Zurich)](https://huggingface.co/datasets/ethz/food101) (Subset of 5,000 images).
### 1. Exploratory Data Analysis (EDA)
Before any modeling, we analyzed the raw data to ensure quality and balance.
* **Class Balance Check:** We verified that our random subset of 5,000 images maintained a healthy distribution across the 101 food categories (approx. 50 images per class).
* **Image Dimensions:** We visualized the width and height distribution to identify unusually small or large images.
* **Outlier Detection:** We plotted the distribution of **Aspect Ratios** and **Brightness Levels**.
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/qe5z9j81mj2ahlENA2_5l.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/_lh9-4RGOXCb8yy11Jar4.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/6au3HUidoYBsiKreYTPJj.png)
### 2. Data Cleaning
Based on the plots above, **we deleted "bad" images** that were:
* Too Dark (Avg Pixel Intensity < 20)
* Too Bright/Washed out (Avg Pixel Intensity > 245)
* Extreme Aspect Ratios (Too stretched or squashed, AR > 3.0)
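The filtering rules above can be sketched as a small helper. This is an illustrative version (the function names `image_stats` and `is_clean` are not from the repo); the thresholds are the ones listed above, applied to images as NumPy arrays.

```python
import numpy as np

def image_stats(img: np.ndarray) -> tuple[float, float]:
    """Return (mean pixel intensity, aspect ratio >= 1) for an image array."""
    h, w = img.shape[:2]
    return float(img.mean()), max(w / h, h / w)

def is_clean(img: np.ndarray,
             min_brightness: float = 20.0,
             max_brightness: float = 245.0,
             max_aspect_ratio: float = 3.0) -> bool:
    """Apply the brightness and aspect-ratio thresholds from the cleaning step."""
    brightness, ar = image_stats(img)
    return min_brightness <= brightness <= max_brightness and ar <= max_aspect_ratio

# Toy examples: a near-black frame fails, a mid-gray frame passes,
# and a heavily stretched frame (AR = 8.0) fails.
dark = np.zeros((224, 224, 3), dtype=np.uint8)
gray = np.full((224, 224, 3), 128, dtype=np.uint8)
stretched = np.full((50, 400, 3), 128, dtype=np.uint8)
```

In the actual pipeline the same statistics can be computed with OpenCV after `cv2.imread`; only the thresholds matter.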
---
## βš”οΈ Part 2: Model Comparison (CLIP vs. SigLIP vs metaclip)
To ensure the best search results, we ran a "Challenger" test between three leading multimodal models.
### The Contestants:
1. **Baseline:** OpenAI CLIP (`clip-vit-base-patch32`)
2. **Challenger:** Google SigLIP (`siglip-base-patch16-224`)
3. **Challenger:** Facebook MetaCLIP (`facebook/metaclip-b32-400m`)
### The Evaluation:
We compared them using **Silhouette Scores** (measuring how distinct the food clusters are) and a visual "Taste Test" (checking nearest neighbors for specific dishes).
* **Metric:** Silhouette Score
* **Winner:** **Google SigLIP** (Produced cleaner, more distinct clusters and better visual matches).
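The silhouette comparison can be reproduced with scikit-learn's `silhouette_score`. The sketch below uses synthetic stand-in embeddings (not the real Food-101 vectors) to show the mechanics: a model whose embeddings form tight, well-separated class clusters scores higher.

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

def toy_embeddings(spread: float, n_classes: int = 5, per_class: int = 40, dim: int = 8):
    """Generate labeled points around random class centers; `spread` controls overlap."""
    centers = rng.normal(scale=10.0, size=(n_classes, dim))
    labels = np.repeat(np.arange(n_classes), per_class)
    points = centers[labels] + rng.normal(scale=spread, size=(n_classes * per_class, dim))
    return points, labels

# Tighter clusters (lower spread) stand in for the stronger embedding model.
tight_points, tight_labels = toy_embeddings(spread=0.5)
loose_points, loose_labels = toy_embeddings(spread=5.0)

score_tight = silhouette_score(tight_points, tight_labels)
score_loose = silhouette_score(loose_points, loose_labels)
```

With the real data, the same call is made once per model on that model's embeddings, using the Food-101 class labels.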
**Visual Comparison:**
We queried both models with the same image to see which returned more accurate similar foods.
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/Yz1YyU-eGcH9806Kg6PiF.png)
---
## 🧠 Part 3: Embeddings & Clustering
Using the winning model (**SigLIP**), we applied dimensionality reduction to visualize how the AI groups food concepts.
* **Algorithm:** K-Means Clustering (k=101 categories).
* **Visualization:**
* **PCA:** To see the global variance.
* **t-SNE:** To see local groupings (e.g., "Sushi" clusters separately from "Burgers").
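The clustering and projection steps above follow a standard scikit-learn pattern. This sketch uses random stand-in vectors and a small `k` for speed; the real run clusters the SigLIP embeddings with k=101.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(42)
embeddings = rng.normal(size=(300, 64))  # stand-in for SigLIP image embeddings

# K-Means assigns each embedding to one of k clusters (k=101 in the real run).
kmeans = KMeans(n_clusters=10, n_init=10, random_state=42).fit(embeddings)

# PCA shows global variance; t-SNE preserves local neighborhoods for plotting.
coords_pca = PCA(n_components=2).fit_transform(embeddings)
coords_tsne = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(embeddings)
```

The 2-D coordinates are then scatter-plotted with Matplotlib/Seaborn, colored by cluster or by true food category.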
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/i3qZepniP0HqGQ8m5H-7K.png)
---
## πŸš€ Part 4: The Application
The final product is a **Gradio** web application hosted on Hugging Face Spaces.
1. **Image-to-Image:** Upload a photo (e.g., a burger) -> The app embeds it using SigLIP -> Finds the nearest 3 visual matches.
2. **Text-to-Image:** Type "Spicy Tacos" -> The app finds images matching that description.
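Both search modes reduce to the same retrieval step: embed the query (image or text), then rank the pre-computed database vectors by cosine similarity. A minimal sketch of that step (the function name `top_k_matches` is illustrative, not the app's actual API):

```python
import numpy as np

def top_k_matches(query: np.ndarray, database: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k database vectors most similar to `query` (cosine)."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q                      # cosine similarity against every row
    return np.argsort(-sims)[:k]       # indices of the k highest similarities

# Toy database of 5 orthogonal "embeddings"; the query points mostly at index 2.
db = np.eye(5)
query = np.array([0.1, 0.0, 0.9, 0.0, 0.0])
best = top_k_matches(query, db, k=1)   # index 2 is the nearest match
```

In the app, `query` comes from the model's image or text encoder and `database` is loaded from the Parquet embedding file.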
## Note
The deployed application runs the CLIP model even though SigLIP won the comparison: SigLIP was too large to run on the Hugging Face Spaces free tier.
### How to Run Locally
1. **Clone the repository:**
```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/Food_Recommender
cd Food_Recommender
```
2. **Install dependencies:**
```bash
pip install -r requirements.txt
```
3. **Run the app:**
```bash
python app.py
```
---
## πŸ“‚ Repository Structure
* `app.py`: Main application logic (Gradio + SigLIP).
* `food_embeddings_siglip.parquet`: Pre-computed SigLIP vector database.
* `requirements.txt`: Python dependencies (includes `sentencepiece`, `protobuf`).
* `README.md`: Project documentation.
---
## ✍️ Authors
**Matan Kriel**
**Odeya Shmuel**
*Assignment #3: Embeddings, RecSys, and Spaces*