---
title: Food Matcher AI (SigLIP Edition)
emoji: 🍕
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
---
# 🍕 Visual Dish Matcher AI
A computer vision app that suggests recipes and dishes based on visual similarity using Google's SigLIP model.
## 🎯 Project Overview
This project builds a Visual Search Engine for food. Instead of relying on text labels (which can be inaccurate or missing), we use Vector Embeddings to find dishes that look similar.
### Dataset: Food-101
We use the Food-101 dataset, a popular benchmark for fine-grained image classification. Unlike "clean" studio datasets, Food-101 contains real-world images with varied lighting, angles, and noise, making it highly representative of the photos users typically upload to social media or food apps.
Key Features:
- 101 Categories: Covers a wide range of international dishes, including Sushi, Pizza, Hamburger, Pad Thai, Baklava, and Chocolate Mousse.
- "In the Wild" Data: Images are not perfectly centered or lit; they contain background noise (plates, cutlery, restaurant tables), challenging the model to focus on the food itself.
- Project Subset: To ensure computational efficiency for this assignment, a randomized stratified subset of 5,000 images was selected from the training split.
Data Structure:
- Input: RGB Images (various aspect ratios, resized during processing).
- Labels: 101 unique Integer IDs mapped to human-readable Class Names.
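As a hedged sketch, the subset described above could be drawn with the `datasets` library roughly like this (the sampling helper, seed, and exact per-class count are illustrative, not the project's actual code):

```python
import random
from collections import defaultdict

from datasets import load_dataset

ds = load_dataset("food101", split="train")

# Group example indices by class, then draw ~50 per class (101 * 50 ≈ 5,000).
per_class = defaultdict(list)
for idx, label in enumerate(ds["label"]):
    per_class[label].append(idx)

random.seed(42)
subset_indices = [i for idxs in per_class.values() for i in random.sample(idxs, 50)]
subset = ds.select(subset_indices)

# Map integer label IDs to human-readable class names.
id2name = ds.features["label"].names
print(id2name[subset[0]["label"]])
```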
Project Highlights:
- Multimodal Search: Find food using an image or a text description.
- Advanced Data Cleaning: Automated detection of blurry or low-quality images.
- Model Comparison: A scientific comparison between OpenAI CLIP and Google SigLIP to choose the best engine.
Live Demo: [Click "App" tab above to view]
## 🛠️ Tech Stack
- Model: Google SigLIP (`google/siglip-base-patch16-224`)
- Frameworks: PyTorch, Transformers, Gradio, Datasets
- Data Engineering: OpenCV (Feature Extraction), NumPy
- Data Storage: Parquet (via Git LFS)
- Visualization: Matplotlib, Seaborn, Scikit-Learn (t-SNE/PCA)
## 📊 Part 1: Data Analysis & Cleaning
Dataset: Food-101 (ETH Zurich), subset of 5,000 images.
### 1. Exploratory Data Analysis (EDA)
Before any modeling, we analyzed the raw data to ensure quality and balance.
- Class Balance Check: We verified that our random subset of 5,000 images maintained a healthy distribution across the 101 food categories (approx. 50 images per class).
- Image Dimensions: We visualized the width and height distribution to identify unusually small or large images.
- Outlier Detection: We plotted the distribution of Aspect Ratios and Brightness Levels.
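A minimal sketch of the per-image statistics behind these plots, assuming the `subset` from the loading sketch above (the helper and column names are illustrative):

```python
import numpy as np
import pandas as pd

def image_stats(pil_image):
    gray = np.asarray(pil_image.convert("L"), dtype=np.float32)
    h, w = gray.shape
    return {
        "width": w,
        "height": h,
        "aspect_ratio": max(w, h) / min(w, h),  # >= 1.0 by construction
        "brightness": gray.mean(),              # average pixel intensity, 0-255
    }

stats = pd.DataFrame(image_stats(ex["image"]) for ex in subset)
stats[["brightness", "aspect_ratio"]].hist(bins=50)  # distributions for outlier checks
```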
### 2. Data Cleaning
Based on the plots above, we deleted "bad" images that were:
- Too Dark (Avg Pixel Intensity < 20)
- Too Bright/Washed out (Avg Pixel Intensity > 245)
- Extreme Aspect Ratios (Too stretched or squashed, AR > 3.0)
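Continuing the sketch above, applying those thresholds to the `stats` table might look like this (the threshold values are the ones stated in this section; the rest is illustrative):

```python
import numpy as np

keep = (
    (stats["brightness"] >= 20)       # not too dark
    & (stats["brightness"] <= 245)    # not washed out
    & (stats["aspect_ratio"] <= 3.0)  # not extremely stretched or squashed
)
clean_subset = subset.select(np.flatnonzero(keep.to_numpy()))
print(f"Kept {int(keep.sum())} of {len(keep)} images")
```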
## ⚖️ Part 2: Model Comparison (CLIP vs. SigLIP vs. MetaCLIP)
To ensure the best search results, we ran a "Challenger" test between three leading multimodal models.
The Contestants:
- Baseline: OpenAI CLIP (`clip-vit-base-patch32`)
- Challenger: Google SigLIP (`siglip-base-patch16-224`)
- Challenger: Facebook MetaCLIP (`facebook/metaclip-b32-400m`)
The Evaluation:
We compared them using Silhouette Scores (measuring how distinct the food clusters are) and a visual "Taste Test" (checking nearest neighbors for specific dishes).
- Metric: Silhouette Score
- Winner: Google SigLIP (Produced cleaner, more distinct clusters and better visual matches).
Visual Comparison: We queried the models with the same image to see which returned more accurate similar foods.
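The silhouette comparison can be sketched as follows; `clip_embs`, `siglip_embs`, `metaclip_embs`, and `labels` are assumed to be precomputed NumPy arrays of image embeddings and class IDs, not names from the actual codebase:

```python
from sklearn.metrics import silhouette_score

contestants = {
    "openai/clip-vit-base-patch32": clip_embs,
    "google/siglip-base-patch16-224": siglip_embs,
    "facebook/metaclip-b32-400m": metaclip_embs,
}

for name, embs in contestants.items():
    # Cosine distance suits normalized multimodal embedding spaces.
    score = silhouette_score(embs, labels, metric="cosine")
    print(f"{name}: silhouette = {score:.3f}")
```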
## 🧠 Part 3: Embeddings & Clustering
Using the winning model (SigLIP), we applied dimensionality reduction to visualize how the AI groups food concepts.
- Algorithm: K-Means Clustering (k=101 categories).
- Visualization:
- PCA: To see the global variance.
- t-SNE: To see local groupings (e.g., "Sushi" clusters separately from "Burgers").
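A sketch of the clustering and projection step, assuming `embs` is the (N, D) SigLIP embedding matrix from the comparison above:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# K-Means with one cluster per food category.
kmeans = KMeans(n_clusters=101, n_init=10, random_state=42).fit(embs)

pca_2d = PCA(n_components=2).fit_transform(embs)                     # global variance
tsne_2d = TSNE(n_components=2, random_state=42).fit_transform(embs)  # local structure

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, pts, title in [(axes[0], pca_2d, "PCA"), (axes[1], tsne_2d, "t-SNE")]:
    ax.scatter(pts[:, 0], pts[:, 1], c=kmeans.labels_, s=4, cmap="tab20")
    ax.set_title(title)
plt.show()
```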
## 🚀 Part 4: The Application
The final product is a Gradio web application hosted on Hugging Face Spaces.
- Image-to-Image: Upload a photo (e.g., a burger) -> The app embeds it using SigLIP -> Finds the nearest 3 visual matches.
- Text-to-Image: Type "Spicy Tacos" -> The app finds images matching that description.
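Both modes boil down to embedding the query and ranking by cosine similarity. A hedged sketch, shown here with SigLIP as described above (the deployed Space swaps in CLIP, per the note below); `db` is assumed to be the normalized (N, D) matrix loaded from `food_embeddings.parquet`:

```python
import numpy as np
import torch
from transformers import AutoModel, AutoProcessor

MODEL_ID = "google/siglip-base-patch16-224"
model = AutoModel.from_pretrained(MODEL_ID)
processor = AutoProcessor.from_pretrained(MODEL_ID)

def embed_image(pil_image):
    inputs = processor(images=pil_image, return_tensors="pt")
    with torch.no_grad():
        return model.get_image_features(**inputs)[0].numpy()

def embed_text(text):
    # SigLIP's text tower expects max-length padding.
    inputs = processor(text=[text], padding="max_length", return_tensors="pt")
    with torch.no_grad():
        return model.get_text_features(**inputs)[0].numpy()

def top_matches(query_vec, db, k=3):
    q = query_vec / np.linalg.norm(query_vec)
    return np.argsort(db @ q)[::-1][:k]  # indices of the k nearest images
```

For example, `top_matches(embed_text("Spicy Tacos"), db)` would return the indices of the three closest images in the database.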
Note: The deployed application runs the CLIP model even though SigLIP won the comparison; SigLIP was too large to run on the Hugging Face Spaces free tier.
## How to Run Locally
- Clone the repository: `git clone https://huggingface.co/spaces/YOUR_USERNAME/Food-Match` and `cd Food-Match`
- Install dependencies: `pip install -r requirements.txt`
- Run the app: `python app.py`
## 📂 Repository Structure
- `app.py`: Main application logic (Gradio + SigLIP).
- `food_embeddings.parquet`: Pre-computed vector database.
- `requirements.txt`: Python dependencies (includes `sentencepiece`, `protobuf`).
- `README.md`: Project documentation.
## ✍️ Authors
Matan Kriel, Odeya Shmuel
Assignment #3: Embeddings, RecSys, and Spaces




