---
title: Food Matcher AI (SigLIP Edition)
emoji: 🍔
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
---

# 🍔 Visual Dish Matcher AI

**A computer vision app that suggests recipes and dishes based on visual similarity using Google's SigLIP model.**

## 🎯 Project Overview

This project builds a **Visual Search Engine** for food. Instead of relying on text labels (which can be inaccurate or missing), we use **Vector Embeddings** to find dishes that look similar.

**Key Features:**
* **Multimodal Search:** Find food using an image *or* a text description.
* **Advanced Data Cleaning:** Automated detection of blurry or low-quality images.
* **Model Comparison:** A scientific comparison between **OpenAI CLIP** and **Google SigLIP** to choose the best engine.

**Live Demo:** [Click the "App" tab above to view]

---

## 🛠️ Tech Stack

* **Model:** Google SigLIP (`google/siglip-base-patch16-224`)
* **Frameworks:** PyTorch, Transformers, Gradio, Datasets
* **Data Engineering:** OpenCV (Feature Extraction), NumPy
* **Data Storage:** Parquet (via Git LFS)
* **Visualization:** Matplotlib, Seaborn, Scikit-Learn (t-SNE/PCA)

---

## 📊 Part 1: Data Analysis & Cleaning

**Dataset:** [Food-101 (ETH Zurich)](https://huggingface.co/datasets/ethz/food101) (subset of 5,000 images).

### 1. Exploratory Data Analysis (EDA)

Before any modeling, we analyzed the raw data to ensure quality and balance.

* **Class Balance Check:** We verified that our random subset of 5,000 images maintained a healthy distribution across the 101 food categories (approx. 50 images per class).
* **Image Dimensions:** We visualized the width and height distributions to identify unusually small or large images.
* **Outlier Detection:** We plotted the distributions of **Aspect Ratios** and **Brightness Levels**.
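As a rough illustration, the per-image statistics behind these plots can be computed with NumPy alone (the project lists OpenCV for feature extraction; this sketch skips file loading and operates on a decoded `HxWx3` array). The function names and the dict layout are illustrative, not the actual project code; the thresholds mirror the cleaning rules described in the next section.

```python
import numpy as np

# Outlier thresholds used in the cleaning step
MIN_BRIGHTNESS = 20     # too dark
MAX_BRIGHTNESS = 245    # too bright / washed out
MAX_ASPECT_RATIO = 3.0  # too stretched or squashed

def image_stats(img: np.ndarray) -> dict:
    """Per-image statistics for the EDA plots (expects an HxWx3 uint8 array)."""
    h, w = img.shape[:2]
    return {
        "width": w,
        "height": h,
        "aspect_ratio": max(w, h) / min(w, h),  # >= 1.0 regardless of orientation
        "brightness": float(img.mean()),        # average pixel intensity, 0-255
    }

def is_clean(stats: dict) -> bool:
    """Keep an image only if it passes the brightness and aspect-ratio checks."""
    return (
        MIN_BRIGHTNESS <= stats["brightness"] <= MAX_BRIGHTNESS
        and stats["aspect_ratio"] <= MAX_ASPECT_RATIO
    )
```

Running `image_stats` over all 5,000 images yields exactly the columns (width, height, aspect ratio, brightness) whose distributions are shown in the plots below.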
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/qe5z9j81mj2ahlENA2_5l.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/_lh9-4RGOXCb8yy11Jar4.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/6au3HUidoYBsiKreYTPJj.png)

### 2. Data Cleaning

Based on the plots above, **we deleted "bad" images** that were:
* Too Dark (Avg Pixel Intensity < 20)
* Too Bright/Washed Out (Avg Pixel Intensity > 245)
* Extreme Aspect Ratios (too stretched or squashed, AR > 3.0)

---

## ⚔️ Part 2: Model Comparison (CLIP vs. SigLIP vs. MetaCLIP)

To ensure the best search results, we ran a "Challenger" test between three leading multimodal models.

### The Contestants:
1. **Baseline:** OpenAI CLIP (`clip-vit-base-patch32`)
2. **Challenger:** Google SigLIP (`siglip-base-patch16-224`)
3. **Challenger:** Facebook MetaCLIP (`facebook/metaclip-b32-400m`)

### The Evaluation:
We compared them using **Silhouette Scores** (measuring how distinct the food clusters are) and a visual "Taste Test" (checking nearest neighbors for specific dishes).

* **Metric:** Silhouette Score
* **Winner:** **Google SigLIP** (produced cleaner, more distinct clusters and better visual matches).

**Visual Comparison:** We queried both models with the same image to see which returned more accurate similar foods.

![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/Yz1YyU-eGcH9806Kg6PiF.png)

---

## 🧠 Part 3: Embeddings & Clustering

Using the winning model (**SigLIP**), we applied dimensionality reduction to visualize how the AI groups food concepts.

* **Algorithm:** K-Means Clustering (k=101 categories).
* **Visualization:**
  * **PCA:** To see the global variance.
  * **t-SNE:** To see local groupings (e.g., "Sushi" clusters separately from "Burgers").
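The K-Means + Silhouette evaluation above can be sketched with Scikit-Learn (already in the tech stack). This is a minimal illustration, not the project's actual evaluation script: it assumes each model's embeddings arrive as a 2-D NumPy array, and L2-normalises them so Euclidean K-Means roughly tracks cosine similarity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_quality(embeddings: np.ndarray, k: int, seed: int = 0) -> float:
    """Fit K-Means with k clusters and return the silhouette score.

    Higher score = more distinct clusters. Embeddings are L2-normalised
    first so Euclidean distance behaves like cosine distance."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(emb)
    return silhouette_score(emb, labels)
```

Comparing `cluster_quality(clip_emb, k=101)` against `cluster_quality(siglip_emb, k=101)` is the shape of the test that crowned SigLIP the winner.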
![image](https://cdn-uploads.huggingface.co/production/uploads/67dfcd96d01eab4618a66f78/i3qZepniP0HqGQ8m5H-7K.png)

---

## 🚀 Part 4: The Application

The final product is a **Gradio** web application hosted on Hugging Face Spaces.

1. **Image-to-Image:** Upload a photo (e.g., a burger) -> The app embeds it using SigLIP -> Finds the nearest 3 visual matches.
2. **Text-to-Image:** Type "Spicy Tacos" -> The app finds images matching that description.

## Note

Although SigLIP won the comparison, the deployed application runs the CLIP model: SigLIP was too big to run on the Hugging Face Spaces free tier.

### How to Run Locally

1. **Clone the repository:**
   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/Food-Match
   cd Food-Match
   ```
2. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```
3. **Run the app:**
   ```bash
   python app.py
   ```

---

## 📂 Repository Structure

* `app.py`: Main application logic (Gradio + SigLIP).
* `food_embeddings_siglip.parquet`: Pre-computed SigLIP vector database.
* `requirements.txt`: Python dependencies (includes `sentencepiece`, `protobuf`).
* `README.md`: Project documentation.

---

## ✍️ Authors

**Matan Kriel**
**Odeya Shmuel**

*Assignment #3: Embeddings, RecSys, and Spaces*
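---

**Appendix: nearest-neighbor lookup.** For reference, the "find the nearest 3 matches" step from Part 4 could look roughly like the sketch below. It is not the actual `app.py` code: it assumes the Parquet database has an `embedding` column holding one fixed-length float vector per row (the column name is an assumption), and ranks rows by cosine similarity to a query embedding.

```python
import numpy as np
import pandas as pd

def top_matches(query_emb: np.ndarray, db: pd.DataFrame, k: int = 3) -> pd.DataFrame:
    """Return the k rows of the embedding database most similar to the query.

    Assumes `db` has an 'embedding' column of equal-length float vectors,
    e.g. loaded via pd.read_parquet("food_embeddings_siglip.parquet")."""
    mat = np.stack(db["embedding"].to_list())            # (n_rows, dim)
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    sims = mat @ q                                       # cosine similarity per row
    return db.iloc[np.argsort(sims)[::-1][:k]]
```

Because both image and text embeddings live in the same SigLIP/CLIP space, the same function serves the Image-to-Image and Text-to-Image tabs.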