---
title: Food Matcher AI (SigLIP Edition)
emoji: 🍕
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
---
# 🍕 Visual Dish Matcher AI

**A computer vision app that suggests recipes and dishes based on visual similarity using Google's SigLIP model.**
## 🎯 Project Overview

This project builds a **Visual Search Engine** for food. Instead of relying on text labels (which can be inaccurate or missing), we use **Vector Embeddings** to find dishes that look similar.

**Key Features:**

* **Multimodal Search:** Find food using an image *or* a text description.
* **Advanced Data Cleaning:** Automated detection of blurry or low-quality images.
* **Model Comparison:** A scientific comparison among **OpenAI CLIP**, **Google SigLIP**, and **Facebook MetaCLIP** to choose the best engine.

**Live Demo:** [Click the "App" tab above to view]

---
## 🛠️ Tech Stack

* **Model:** Google SigLIP (`google/siglip-base-patch16-224`)
* **Frameworks:** PyTorch, Transformers, Gradio, Datasets
* **Data Engineering:** OpenCV (Feature Extraction), NumPy
* **Data Storage:** Parquet (via Git LFS)
* **Visualization:** Matplotlib, Seaborn, Scikit-Learn (t-SNE/PCA)

---
## 📊 Part 1: Data Analysis & Cleaning

**Dataset:** [Food-101 (ETH Zurich)](https://huggingface.co/datasets/ethz/food101) (subset of 5,000 images).

### 1. Exploratory Data Analysis (EDA)

Before any modeling, we analyzed the raw data to ensure quality and balance.

* **Class Balance Check:** We verified that our random subset of 5,000 images maintained a healthy distribution across the 101 food categories (approx. 50 images per class).
* **Image Dimensions:** We visualized the width and height distribution to identify unusually small or large images.
* **Outlier Detection:** We plotted the distribution of **Aspect Ratios** and **Brightness Levels**.




### 2. Data Cleaning

Based on the plots above, **we deleted "bad" images** that were:

* **Too dark:** average pixel intensity < 20
* **Too bright / washed out:** average pixel intensity > 245
* **Extreme aspect ratio:** too stretched or squashed (AR > 3.0)
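The cleaning rules above can be sketched as a small filter. This is a minimal illustration, assuming each image is already loaded as an H × W × C NumPy array (e.g. via OpenCV or PIL); the function name and thresholds mirror the rules listed, not the project's actual code.

```python
import numpy as np

def is_bad_image(img: np.ndarray, dark: float = 20, bright: float = 245,
                 max_ar: float = 3.0) -> bool:
    """Flag images that are too dark, too bright, or too stretched.

    `img` is an H x W x C uint8 array. Thresholds mirror the cleaning
    rules above; tune them for your own data.
    """
    intensity = img.mean()                 # average pixel intensity in [0, 255]
    h, w = img.shape[:2]
    aspect = max(w / h, h / w)             # symmetric aspect ratio, always >= 1
    return intensity < dark or intensity > bright or aspect > max_ar
```

Filtering a dataset then reduces to a comprehension such as `clean = [im for im in images if not is_bad_image(im)]`.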
---
## ⚔️ Part 2: Model Comparison (CLIP vs. SigLIP vs. MetaCLIP)

To ensure the best search results, we ran a "Challenger" test between three leading multimodal models.

### The Contestants:

1. **Baseline:** OpenAI CLIP (`clip-vit-base-patch32`)
2. **Challenger:** Google SigLIP (`siglip-base-patch16-224`)
3. **Challenger:** Facebook MetaCLIP (`facebook/metaclip-b32-400m`)
### The Evaluation:

We compared them using **Silhouette Scores** (measuring how distinct the food clusters are) and a visual "Taste Test" (checking nearest neighbors for specific dishes).

* **Metric:** Silhouette Score
* **Winner:** **Google SigLIP** (produced cleaner, more distinct clusters and better visual matches)

**Visual Comparison:**

We queried each model with the same image to see which returned more accurate similar foods.


---
## 🧠 Part 3: Embeddings & Clustering

Using the winning model (**SigLIP**), we applied dimensionality reduction to visualize how the AI groups food concepts.

* **Algorithm:** K-Means Clustering (k = 101 categories)
* **Visualization:**
  * **PCA:** to see the global variance.
  * **t-SNE:** to see local groupings (e.g., "Sushi" clusters separately from "Burgers").


---
## 🚀 Part 4: The Application

The final product is a **Gradio** web application hosted on Hugging Face Spaces.

1. **Image-to-Image:** Upload a photo (e.g., a burger) -> the app embeds it using SigLIP -> finds the 3 nearest visual matches.
2. **Text-to-Image:** Type "Spicy Tacos" -> the app finds images matching that description.
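Both search modes reduce to the same step: cosine similarity between one query embedding and the pre-computed database of image embeddings. A minimal sketch of that nearest-neighbour step (function name is illustrative; model loading is omitted, and the database would be the vectors stored in the Parquet file):

```python
import numpy as np

def top_k_matches(query_vec: np.ndarray, db_vecs: np.ndarray,
                  k: int = 3) -> np.ndarray:
    """Return indices of the k most cosine-similar database vectors.

    `query_vec`: (d,) embedding of the uploaded image or the text prompt.
    `db_vecs`:   (n, d) pre-computed image embeddings.
    """
    q = query_vec / np.linalg.norm(query_vec)
    db = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = db @ q                      # cosine similarity per database image
    return np.argsort(-sims)[:k]       # indices sorted by decreasing similarity
```

In the app, `query_vec` would come from the model's `get_image_features(...)` for an uploaded photo or `get_text_features(...)` for a typed prompt (the CLIP-style API in `transformers`), which is what makes the two search modes share one index.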
## Note

The deployed application runs the CLIP model even though SigLIP won the comparison: SigLIP was too big to run on the Hugging Face Spaces free tier.
### How to Run Locally

1. **Clone the repository:**
```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/Food-Match
cd Food-Match
```
2. **Install dependencies:**
```bash
pip install -r requirements.txt
```
3. **Run the app:**
```bash
python app.py
```
---
## 📁 Repository Structure

* `app.py`: Main application logic (Gradio + SigLIP).
* `food_embeddings_siglip.parquet`: Pre-computed SigLIP vector database.
* `requirements.txt`: Python dependencies (includes `sentencepiece`, `protobuf`).
* `README.md`: Project documentation.
---
## ✍️ Authors

**Matan Kriel**
**Odeya Shmuel**

*Assignment #3: Embeddings, RecSys, and Spaces*