Spaces: Runtime error
Matan Kriel committed
Commit · b6d47fe
Parent(s): 13f1929
added files
Browse files:
- README.md +112 -6
- app.py +113 -0
- food_embeddings.parquet +3 -0
- requirements.txt +8 -0
README.md
CHANGED
@@ -1,12 +1,118 @@
 ---
-title: Food
-emoji:
-colorFrom:
-colorTo:
+title: Food Match
+emoji: 🌍
+colorFrom: pink
+colorTo: green
 sdk: gradio
-sdk_version: 6.
+sdk_version: 6.1.0
 app_file: app.py
 pinned: false
----
-
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+license: mit
+short_description: Trained model to detect and recommend similar foods.
+---
+
+# 🍔 Visual Dish Matcher AI
+
+**A computer vision app that suggests dishes based on visual and text similarity.**
+
+## 🎯 Project Overview
+This project explores the power of **vector embeddings** for building recommendation systems. Unlike traditional filters (e.g., "Show me Italian food"), the app uses **OpenAI's CLIP model** to "see" the food: it converts images into mathematical vectors and finds matches based on visual content (texture, color, shape, and ingredients).
+
+**Live Demo:** [Click the "App" tab above to view]
+
+---
+
+## 🛠️ Tech Stack
+* **Model:** OpenAI CLIP (`clip-vit-base-patch32`)
+* **Frameworks:** PyTorch, Transformers, Datasets (Hugging Face)
+* **Interface:** Gradio
+* **Data Storage:** Parquet (via Git LFS)
+* **Visualization:** Matplotlib, Seaborn, scikit-learn (t-SNE/PCA)
+
+---
+
+## 📊 Dataset: Food-101
+
+The [Food-101 dataset (ETH Zurich)](https://huggingface.co/datasets/ethz/food101) is a popular benchmark for fine-grained image classification. Unlike "clean" studio datasets, Food-101 contains real-world images taken under varied lighting, angles, and noise levels, making it highly representative of the photos users typically upload to social media or food apps.
+
+Key features:
+
+* **101 categories:** Covers a wide range of international dishes, including sushi, pizza, hamburger, pad thai, baklava, and chocolate mousse.
+* **"In the wild" data:** Images are not perfectly centered or lit; they contain background noise (plates, cutlery, restaurant tables), challenging the model to focus on the food itself.
+* **Project subset:** To keep the assignment computationally efficient, a randomized stratified subset of 5,000 images was selected from the training split.
+
+Data structure:
+
+* **Input:** RGB images (various aspect ratios, resized during processing).
+* **Labels:** 101 unique integer IDs mapped to human-readable class names.
+
+---
+
+## 📊 Part 1: Data Exploration (EDA)
+
+### 1. Data Cleaning
+Before embedding, the dataset underwent cleaning:
+* **Format correction:** Converted grayscale images to RGB to ensure compatibility with the CLIP model.
+* **Outlier detection:** Analyzed image brightness and aspect ratios to identify and flag low-quality or distorted images (e.g., pitch-black photos or extreme panoramas).
+
+### 2. Image Distribution
+We verified the class balance to ensure the model wasn't biased toward specific categories.
+
+
+
+### 3. Dimensionality Analysis
+We analyzed width vs. height across the dataset to verify that most images were standard sizes suitable for resizing.
+
+
+
+### 4. Outlier Detection
+
+
+
+---
+
+## 🧠 Part 2: Embeddings & Clustering
+The core of the visual matcher is the embedding space: we generated a 512-dimensional vector for every image in the training subset.
+
+### Clustering Analysis
+Using **K-Means**, we grouped these vectors to see whether the model could discover food categories on its own, without ever seeing the labels.
+* **Algorithm:** K-Means (k=50)
+* **Dimensionality reduction:** t-SNE (to visualize the 512-D vectors in 2D)
+
+**Key insight:** The model successfully grouped foods by visual properties. For example, "red/orange" foods (pizza, lasagna) formed distinct clusters separate from "green" foods (salads, guacamole).
+
+
+
+---
+
+## 🚀 Part 3: The Application
+The final product is a **Gradio** web application hosted on Hugging Face Spaces. It supports two modes of interaction:
+
+1. **Image-to-Image:** The user uploads a photo (e.g., a burger). The app embeds the upload and computes **cosine similarity** against the database to find the nearest visual neighbors.
+2. **Text-to-Image:** The user types a description (e.g., "Spicy Tacos"). The app uses CLIP's text encoder to find images that match the semantic meaning of the text.
+
+---
+
+## 📂 Repository Structure
+* `app.py`: Main application logic and Gradio interface.
+* `food_embeddings.parquet`: Pre-computed vector database (stored via Git LFS).
+* `requirements.txt`: Python dependencies.
+* `README.md`: Project documentation.
+
+---
+
+## ✍️ Author
+**Matan Kriel**
+*Assignment #3: Embeddings, RecSys, and Spaces*
+
+---
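The Part 2 clustering described above (K-Means with k=50 over 512-dimensional CLIP vectors, t-SNE to project them into 2D) can be sketched as follows. This is a minimal illustration, not the project's notebook: the random unit vectors stand in for the real CLIP embeddings, and the names `cluster_ids` and `coords_2d` are made up for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

# Stand-in for the real 5000x512 CLIP embedding matrix
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(300, 512)).astype(np.float32)
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# Group the vectors without ever showing the model a label
kmeans = KMeans(n_clusters=50, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(embeddings)

# Project 512-D -> 2-D so the clusters can be plotted
coords_2d = TSNE(n_components=2, random_state=42, perplexity=30).fit_transform(embeddings)
```

Coloring the `coords_2d` scatter plot by `cluster_ids` (or by the true labels) is what produces a figure like the t-SNE plot referenced above.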
app.py
ADDED
@@ -0,0 +1,113 @@
+import gradio as gr
+import torch
+import pandas as pd
+import numpy as np
+from PIL import Image
+from transformers import CLIPProcessor, CLIPModel
+from datasets import load_dataset
+from torch.nn import functional as F
+
+# --- 1. SETUP & CONFIG ---
+MODEL_ID = "openai/clip-vit-base-patch32"
+DATA_FILE = "food_embeddings.parquet"
+
+print("⏳ Starting App... Loading Model...")
+# Load model (CPU is fine for inference on single images)
+model = CLIPModel.from_pretrained(MODEL_ID)
+processor = CLIPProcessor.from_pretrained(MODEL_ID)
+
+# --- 2. LOAD DATA (must match the Colab logic EXACTLY) ---
+print("⏳ Loading Dataset (this takes a moment)...")
+# Load the same 5,000 images with the same seed so indices match the parquet file
+dataset = load_dataset("ethz/food101", split="train").shuffle(seed=42).select(range(5000))
+
+# --- 3. LOAD EMBEDDINGS ---
+print("⏳ Loading Pre-computed Embeddings...")
+df = pd.read_parquet(DATA_FILE)
+# Convert the per-row lists in the parquet back to a single torch tensor
+db_features = torch.tensor(np.stack(df['embedding'].to_numpy()))
+# Normalize once so later dot products are cosine similarities
+db_features = F.normalize(db_features, p=2, dim=1)
+
+print("✅ App Ready!")
+
+# --- 4. CORE SEARCH LOGIC ---
+def find_best_matches(query_features, top_k=3):
+    # Normalize the query
+    query_features = F.normalize(query_features, p=2, dim=1)
+
+    # Similarity via dot product:
+    # query (1x512) @ db.T (512x5000) = scores (1x5000)
+    similarity = torch.mm(query_features, db_features.T)
+
+    # Get top k
+    scores, indices = torch.topk(similarity, k=top_k)
+
+    results = []
+    for idx, score in zip(indices[0], scores[0]):
+        idx = idx.item()
+        # Grab the image from the dataset and the label from our dataframe
+        img = dataset[idx]['image']
+        label = df.iloc[idx]['label_name']
+        # Format output for the gallery
+        results.append((img, f"{label} ({score:.2f})"))
+    return results
+
+# --- 5. GRADIO FUNCTIONS ---
+def search_by_image(input_image):
+    if input_image is None:
+        return []
+    inputs = processor(images=input_image, return_tensors="pt")
+    with torch.no_grad():
+        features = model.get_image_features(**inputs)
+    return find_best_matches(features)
+
+def search_by_text(input_text):
+    if not input_text:
+        return []
+    inputs = processor(text=[input_text], return_tensors="pt", padding=True)
+    with torch.no_grad():
+        features = model.get_text_features(**inputs)
+    return find_best_matches(features)
+
+# --- 6. BUILD UI ---
+with gr.Blocks(title="Food Matcher AI") as demo:
+    gr.Markdown("# 🍔 Visual Dish Matcher")
+    gr.Markdown("Upload a photo of food (or describe it) to find similar dishes in our database.")
+
+    # --- VIDEO SECTION ---
+    # Accordion keeps the UI uncluttered; open=False means it starts collapsed.
+    with gr.Accordion("📺 Watch Project Demo", open=False):
+        # Note: iframes need the /embed/ URL form; a /watch URL will not render
+        gr.HTML("""
+        <div style="display: flex; justify-content: center;">
+            <iframe width="560" height="315"
+                src="https://www.youtube.com/embed/Al665qltkDg?start=4"
+                title="YouTube video player"
+                frameborder="0"
+                allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
+                allowfullscreen>
+            </iframe>
+        </div>
+        """)
+
+    with gr.Tab("Image Search"):
+        with gr.Row():
+            img_input = gr.Image(type="pil", label="Upload Food Image")
+            img_gallery = gr.Gallery(label="Top Matches")
+        btn_img = gr.Button("Find Similar Dishes")
+        btn_img.click(search_by_image, inputs=img_input, outputs=img_gallery)
+
+    with gr.Tab("Text Search"):
+        with gr.Row():
+            txt_input = gr.Textbox(label="Describe the food (e.g., 'Spicy Tacos')")
+            txt_gallery = gr.Gallery(label="Top Matches")
+        btn_txt = gr.Button("Search by Description")
+        btn_txt.click(search_by_text, inputs=txt_input, outputs=txt_gallery)
+
+# Launch (disable SSR for stability)
+demo.launch(ssr_mode=False)
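The dot product in `find_best_matches` only yields cosine similarity because both the query and the database rows are L2-normalized first. A small NumPy sketch of that equivalence (the vectors are made up for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    # Classic definition: dot product over the product of magnitudes
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([0.3, 2.0, 1.0])
b = np.array([1.5, 0.4, 2.2])

# Normalize once up front, then a plain dot product gives the same score
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
assert np.isclose(a_unit @ b_unit, cosine_similarity(a, b))
```

This is why the app normalizes `db_features` once at startup: each search then reduces to a single matrix multiply instead of recomputing norms per query.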
food_embeddings.parquet
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:87bbf48a556dd11a473e2e168ef6f94cd9bc150ccabbcea9dda084b2cb9ca3b9
+size 8828792
requirements.txt
ADDED
@@ -0,0 +1,8 @@
+gradio
+torch
+transformers
+pandas
+numpy
+datasets
+pyarrow
+scikit-learn