Based on the plots above, **we deleted "bad" images** that were:

* Too Bright/Washed out (Avg Pixel Intensity > 245)
* Extreme Aspect Ratios (Too stretched or squashed, AR > 3.0)

### 3. Advanced Feature Engineering

After removing the garbage data, we engineered deeper visual features to assess image content:

* **Sharpness Score:** Used Laplacian Variance to find blurry photos.
* **Dominant Color (Hue):** Analyzed color clusters (e.g., Green for Salads vs. Red for Pizza).
* **Texture Complexity:** Calculated pixel standard deviation to distinguish smooth vs. complex foods.

![Feature Analysis](feature_analysis.png)

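A rough sketch of how the sharpness and texture features can be computed (NumPy-only; the function names are ours, and a real pipeline would more likely use OpenCV's `cv2.Laplacian` — the hue feature additionally needs an RGB-to-HSV conversion):

```python
import numpy as np

def sharpness_score(gray: np.ndarray) -> float:
    """Variance of the Laplacian: low values flag blurry photos."""
    g = gray.astype(np.float64)
    # 3x3 Laplacian applied to the interior pixels (center weight -4)
    lap = (-4.0 * g[1:-1, 1:-1]
           + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    return float(lap.var())

def texture_complexity(gray: np.ndarray) -> float:
    """Pixel standard deviation: smooth foods score low, complex ones high."""
    return float(gray.astype(np.float64).std())
```

A flat image scores 0 on both metrics, while a high-contrast checkerboard scores high on both.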
---

## ⚔️ Part 2: Model Comparison (CLIP vs. SigLIP vs. MetaCLIP)

To ensure the best search results, we ran a "Challenger" test between three leading multimodal models.

### The Contestants:
1. **Baseline:** OpenAI CLIP (`clip-vit-base-patch32`)
2. **Challenger:** Google SigLIP (`siglip-base-patch16-224`)
3. **Challenger:** Facebook MetaCLIP (`facebook/metaclip-b32-400m`)

### The Evaluation:
We compared them using **Silhouette Scores** (measuring how distinct the food clusters are) and a visual "Taste Test" (checking nearest neighbors for specific dishes).
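A minimal sketch of what such a scoring harness looks like, assuming scikit-learn is available (the `cluster_quality` helper and the toy blobs below are illustrative, not the project's actual code):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_quality(embeddings: np.ndarray, n_clusters: int, seed: int = 0) -> float:
    """Fit K-Means on the embeddings and score how distinct the clusters are.

    Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters.
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(embeddings)
    return float(silhouette_score(embeddings, labels))

# Toy check: two well-separated blobs should score near 1.
rng = np.random.default_rng(0)
embeddings = np.vstack([rng.normal(0.0, 0.1, (50, 8)),
                        rng.normal(5.0, 0.1, (50, 8))])
score = cluster_quality(embeddings, n_clusters=2)
```

In the real comparison, `embeddings` would be each model's vectors for the food images, scored with `n_clusters=101`.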
We queried all three models with the same image to see which returned more accurate similar foods.
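Under the hood, such a "Taste Test" is a nearest-neighbor lookup by cosine similarity in embedding space. A self-contained sketch with random stand-in embeddings (not real model output):

```python
import numpy as np

def nearest_neighbors(query: np.ndarray, gallery: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k gallery embeddings most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity per gallery row
    return np.argsort(-sims)[:k]      # highest similarity first

# Stand-in "embeddings": the query is made nearly identical to gallery row 7.
rng = np.random.default_rng(42)
gallery = rng.normal(size=(100, 16))
query = gallery[7] + rng.normal(scale=0.01, size=16)
top3 = nearest_neighbors(query, gallery, k=3)
```

With real model embeddings, each returned index would be the file of a visually similar dish.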

![winner](siglip_winner.png)

---
## 🧠 Part 3: Embeddings & Clustering

Using the winning model (**SigLIP**), we applied dimensionality reduction to its 768-dimensional embedding vectors to visualize how the AI groups food concepts.

* **Algorithm:** K-Means Clustering (k=101 categories).
* **Visualization:**
  * **t-SNE:** To see local groupings (e.g., "Sushi" clusters separately from "Burgers").


---

 -> The app embeds it using SigLIP -> Finds the nearest 3 visual matches.
2. **Text-to-Image:** Type "Spicy Tacos" -> The app finds images matching that description.

## Note
The application currently runs the CLIP model even though SigLIP won the comparison: SigLIP was too big to run on the Hugging Face Spaces free tier.

### How to Run Locally
1. **Clone the repository:**
```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/Food-Match
cd Food-Match
```
2. **Install dependencies:**