AAZ1215 committed
Commit f91767e · verified · 1 parent: 3730b80

Upload 3 files

Files changed (3)
  1. DEPLOYMENT_README.md +269 -0
  2. gradio_app_deploy.py +473 -0
  3. requirements.txt +7 -0
DEPLOYMENT_README.md ADDED
@@ -0,0 +1,269 @@
# 🚀 Group 5 Pattern Recognition Project - Deployment Guide

## 📖 Overview
A recipe recommendation system that uses semantic search with a trained BERT model: user queries are embedded and matched against precomputed recipe embeddings, so results reflect the meaning of a query rather than keyword overlap.

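Under the hood, ranking is cosine similarity between a query embedding and the precomputed recipe embeddings. A minimal sketch with random stand-in vectors (the deployed app uses 231,630 BERT embeddings; `top_matches` is an illustrative name, not a function from the app):

```python
import torch

def top_matches(recipe_embeddings: torch.Tensor, query_embedding: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Return indices of the k rows most cosine-similar to the query."""
    recipes_n = torch.nn.functional.normalize(recipe_embeddings, p=2, dim=1)
    query_n = torch.nn.functional.normalize(query_embedding.unsqueeze(0), p=2, dim=1)
    similarities = torch.mm(recipes_n, query_n.t()).flatten()
    return torch.argsort(similarities, descending=True)[:k]

embeddings = torch.randn(100, 16)           # stand-in for the recipe embedding matrix
query = embeddings[7] * 3.0                 # cosine similarity ignores magnitude
print(top_matches(embeddings, query, k=3))  # index 7 ranks first
```

This mirrors the normalize-then-matmul search in `gradio_app_deploy.py`, just at toy scale.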
## 🌐 Live Demo
Deploy this app on **Hugging Face Spaces** for free hosting!

## 📁 File Setup for Deployment

### Step 1: Upload Large Files to Google Drive

You need to upload these files to Google Drive and make them publicly accessible:

1. **torch_recipe_embeddings_231630.pt** (679MB)
2. **tag_based_bert_model.pth** (418MB)
3. **RAW_recipes.csv** (281MB)
4. **recipe_statistics_231630.pkl** (4.3MB)
5. **recipe_scores_231630.pkl** (3.0MB)

### Step 2: Get Google Drive File IDs

For each file in Google Drive:
1. Right-click → "Get link"
2. Make sure it's set to "Anyone with the link can view"
3. Copy the file ID from the URL: `https://drive.google.com/file/d/FILE_ID_HERE/view`

### Step 3: Update File IDs in Code

Edit `gradio_app_deploy.py` and replace the placeholder IDs:

```python
GOOGLE_DRIVE_FILES = {
    'torch_recipe_embeddings_231630.pt': 'YOUR_ACTUAL_EMBEDDINGS_FILE_ID',
    'tag_based_bert_model.pth': 'YOUR_ACTUAL_MODEL_FILE_ID',
    'RAW_recipes.csv': 'YOUR_ACTUAL_RECIPES_FILE_ID',
    'recipe_statistics_231630.pkl': 'YOUR_ACTUAL_STATS_FILE_ID',
    'recipe_scores_231630.pkl': 'YOUR_ACTUAL_SCORES_FILE_ID'
}
```
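If you want to sanity-check an ID before pasting it in, it can be extracted from the share link programmatically. A minimal sketch (the helper name `drive_file_id` is illustrative, and the link below is the placeholder form from Step 2, not a real file):

```python
import re

def drive_file_id(share_link: str) -> str:
    """Extract the file ID from a Google Drive share link."""
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", share_link)
    if match is None:
        raise ValueError(f"No file ID found in: {share_link}")
    return match.group(1)

# Placeholder link, same form as shown in Step 2 above
print(drive_file_id("https://drive.google.com/file/d/FILE_ID_HERE/view"))
```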

## 🤗 Deploy to Hugging Face Spaces

### Step 1: Create Hugging Face Account
1. Go to [huggingface.co](https://huggingface.co)
2. Sign up for a free account

### Step 2: Create New Space
1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Click "Create new Space"
3. Choose:
   - **Space name**: `group5-recipe-recommendation`
   - **License**: Apache 2.0
   - **SDK**: Gradio
   - **Hardware**: CPU Basic (free)

### Step 3: Upload Files
Upload these files to your Space:

```
📁 Your Space Repository
├── app.py            (rename gradio_app_deploy.py to app.py)
├── requirements.txt  (use requirements_deploy.txt)
└── README.md         (this file)
```

### Step 4: Files to Upload

1. **Rename** `gradio_app_deploy.py` → `app.py`
2. **Rename** `requirements_deploy.txt` → `requirements.txt`
3. **Upload** both files to your Space

### Step 5: Configure Space
Your Space will automatically:
1. Install dependencies from `requirements.txt`
2. Download files from Google Drive on first run
3. Start the Gradio app on port 7860

## 🔧 Alternative Deployment Options

### Option 1: Railway
1. Connect your GitHub repo to [Railway](https://railway.app)
2. Add environment variables for file URLs
3. Deploy with automatic builds

### Option 2: Render
1. Connect your GitHub repo to [Render](https://render.com)
2. Configure build and start commands
3. Set up environment variables

### Option 3: Streamlit Cloud
1. Convert the app to Streamlit format
2. Deploy via [streamlit.io](https://streamlit.io)

## 📊 Expected Performance
- **Startup Time**: 2-5 minutes (downloading files)
- **Search Speed**: <2 seconds per query
- **Memory Usage**: ~2GB (for the full dataset)
- **Storage**: ~1.5GB total

## 🐛 Troubleshooting

### Common Issues

1. **Files not downloading**
   - Check Google Drive file permissions
   - Verify file IDs are correct
   - Ensure files are public

2. **Out of memory**
   - Use a smaller dataset subset
   - Upgrade to paid Hugging Face hardware

3. **Slow startup**
   - Normal for the first run (files are downloading)
   - Subsequent runs will be faster
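Before digging into Space logs, it is worth checking which of the required files actually reached the disk. A quick local sketch using the same filenames the app expects:

```python
import os

REQUIRED_FILES = [
    'torch_recipe_embeddings_231630.pt',
    'tag_based_bert_model.pth',
    'RAW_recipes.csv',
    'recipe_statistics_231630.pkl',
    'recipe_scores_231630.pkl',
]

# Report presence and size, so an empty or truncated download stands out.
for name in REQUIRED_FILES:
    if os.path.exists(name):
        print(f"OK      {name} ({os.path.getsize(name)} bytes)")
    else:
        print(f"MISSING {name}")
```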

## 🔗 Useful Links
- [Hugging Face Spaces Documentation](https://huggingface.co/docs/hub/spaces)
- [Gradio Documentation](https://gradio.app/docs)
- [PyTorch Documentation](https://pytorch.org/docs)

## 🔄 GitHub Integration & Auto-Sync

### Option 1: Direct GitHub Connection (Recommended)

1. **In your Hugging Face Space settings**:
   - Go to your Space → Settings → Repository
   - Click "Connect to GitHub"
   - Authorize Hugging Face to access your GitHub repo
   - Select your repository: `PatternRec_Project_Group7`

2. **Configure auto-sync**:
   - Enable "Auto-sync with GitHub"
   - Choose a branch (usually `main`)
   - Set the sync frequency (immediate, hourly, daily)

3. **Result**: Every time you push to GitHub, your Hugging Face Space will automatically update!

### Option 2: GitHub Actions (Advanced)

Create `.github/workflows/deploy-to-hf.yml` in your repo:

```yaml
name: Deploy to Hugging Face Spaces

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Push to Hugging Face Spaces
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          git config --global user.email "action@github.com"
          git config --global user.name "GitHub Action"

          # Clone your HF Space repo
          git clone https://huggingface.co/spaces/YOUR_USERNAME/group5-recipe-recommendation hf-space
          cd hf-space

          # Copy files from GitHub repo
          cp ../gradio_app_deploy.py ./app.py
          cp ../requirements_deploy.txt ./requirements.txt
          cp ../DEPLOYMENT_README.md ./README.md

          # Push to HF Space
          git add .
          git commit -m "Auto-sync from GitHub: ${{ github.event.head_commit.message }}"
          git push https://USER:${{ secrets.HF_TOKEN }}@huggingface.co/spaces/YOUR_USERNAME/group5-recipe-recommendation
```

### Option 3: Dual Git Remotes

Set up your local repo to push to both GitHub and Hugging Face:

```bash
# Add HF Space as second remote
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/group5-recipe-recommendation

# Push to both with one command
git push origin main   # GitHub
git push hf main       # Hugging Face Space

# Or create an alias for both
git config alias.pushall '!git push origin main && git push hf main'
# Then use: git pushall
```

### Option 4: Automated Script

Create a deployment script `deploy.sh`:

```bash
#!/bin/bash
echo "🚀 Deploying to Hugging Face Space..."

# Copy deployment files
cp gradio_app_deploy.py app.py
cp requirements_deploy.txt requirements.txt

# Commit changes
git add app.py requirements.txt README.md
git commit -m "Deploy: $(date)"

# Push to GitHub
git push origin main

# Push to Hugging Face Space
git push hf main

echo "✅ Deployment complete!"
```

### Recommended Workflow

1. **Set up direct GitHub connection** (easiest)
2. **Structure your repo** with deployment-ready files:
   ```
   📁 Your GitHub Repo
   ├── gradio_app_deploy.py     # Main app (will become app.py)
   ├── requirements_deploy.txt  # Dependencies (will become requirements.txt)
   ├── DEPLOYMENT_README.md     # This file (will become README.md)
   ├── gradio_app_fixed.py      # Development version
   └── ... other project files
   ```
3. **Configure auto-sync** in HF Space settings
4. **Push to GitHub** - the HF Space updates automatically!

### File Mapping for Auto-Sync

When files sync from GitHub → Hugging Face Space:

| GitHub File | HF Space File | Purpose |
|-------------|---------------|---------|
| `gradio_app_deploy.py` | `app.py` | Main application |
| `requirements_deploy.txt` | `requirements.txt` | Dependencies |
| `DEPLOYMENT_README.md` | `README.md` | Documentation |

### Benefits of GitHub Integration

✅ **Version Control**: Keep your code in GitHub
✅ **Automatic Updates**: Push once, deploy everywhere
✅ **Collaboration**: Team members can contribute via GitHub
✅ **Backup**: Multiple copies of your code
✅ **CI/CD**: Run tests before deployment

## 🆘 Support
If you encounter issues:
1. Check the Space logs in Hugging Face
2. Verify all file IDs are correct
3. Ensure `requirements.txt` has all dependencies

## 🎯 Success Criteria
✅ App loads without errors
✅ Search functionality works
✅ Results show relevant recipes
✅ Interface is responsive

Your app should be accessible at: `https://huggingface.co/spaces/YOUR_USERNAME/group5-recipe-recommendation`
gradio_app_deploy.py ADDED
@@ -0,0 +1,473 @@
#!/usr/bin/env python3
"""
Group 5 Pattern Recognition Project - Deployment Version
=======================================================

Recipe Recommendation System with Google Drive file loading for deployment.
Optimized for Hugging Face Spaces or similar platforms.
"""

import gradio as gr
import torch
from transformers import BertTokenizer, BertModel
import pickle
import os
import csv
from typing import List, Dict
import time
import ast
import requests
import gdown
from pathlib import Path

# Google Drive file IDs (you'll need to replace these with your actual file IDs)
GOOGLE_DRIVE_FILES = {
    'torch_recipe_embeddings_231630.pt': '1PSidY1toSfgECXDxa4pGza56Jq6vOq6t',
    'tag_based_bert_model.pth': '1LBl7yFs5JFqOsgfn88BF9g83W9mxiBm6',
    'RAW_recipes.csv': '1rFJQzg_ErwEpN6WmhQ4jRyiXv6JCINyf',
    'recipe_statistics_231630.pkl': '1n8TNT-6EA_usv59CCCU1IXqtuM7i084E',
    'recipe_scores_231630.pkl': '1gfPBzghKHOZqgJu4VE9NkandAd6FGjrA'
}

def download_file_from_drive(file_id: str, destination: str) -> bool:
    """Download file from Google Drive"""
    try:
        print(f"📥 Downloading {destination}...")
        url = f"https://drive.google.com/uc?id={file_id}"
        gdown.download(url, destination, quiet=False)
        return True
    except Exception as e:
        print(f"❌ Error downloading {destination}: {e}")
        return False

def ensure_files_downloaded():
    """Ensure all required files are downloaded from Google Drive"""
    print("🔍 Checking required files...")

    for filename, file_id in GOOGLE_DRIVE_FILES.items():
        if not os.path.exists(filename):
            if file_id.startswith('YOUR_'):  # placeholder ID was never replaced
                print(f"⚠️ {filename} not configured for download")
                continue

            print(f"📥 Downloading {filename} from Google Drive...")
            success = download_file_from_drive(file_id, filename)
            if not success:
                print(f"❌ Failed to download {filename}")
                return False

    print("✅ All files ready!")
    return True

class DeployableRecipeSearch:
    """
    Deployment-ready recipe search system
    """

    def __init__(self):
        print("🚀 Initializing Recipe Search System...")

        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        print(f"📱 Device: {self.device}")

        # Ensure files are downloaded
        if not ensure_files_downloaded():
            print("❌ Failed to download required files")
            self.is_ready = False
            return

        # Load tokenizer and model
        self.tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
        self.model = BertModel.from_pretrained('bert-base-uncased')

        # Load trained model if available
        if os.path.exists('tag_based_bert_model.pth'):
            print("🧠 Loading trained BERT model...")
            self.model.load_state_dict(torch.load('tag_based_bert_model.pth', map_location=self.device))
            print("✅ Trained model loaded!")
        else:
            print("⚠️ Using pre-trained BERT")

        self.model.to(self.device)
        self.model.eval()

        # Load data
        self.load_data()

        print("🎉 Recipe Search System ready!")

    def safe_literal_eval(self, text):
        """Safely evaluate string representations of lists"""
        if not text or str(text).lower() == 'nan':
            return []
        try:
            if isinstance(text, str) and text.startswith('[') and text.endswith(']'):
                return ast.literal_eval(text)
            elif isinstance(text, str):
                return [item.strip() for item in text.split(',') if item.strip()]
            elif isinstance(text, list):
                return text
            else:
                return []
        except (ValueError, SyntaxError):
            return []

    def safe_int(self, value):
        """Safely convert value to int"""
        try:
            return int(float(value))
        except (TypeError, ValueError):
            return 0

    def load_data(self):
        """Load all required data"""

        # Load PyTorch embeddings
        embeddings_file = 'torch_recipe_embeddings_231630.pt'
        if os.path.exists(embeddings_file):
            print("📥 Loading embeddings...")
            self.recipe_embeddings = torch.load(embeddings_file, map_location=self.device)
            print(f"✅ Loaded {self.recipe_embeddings.shape[0]} embeddings")
        else:
            print("❌ Embeddings not found")
            self.is_ready = False
            return

        # Load recipes from CSV
        self.load_recipes_from_csv()

        # Load statistics and scores
        self.load_statistics_and_scores()

        # Check if we have everything we need
        self.is_ready = all([
            self.recipe_embeddings is not None,
            len(self.recipes) > 0,
            len(self.recipe_stats) > 0,
            len(self.recipe_scores) > 0
        ])

        if self.is_ready:
            self.fix_recipe_id_mismatches()
            print("🎯 All data loaded successfully!")
        else:
            print("⚠️ Some data missing")

    def load_recipes_from_csv(self):
        """Load and filter recipes from CSV"""
        print("📊 Loading recipes from CSV...")
        self.recipes = []

        if not os.path.exists('RAW_recipes.csv'):
            print("❌ RAW_recipes.csv not found")
            return

        valid_recipes = []

        with open('RAW_recipes.csv', 'r', encoding='utf-8') as file:
            csv_reader = csv.DictReader(file)

            for row_idx, row in enumerate(csv_reader):
                try:
                    # Skip rows without a usable name
                    name = str(row.get('name', '')).lower().strip()
                    if name in ('', 'nan', 'unknown recipe'):
                        continue

                    tags = self.safe_literal_eval(row.get('tags', '[]'))
                    ingredients = self.safe_literal_eval(row.get('ingredients', '[]'))

                    # Skip rows without tags or ingredients
                    if not tags or not ingredients:
                        continue

                    recipe = {
                        'id': int(row.get('id', row_idx)),
                        'name': name,
                        'minutes': self.safe_int(row.get('minutes', 0)),
                        'tags': tags,
                        'ingredients': ingredients,
                        'n_steps': self.safe_int(row.get('n_steps', 0)),
                        'description': str(row.get('description', '')).strip()
                    }

                    valid_recipes.append(recipe)

                    # Stop once we match the number of precomputed embeddings
                    if len(valid_recipes) >= 231630:
                        break

                except Exception:
                    continue

        self.recipes = valid_recipes
        print(f"✅ Loaded {len(self.recipes)} recipes")

    def load_statistics_and_scores(self):
        """Load recipe statistics and scores"""
        # Load statistics: recipe_id -> (avg_rating, num_ratings, unique_users)
        stats_file = 'recipe_statistics_231630.pkl'
        try:
            if os.path.exists(stats_file):
                with open(stats_file, 'rb') as f:
                    self.recipe_stats = pickle.load(f)
                print(f"✅ Loaded statistics for {len(self.recipe_stats)} recipes")
            else:
                # Fallback: neutral default statistics for every loaded recipe
                self.recipe_stats = {}
                for recipe in self.recipes:
                    self.recipe_stats[recipe['id']] = (4.0, 10, 5)
        except Exception as e:
            print(f"⚠️ Statistics loading failed: {e}")
            self.recipe_stats = {}
            for recipe in self.recipes:
                self.recipe_stats[recipe['id']] = (4.0, 10, 5)

        # Load scores: recipe_id -> popularity score used in ranking
        scores_file = 'recipe_scores_231630.pkl'
        try:
            if os.path.exists(scores_file):
                with open(scores_file, 'rb') as f:
                    self.recipe_scores = pickle.load(f)
                print(f"✅ Loaded scores for {len(self.recipe_scores)} recipes")
            else:
                # Fallback: neutral default score for every loaded recipe
                self.recipe_scores = {}
                for recipe in self.recipes:
                    self.recipe_scores[recipe['id']] = 0.5
        except Exception as e:
            print(f"⚠️ Scores loading failed: {e}")
            self.recipe_scores = {}
            for recipe in self.recipes:
                self.recipe_scores[recipe['id']] = 0.5

    def fix_recipe_id_mismatches(self):
        """Filter statistics and scores to match loaded recipes"""
        loaded_recipe_ids = set(recipe['id'] for recipe in self.recipes)

        # Filter statistics
        original_stats_count = len(self.recipe_stats)
        self.recipe_stats = {
            recipe_id: stats for recipe_id, stats in self.recipe_stats.items()
            if recipe_id in loaded_recipe_ids
        }

        # Filter scores
        original_scores_count = len(self.recipe_scores)
        self.recipe_scores = {
            recipe_id: score for recipe_id, score in self.recipe_scores.items()
            if recipe_id in loaded_recipe_ids
        }

        print(f"🔧 Aligned data: Stats {original_stats_count}→{len(self.recipe_stats)}, Scores {original_scores_count}→{len(self.recipe_scores)}")

    def search_recipes(self, query: str, num_results: int = 5, min_rating: float = 3.0) -> str:
        """Search for recipes and return formatted HTML results"""

        if not self.is_ready:
            return """
            <div style="color: red; padding: 20px; border: 1px solid red; border-radius: 5px;">
                ❌ Search system not ready - files may still be downloading
            </div>
            """

        if not query.strip():
            return """
            <div style="color: orange; padding: 20px; border: 1px solid orange; border-radius: 5px;">
                ⚠️ Please enter a search query
            </div>
            """

        try:
            start_time = time.time()

            # Tokenize query
            inputs = self.tokenizer(
                query, return_tensors='pt', truncation=True,
                max_length=128, padding='max_length'
            ).to(self.device)

            # Get query embedding
            with torch.no_grad():
                outputs = self.model(**inputs)
                query_embedding = outputs.last_hidden_state[:, 0, :].cpu().flatten()

            # Calculate similarities
            recipe_embeddings_normalized = torch.nn.functional.normalize(self.recipe_embeddings, p=2, dim=1)
            query_embedding_normalized = torch.nn.functional.normalize(query_embedding.unsqueeze(0), p=2, dim=1)
            similarities = torch.mm(recipe_embeddings_normalized, query_embedding_normalized.t()).flatten()

            # Get top results
            top_indices = torch.argsort(similarities, descending=True)[:num_results * 3]

            results = []
            for idx in top_indices:
                if len(results) >= num_results:
                    break

                embedding_idx = idx.item()
                if embedding_idx < len(self.recipes):
                    recipe = self.recipes[embedding_idx]
                    recipe_id = recipe['id']

                    if recipe_id in self.recipe_stats:
                        avg_rating, num_ratings, unique_users = self.recipe_stats[recipe_id]

                        if avg_rating >= min_rating:
                            similarity_score = similarities[idx].item()
                            popularity_score = self.recipe_scores.get(recipe_id, 0.0)
                            combined_score = 0.7 * similarity_score + 0.3 * popularity_score

                            results.append({
                                'name': recipe['name'],
                                'ingredients': recipe['ingredients'][:8] if isinstance(recipe['ingredients'], list) else [],
                                'tags': recipe['tags'][:6] if isinstance(recipe['tags'], list) else [],
                                'minutes': recipe.get('minutes', 0),
                                'n_steps': recipe.get('n_steps', 0),
                                'similarity_score': similarity_score,
                                'popularity_score': popularity_score,
                                'combined_score': combined_score,
                                'avg_rating': avg_rating,
                                'num_ratings': num_ratings,
                                'recipe_id': recipe_id
                            })

            search_time = time.time() - start_time

            if results:
                return self.format_results(query, results, search_time)
            else:
                return f"""
                <div style="color: orange; padding: 20px; border: 1px solid orange; border-radius: 5px;">
                    😔 No recipes found for "{query}" with rating ≥ {min_rating}
                </div>
                """

        except Exception as e:
            return f"""
            <div style="color: red; padding: 20px; border: 1px solid red; border-radius: 5px;">
                ❌ Search error: {str(e)}
            </div>
            """

    def format_results(self, query: str, results: List[Dict], search_time: float) -> str:
        """Format search results as HTML"""

        html = f"""
        <div style="margin-bottom: 20px;">
            <h2 style="color: #2E8B57;">🎯 Found {len(results)} recipes for "{query}"</h2>
            <p style="color: #666;">⚡ Search completed in {search_time:.2f}s</p>
        </div>
        """

        for i, recipe in enumerate(results, 1):
            ingredients = recipe['ingredients']
            ingredients_text = ', '.join(ingredients) if ingredients else "No ingredients listed"
            if len(ingredients_text) > 150:
                ingredients_text = ingredients_text[:150] + "..."

            tags = recipe['tags']
            tags_html = ' '.join([f'<span style="background: #e3f2fd; padding: 2px 6px; border-radius: 12px; font-size: 0.8em; margin: 2px;">{tag}</span>' for tag in tags]) if tags else ""

            time_text = f"{recipe['minutes']} min" if recipe['minutes'] > 0 else "Time not specified"

            recipe_html = f"""
            <div style="border: 1px solid #ddd; border-radius: 8px; padding: 15px; margin: 15px 0; background: linear-gradient(135deg, #f8f9fa, #ffffff);">
                <h3 style="color: #1976d2; margin-bottom: 10px;">{i}. {recipe['name']}</h3>

                <div style="margin: 8px 0;">
                    <strong>⏱️ {time_text}</strong> |
                    <strong>🔥 {recipe['n_steps']} steps</strong> |
                    <strong>⭐ {recipe['avg_rating']:.1f}/5.0</strong> ({recipe['num_ratings']} ratings)
                </div>

                <div style="margin: 8px 0;">
                    <span style="background: #4caf50; color: white; padding: 2px 8px; border-radius: 12px; font-size: 0.8em; margin-right: 5px;">
                        Match: {recipe['similarity_score']:.1%}
                    </span>
                    <span style="background: #ff9800; color: white; padding: 2px 8px; border-radius: 12px; font-size: 0.8em;">
                        Score: {recipe['combined_score']:.1%}
                    </span>
                </div>

                <div style="margin: 10px 0;">
                    {tags_html}
                </div>

                <div style="margin: 10px 0; color: #555;">
                    <strong>🥘 Ingredients:</strong><br>
                    {ingredients_text}
                </div>
            </div>
            """
            html += recipe_html

        return html

# Initialize the search system
print("🔄 Initializing deployment-ready recipe search system...")
try:
    search_system = DeployableRecipeSearch()
except Exception as e:
    print(f"❌ Initialization failed: {e}")
    search_system = None

def search_interface(query, num_results, min_rating):
    """Gradio interface function"""
    if search_system is None:
        return "<div style='color: red;'>❌ System initialization failed</div>"
    return search_system.search_recipes(query, int(num_results), float(min_rating))

# Create Gradio interface
with gr.Blocks(title="Group 5 Pattern Recognition Project", theme=gr.themes.Soft()) as demo:

    gr.Markdown("""
    # 🍽️ Group 5 Pattern Recognition Project
    ### Advanced Recipe Recommendation using Semantic Search
    """)

    with gr.Row():
        with gr.Column(scale=1):
            query_input = gr.Textbox(
                label="🔍 Search for recipes",
                placeholder="e.g., 'chicken pasta', 'vegetarian salad', 'chocolate dessert'",
                lines=1
            )

            with gr.Row():
                num_results = gr.Slider(1, 10, 5, step=1, label="Results")
                min_rating = gr.Slider(1.0, 5.0, 3.0, step=0.1, label="Min Rating")

            search_btn = gr.Button("Search Recipes", variant="primary")

            # Example buttons
            with gr.Row():
                ex1 = gr.Button("🍗 Chicken Pasta", size="sm")
                ex2 = gr.Button("🥗 Healthy Salad", size="sm")
                ex3 = gr.Button("🍫 Chocolate Dessert", size="sm")

        with gr.Column(scale=1):
            results_output = gr.HTML("""
            <div style="text-align: center; padding: 40px; color: #666;">
                <h3>🔍 Ready to Search</h3>
                <p>Enter a search query and click "Search Recipes" to see results.</p>
            </div>
            """)

    # Event handlers
    search_btn.click(search_interface, [query_input, num_results, min_rating], results_output)
    query_input.submit(search_interface, [query_input, num_results, min_rating], results_output)

    # Example buttons fill the query box
    ex1.click(lambda: "chicken pasta", outputs=query_input)
    ex2.click(lambda: "healthy salad", outputs=query_input)
    ex3.click(lambda: "chocolate dessert", outputs=query_input)

if __name__ == "__main__":
    demo.launch(
        server_name="0.0.0.0",
        server_port=7860,  # Standard port for Hugging Face Spaces
        share=False
    )
requirements.txt ADDED
@@ -0,0 +1,7 @@
torch>=1.9.0
transformers>=4.20.0
gradio>=4.0.0
gdown>=4.7.0
pandas>=1.3.0
numpy>=1.21.0
requests>=2.25.0