yusenthebot commited on
Commit
81e637f
·
1 Parent(s): d78eb74

Initial deployment

Browse files
This view is limited to 50 files because it contains too many changes.   See raw diff
Files changed (50) hide show
  1. .gradio/certificate.pem +31 -0
  2. README.md +152 -12
  3. app.py +592 -0
  4. frige_detect/__pycache__/detect.cpython-313.pyc +0 -0
  5. frige_detect/annotated_image.jpg +0 -0
  6. frige_detect/demo/t1.jpg +0 -0
  7. frige_detect/demo/t2.jpg +0 -0
  8. frige_detect/demo/t3.jpg +0 -0
  9. frige_detect/demo/t4.jpg +0 -0
  10. frige_detect/detect.py +208 -0
  11. frige_detect/recipe_input.json +86 -0
  12. frige_detect/roboflow_credentials.txt +4 -0
  13. recipe_recommendation/__init__.py +0 -0
  14. recipe_recommendation/__pycache__/__init__.cpython-313.pyc +0 -0
  15. recipe_recommendation/__pycache__/main.cpython-313.pyc +0 -0
  16. recipe_recommendation/data/ingredient_map.data +0 -0
  17. recipe_recommendation/main.py +652 -0
  18. recipe_recommendation/readme.txt +142 -0
  19. recipe_recommendation/readme_cn.txt +92 -0
  20. recipe_recommendation/src/__init__.py +0 -0
  21. recipe_recommendation/src/__pycache__/__init__.cpython-313.pyc +0 -0
  22. recipe_recommendation/src/__pycache__/candidate.cpython-313.pyc +0 -0
  23. recipe_recommendation/src/__pycache__/coldstart.cpython-313.pyc +0 -0
  24. recipe_recommendation/src/__pycache__/embedding.cpython-313.pyc +0 -0
  25. recipe_recommendation/src/__pycache__/feature.cpython-313.pyc +0 -0
  26. recipe_recommendation/src/__pycache__/highlight.cpython-313.pyc +0 -0
  27. recipe_recommendation/src/__pycache__/io.cpython-313.pyc +0 -0
  28. recipe_recommendation/src/__pycache__/trainmodel.cpython-313.pyc +0 -0
  29. recipe_recommendation/src/candidate.py +365 -0
  30. recipe_recommendation/src/coldstart.py +279 -0
  31. recipe_recommendation/src/embedding.py +100 -0
  32. recipe_recommendation/src/feature.py +176 -0
  33. recipe_recommendation/src/highlight.py +91 -0
  34. recipe_recommendation/src/io.py +37 -0
  35. recipe_recommendation/src/trainmodel.py +237 -0
  36. recipe_recommendation/user_data/demo_user_1/user_profile.json +28 -0
  37. recipe_recommendation/user_data/user_0/feature_order.json +22 -0
  38. recipe_recommendation/user_data/user_0/feedback.csv +2 -0
  39. recipe_recommendation/user_data/user_0/qid.txt +1 -0
  40. recipe_recommendation/user_data/user_0/ranker.pkl +3 -0
  41. recipe_recommendation/user_data/user_0/user_features_rank.csv +0 -0
  42. recipe_recommendation/user_data/user_0/user_profile.json +26 -0
  43. recipe_recommendation/user_data/user_1/feature_order.json +22 -0
  44. recipe_recommendation/user_data/user_1/feedback.csv +3 -0
  45. recipe_recommendation/user_data/user_1/qid.txt +1 -0
  46. recipe_recommendation/user_data/user_1/ranker.pkl +3 -0
  47. recipe_recommendation/user_data/user_1/user_features_rank.csv +0 -0
  48. recipe_recommendation/user_data/user_1/user_profile.json +26 -0
  49. recipe_recommendation/user_data/user_2/feature_order.json +22 -0
  50. recipe_recommendation/user_data/user_2/feedback.csv +2 -0
.gradio/certificate.pem ADDED
@@ -0,0 +1,31 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ -----BEGIN CERTIFICATE-----
2
+ MIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw
3
+ TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
4
+ cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4
5
+ WhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu
6
+ ZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY
7
+ MTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc
8
+ h77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+
9
+ 0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U
10
+ A5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW
11
+ T8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH
12
+ B5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC
13
+ B5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv
14
+ KBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn
15
+ OlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn
16
+ jh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw
17
+ qHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI
18
+ rU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV
19
+ HRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq
20
+ hkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL
21
+ ubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ
22
+ 3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK
23
+ NFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5
24
+ ORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur
25
+ TkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC
26
+ jNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc
27
+ oyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq
28
+ 4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA
29
+ mRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d
30
+ emyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=
31
+ -----END CERTIFICATE-----
README.md CHANGED
@@ -1,12 +1,152 @@
1
- ---
2
- title: SmartFridge
3
- emoji: 🐨
4
- colorFrom: purple
5
- colorTo: purple
6
- sdk: gradio
7
- sdk_version: 5.49.0
8
- app_file: app.py
9
- pinned: false
10
- ---
11
-
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Smart Fridge Recipe Assistant
2
+
3
+ The Smart Fridge Recipe Assistant combines Roboflow-powered ingredient detection with a multi-stage recipe recommendation engine. Upload a photo of your fridge and instantly receive recipe ideas that respect your dietary preferences, nutritional goals, and ingredient availability.
4
+
5
+ ![Smart Fridge workflow](frige_detect/annotated_image.jpg)
6
+
7
+ ## Features
8
+
9
+ - **Visual ingredient detection** – Uses a Roboflow YOLO model to detect fridge items, annotate the photo, and build a structured ingredient payload.
10
+ - **Robust recipe ranking pipeline** – Performs coarse ranking, ML reranking, and clustering-based diversification using pretrained user profiles.
11
+ - **Personalized dietary controls** – Configure vegetarian style, allergies, preferred cuisines, macro ranges, and cooking time caps directly in the UI.
12
+ - **Interactive feedback loop** – Record positive feedback for recommended recipes to continuously refine personal models.
13
+ - **One-click examples** – Try the demo instantly with bundled sample fridge photos.
14
+
15
+ ## Project structure
16
+
17
+ ```
18
+ smartFridge/
19
+ ├── app.py # Gradio user interface
20
+ ├── frige_detect/ # Roboflow detector & demo assets
21
+ │ ├── detect.py
22
+ │ ├── demo/
23
+ │ └── roboflow_credentials.txt
24
+ ├── recipe_recommendation/ # Recommendation engine
25
+ │ ├── main.py
26
+ │ ├── src/
27
+ │ └── user_data/
28
+ ├── requirements.txt
29
+ └── README.md
30
+ ```
31
+
32
+ ## Installation
33
+
34
+ 1. Create a new Python environment (recommended).
35
+ 2. Install dependencies:
36
+
37
+ ```bash
38
+ pip install -r requirements.txt
39
+ ```
40
+
41
+ The Roboflow API key and project information used by the detector are stored in `frige_detect/roboflow_credentials.txt` and loaded automatically; no manual input is required.
42
+
43
+ ```markdown
44
+ ## Running the app locally
45
+
46
+ ```bash
47
+ python app.py
48
+ ```
49
+
50
+ This command launches a Gradio web interface with share link enabled. In the browser you can:
51
+
52
+ ### Core Features
53
+
54
+ **1. Quick Start with Examples**
55
+ - Select from predefined user profiles (user_1, user_2, user_3) with different dietary preferences
56
+ - Choose from example fridge images (t1.jpg, t2.jpg, t3.jpg)
57
+ - Mix and match any profile with any image for testing
58
+
59
+ **2. Custom User Profiles**
60
+ - Create new user profiles by entering a custom User ID
61
+ - Configure comprehensive dietary preferences:
62
+ - **Vegetarian type**: flexible, flexible_vegetarian, ovo_vegetarian, lacto_vegetarian, vegan, non_vegetarian
63
+ - **Allergies**: comma-separated list (e.g., "peanut, shrimp")
64
+ - **Region preferences**: comma-separated (e.g., "Asia, Europe")
65
+ - **Nutritional goals**:
66
+ - Calorie range (min/max sliders from 0-4000)
67
+ - Protein range (min/max sliders from 0-250g)
68
+ - **Ingredient preferences**:
69
+ - Preferred main ingredients (e.g., "chicken, tofu")
70
+ - Disliked main ingredients (e.g., "lamb, beef")
71
+ - **Cooking time limit**: maximum cooking time in minutes (0-180)
72
+
73
+ **3. Smart Fridge Detection & Recipe Recommendation**
74
+ - Upload your own fridge photo or use example images
75
+ - Click **"Analyze fridge & recommend recipes"**
76
+ - The system will:
77
+ - Detect ingredients using the Roboflow computer vision model
78
+ - Map detected items to parent ingredient categories
79
+ - Filter recipes based on your dietary restrictions, nutrition goals, and disliked ingredients
80
+ - Score and rank recipes using ML-based personalization
81
+ - Apply region preference boosting and ingredient matching
82
+ - Diversify results using KMeans clustering to ensure variety
83
+
84
+ **4. Automatic Profile Management**
85
+ - User profiles are **automatically saved/updated** every time you click "Analyze"
86
+ - No manual save required - just modify preferences and run
87
+ - Feedback count is preserved when updating existing profiles
88
+ - All profiles stored under `recipe_recommendation/user_data/<user_id>/`
89
+
90
+ **5. Feedback System**
91
+ - Review the top 5 recommended recipes with detailed information:
92
+ - Recipe name and match score
93
+ - Region and cuisine type
94
+ - Nutritional information (calories, protein)
95
+ - Main, staple, and other ingredients used
96
+ - Select your favorite recipe from the dropdown
97
+ - Press **"Save feedback"** to log positive feedback
98
+ - Feedback is used to retrain personalized ranking models (every 20 feedback entries)
99
+
100
+ ### How the Recommendation Pipeline Works
101
+
102
+ 1. **Detection**: Roboflow model identifies ingredients in your fridge photo
103
+ 2. **Mapping**: Detected items are mapped to parent categories (e.g., "chicken breast" → "chicken")
104
+ 3. **Hard Filtering**:
105
+ - Removes recipes violating dietary restrictions (vegan/vegetarian)
106
+ - Filters out recipes outside your calorie/protein ranges
107
+ - Eliminates recipes containing disliked main ingredients
108
+ 4. **Coarse Ranking**: Fast ingredient matching across 20,000+ candidates
109
+ 5. **ML Reranking**: Personalized ranking using your trained model (or similar user's model)
110
+ 6. **Diversification**: KMeans clustering ensures variety in final recommendations
111
+ 7. **Top-K Selection**: Returns the best 5 recipes tailored to your preferences
112
+
113
+ All user profiles, feedback files, trained models, and feature rankings are stored under `recipe_recommendation/user_data/<user_id>/`.
114
+
115
+ ## Dataset & Models
116
+
117
+ ### Computer Vision Model
118
+ - **Fridge ingredient detection**: [Roboflow Nutrition Object Detection](https://universe.roboflow.com/ie-wqegj/nutrition-object-detection)
119
+ - Pre-trained model for detecting common food items in refrigerator images
120
+ - Provides bounding boxes and confidence scores for detected ingredients
121
+ - Credentials stored in `frige_detect/roboflow_credentials.txt`
122
+
123
+ ### Recipe Dataset
124
+ - **Recipe database**: Fetched from Hugging Face dataset [`Iris314/recipe-cleaned`](https://huggingface.co/datasets/Iris314/recipe-cleaned)
125
+ - **Ingredient mappings**: Hierarchical mapping from specific items to parent categories
126
+ - Both are automatically downloaded on first run and cached locally
127
+
128
+ ### Ranking Models
129
+ - User-specific ranking models are automatically:
130
+ - Bootstrapped using cold-start features for new users
131
+ - Copied from similar users (based on profile embedding similarity)
132
+ - Retrained every 20 feedback entries to improve personalization
133
+ - Models stored per user at `recipe_recommendation/user_data/<user_id>/ranker.pkl`
134
+
135
+ ## Deploying to Hugging Face Spaces
136
+
137
+ To deploy this application to Hugging Face Spaces:
138
+
139
+ 1. Create a new Space on Hugging Face with Gradio SDK
140
+ 2. Upload this repository to the Space
141
+ 3. Ensure `app.py` is set as the main application file
142
+ 4. The Space will automatically run `python app.py` on startup
143
+ 5. No additional environment variables or secrets required (Roboflow credentials are bundled)
144
+
145
+ The deployed app will have the same functionality as the local version, including persistent user profiles and feedback storage.
146
+
147
+ ## License
148
+
149
+ This project bundles third-party datasets and models subject to their respective licenses:
150
+ - Roboflow Nutrition Object Detection model: Subject to [Roboflow Terms of Service](https://roboflow.com/terms)
151
+ - Recipe dataset from Hugging Face: Check the [`Iris314/recipe-cleaned`](https://huggingface.co/datasets/Iris314/recipe-cleaned) dataset page for licensing details
152
+
app.py ADDED
@@ -0,0 +1,592 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Gradio application for the smart fridge detector + recipe recommendation pipeline."""
2
+
3
+ import json
4
+ import tempfile
5
+ from pathlib import Path
6
+ from typing import List, Tuple, Dict, Any
7
+
8
+ import cv2
9
+ import gradio as gr
10
+ import numpy as np
11
+ from PIL import Image
12
+
13
+ from frige_detect.detect import (
14
+ detect_and_generate,
15
+ load_roboflow_credentials,
16
+ RoboflowCredentials,
17
+ )
18
+ from recipe_recommendation.main import (
19
+ load_recipes,
20
+ recommend_recipes,
21
+ save_user_profile,
22
+ get_feedback,
23
+ USER_DATA_DIR,
24
+ )
25
+
26
+ # ---------------------------------------------------------------------------
27
+ # Global resources
28
+ # ---------------------------------------------------------------------------
29
+ # Path to the bundled key=value credential file read once at import time.
+ CREDENTIALS_PATH = Path("frige_detect/roboflow_credentials.txt")
30
+ # Parsed Roboflow credentials shared by every detection request.
+ ROBOFLOW_CREDENTIALS: RoboflowCredentials = load_roboflow_credentials(str(CREDENTIALS_PATH))
31
+ # Full recipe catalogue loaded once at startup. NOTE(review): importing this
+ # module therefore performs file I/O (and whatever load_recipes does) as a
+ # side effect — confirm startup cost is acceptable for the Space.
+ RECIPES_DF = load_recipes()
32
+
33
+ # ---------------------------------------------------------------------------
34
+ # Predefined user profiles for examples
35
+ # ---------------------------------------------------------------------------
36
# Canned demo personas selectable from the "Quick Start" dropdown. Each key
# maps 1:1 onto the preference widgets in the UI (see load_example_profile).
EXAMPLE_PROFILES = {
    # Omnivore, North-American leaning, no restrictions.
    "user_1": {
        "vegetarian_type": "flexible",
        "allergies": "",
        "regions": "North America",
        "calorie_min": 250,
        "calorie_max": 2000,
        "protein_min": 50,
        "protein_max": 160,
        "preferred_main": "",
        "disliked_main": "",
        "cooking_time": 45,
    },
    # Mostly-vegetarian, Asian cuisine, shrimp allergy.
    "user_2": {
        "vegetarian_type": "flexible_vegetarian",
        "allergies": "shrimp",
        "regions": "Asia",
        "calorie_min": 400,
        "calorie_max": 1500,
        "protein_min": 40,
        "protein_max": 120,
        "preferred_main": "tofu",
        "disliked_main": "beef",
        "cooking_time": 60,
    },
    # High-protein meat eater, European cuisine.
    "user_3": {
        "vegetarian_type": "non_vegetarian",
        "allergies": "",
        "regions": "Europe",
        "calorie_min": 500,
        "calorie_max": 2000,
        "protein_min": 80,
        "protein_max": 160,
        "preferred_main": "beef, chicken",
        "disliked_main": "",
        "cooking_time": 45,
    },
}

# Bundled sample fridge photos offered in the example-image dropdown.
EXAMPLE_IMAGES = [
    "frige_detect/demo/t1.jpg",
    "frige_detect/demo/t2.jpg",
    "frige_detect/demo/t3.jpg",
]
81
+
82
+
83
+ # ---------------------------------------------------------------------------
84
+ # Helper utilities
85
+ # ---------------------------------------------------------------------------
86
def parse_csv_list(text: str) -> List[str]:
    """Split a comma-separated string into trimmed, non-empty tokens."""
    if not text:
        return []
    tokens = []
    for raw in text.split(","):
        token = raw.strip()
        if token:
            tokens.append(token)
    return tokens
91
+
92
+
93
def ensure_numpy_image(image: Any) -> np.ndarray:
    """Normalize an uploaded image (PIL image or ndarray) to an RGB numpy array.

    Raises:
        ValueError: if no image was provided or the type is unsupported.
    """
    if image is None:
        raise ValueError("Please upload a fridge photo before running detection.")
    if isinstance(image, np.ndarray):
        # Already numeric — returned untouched (channel order is the caller's).
        return image
    if isinstance(image, Image.Image):
        rgb = image.convert("RGB")
        return np.array(rgb)
    raise ValueError("Unsupported image format provided.")
102
+
103
+
104
def write_temp_image(image: np.ndarray) -> str:
    """Persist an RGB numpy image as a JPEG in a fresh temp dir; return its path."""
    target = Path(tempfile.mkdtemp(prefix="fridge_upload_")) / "upload.jpg"
    # OpenCV encodes images in BGR channel order, so convert before writing.
    cv2.imwrite(str(target), cv2.cvtColor(image, cv2.COLOR_RGB2BGR))
    return str(target)
111
+
112
+
113
def build_user_profile(
    user_id: str,
    vegetarian_type: str,
    allergies: str,
    regions: str,
    calorie_range: Tuple[float, float],
    protein_range: Tuple[float, float],
    preferred_main: str,
    disliked_main: str,
    cooking_time: float,
) -> Dict[str, Any]:
    """Assemble the user-profile dict from the current UI values and persist it.

    The profile is ALWAYS written (created or overwritten), so edits made in
    the form take effect on the very next run. The accumulated feedback
    counter is carried over from any existing profile on disk.

    Raises:
        ValueError: if ``user_id`` is blank after stripping whitespace.
    """
    user_id = user_id.strip()
    if not user_id:
        raise ValueError("User ID cannot be empty.")

    profile_path = USER_DATA_DIR / user_id / "user_profile.json"

    # Carry the feedback counter forward; unreadable/corrupt files reset it.
    num_feedback = 0
    if profile_path.exists():
        try:
            previous = json.loads(profile_path.read_text(encoding="utf-8"))
            num_feedback = previous.get("num_feedback", 0)
        except Exception:
            pass

    cal_min, cal_max = calorie_range
    prot_min, prot_max = protein_range
    profile = {
        "user_id": user_id,
        "num_feedback": num_feedback,
        "diet": {"vegetarian_type": vegetarian_type},
        "allergies": parse_csv_list(allergies),
        "region_preference": parse_csv_list(regions),
        "nutritional_goals": {
            "calories": {"min": int(cal_min), "max": int(cal_max)},
            "protein": {"min": int(prot_min), "max": int(prot_max)},
        },
        "other_preferences": {
            "preferred_main": parse_csv_list(preferred_main),
            "disliked_main": parse_csv_list(disliked_main),
            # A 0/None cooking time means "no limit".
            "cooking_time_max": int(cooking_time) if cooking_time else None,
        },
    }

    # Create a new profile or overwrite the existing one.
    save_user_profile(user_id, profile)
    print(f"[app] Profile saved/updated for user '{user_id}'")

    return profile
166
+
167
+
168
def summarize_ingredients(
    user_parents: List[str],
    high_conf: List[str],
    low_conf: List[str],
) -> str:
    """Render the detection mapping as a small Markdown bullet list."""
    mapped = ", ".join(sorted(user_parents)) if user_parents else "none"
    parts = [
        "### Ingredient Mapping",
        "- **Mapped parent ingredients:** " + mapped,
    ]
    if high_conf:
        parts.append("- **High confidence detections:** " + ", ".join(sorted(high_conf)))
    if low_conf:
        # Low-confidence labels may repeat; de-duplicate before sorting.
        parts.append("- **Low confidence detections:** " + ", ".join(sorted(set(low_conf))))
    return "\n".join(parts)
183
+
184
+
185
+ def _ensure_iterable(value: Any) -> List[str]:
186
+ if value is None:
187
+ return []
188
+ if isinstance(value, set):
189
+ return sorted(value)
190
+ if isinstance(value, list):
191
+ return value
192
+ if isinstance(value, str):
193
+ return [value]
194
+ return list(value)
195
+
196
+
197
def render_recommendations(df) -> Tuple[str, List[Dict[str, Any]]]:
    """Render the top-5 rows of a recommendation DataFrame as Markdown.

    Args:
        df: DataFrame of scored recipes (or None/empty when nothing matched).

    Returns:
        (markdown, feedback_rows): the Markdown text for the UI, and one dict
        per rendered recipe with list/CSV fields normalized to sets, in the
        exact order the recipes are displayed (used by the feedback handler).
    """

    def _split_csv(text: str) -> List[str]:
        # Same contract as parse_csv_list: trimmed, non-empty tokens.
        return [item.strip() for item in text.split(",") if item.strip()]

    def _as_list(value: Any) -> List[str]:
        # Same contract as _ensure_iterable: normalize scalars/collections.
        if value is None:
            return []
        if isinstance(value, set):
            return sorted(value)
        if isinstance(value, list):
            return value
        if isinstance(value, str):
            return [value]
        return list(value)

    if df is None or df.empty:
        return "No recipes matched the current constraints.", []

    lines = ["### Recommended Recipes"]
    feedback_rows: List[Dict[str, Any]] = []

    # BUGFIX: enumerate positionally instead of using the DataFrame index.
    # After filtering/reranking the index is not guaranteed to be 0..4, and
    # the "<rank>." prefix emitted here is parsed back by record_feedback,
    # so the rank must be 1-based and contiguous.
    for rank, (_, row) in enumerate(df.head(5).iterrows(), start=1):
        score = row.get("match_score")
        if score is None or (isinstance(score, float) and np.isnan(score)):
            # Fall back to the ML score only when match_score is truly absent;
            # the old `a or b` idiom also discarded a legitimate 0.0 score.
            score = row.get("ml_score", 0)
        scaled = (score if score is not None else 0) * 100
        name = row.get("name", f"Recipe {rank}")
        lines.append(f"{rank}. **{name}** — score {scaled:.1f}%")

        region = row.get("region")
        if region and not (isinstance(region, float) and np.isnan(region)):
            if isinstance(region, (set, list)):
                region_str = ", ".join(sorted(region))
            else:
                region_str = str(region)
            lines.append(f" - Region: {region_str}")

        cuisine_items = _as_list(row.get("cuisine_attr"))
        if cuisine_items:
            lines.append(f" - Cuisine: {', '.join(cuisine_items)}")

        calories = row.get("calories")
        protein = row.get("protein")
        if calories is not None:
            lines.append(f" - Calories: {calories}")
        if protein is not None:
            lines.append(f" - Protein: {protein}")

        for key in ["main_parent", "staple_parent", "other_parent"]:
            parents = _as_list(row.get(key))
            if parents:
                pretty_key = key.replace("_", " ").title()
                lines.append(f" - {pretty_key}: {', '.join(parents)}")

        ingredients = row.get("ingredients")
        if ingredients:
            if isinstance(ingredients, str):
                ingredients_list = _split_csv(ingredients)
            else:
                ingredients_list = list(ingredients)
            if ingredients_list:
                # Cap the list so one recipe doesn't dominate the panel.
                lines.append(f" - Ingredients: {', '.join(ingredients_list[:10])}")
        lines.append("")

        # The downstream feedback pipeline expects set-valued fields.
        feedback_row = row.to_dict()
        for key in ["main_parent", "staple_parent", "other_parent", "seasoning_parent", "cuisine_attr", "ingredients"]:
            value = feedback_row.get(key)
            if isinstance(value, list):
                feedback_row[key] = set(value)
            elif isinstance(value, str):
                feedback_row[key] = set(_split_csv(value))
        feedback_rows.append(feedback_row)

    return "\n".join(lines).strip(), feedback_rows
256
+
257
+
258
def load_example_profile(profile_name: str):
    """Map a predefined profile name onto the tuple of UI field values."""
    config = EXAMPLE_PROFILES.get(profile_name)
    if config is None:
        # Unknown selection: reset the form to neutral defaults.
        return ("user_custom", "flexible", "", "", 400, 2000, 50, 160, "", "", 45)

    # Order must match the `outputs` list wired to profile_selector.change.
    field_order = (
        "vegetarian_type",
        "allergies",
        "regions",
        "calorie_min",
        "calorie_max",
        "protein_min",
        "protein_max",
        "preferred_main",
        "disliked_main",
        "cooking_time",
    )
    return (profile_name,) + tuple(config[key] for key in field_order)
277
+
278
+
279
def load_example_image(image_path: str):
    """Identity passthrough — the dropdown value is already a usable file path."""
    return image_path
282
+
283
+
284
def run_pipeline(
    image,
    user_id,
    vegetarian_type,
    allergies,
    regions,
    calorie_min,
    calorie_max,
    protein_min,
    protein_max,
    preferred_main,
    disliked_main,
    cooking_time,
):
    """End-to-end handler for the "Analyze" button.

    Always saves/updates the user profile from the current form values, runs
    Roboflow detection on the uploaded photo, then asks the recommender for
    the top recipes. Returns the 7-tuple of Gradio outputs; on failure every
    slot carries None/empty or the error text instead.
    """
    try:
        # --- Detection ------------------------------------------------------
        rgb_image = ensure_numpy_image(image)
        upload_path = write_temp_image(rgb_image)
        out_dir = Path(tempfile.mkdtemp(prefix="fridge_outputs_"))
        output_json = out_dir / "recipe_input.json"
        output_image = out_dir / "annotated_image.jpg"

        detection_result = detect_and_generate(
            image_path=upload_path,
            credentials=ROBOFLOW_CREDENTIALS,
            conf_threshold=0.4,
            overlap_threshold=0.3,
            conf_split=0.7,
            output_json=str(output_json),
            output_image=str(output_image),
        )
        # The uploaded copy is no longer needed once detection has run.
        Path(upload_path).unlink(missing_ok=True)

        # --- Profile: always created/updated from the current UI values -----
        profile = build_user_profile(
            user_id,
            vegetarian_type,
            allergies,
            regions,
            (calorie_min, calorie_max),
            (protein_min, protein_max),
            preferred_main,
            disliked_main,
            cooking_time,
        )

        # NOTE(review): purpose of this pause is not documented — presumably
        # it lets the profile write settle before the recommender reads it
        # back from disk; confirm whether it is still needed.
        import time
        time.sleep(0.2)

        # --- Recommendation --------------------------------------------------
        detection_payload = detection_result["recipe_json"]
        ml_top, user_parents, high_conf, low_conf = recommend_recipes(
            detection_payload,
            user_id,
            RECIPES_DF,
            topk=5,
        )

        ingredient_summary = summarize_ingredients(user_parents, high_conf, low_conf)
        recommendation_md, feedback_rows = render_recommendations(ml_top)

        dropdown_choices = [
            f"{idx + 1}. {row.get('name', 'Recipe')}" for idx, row in enumerate(feedback_rows)
        ]
        status = "" if feedback_rows else "No recipes available for feedback yet."
        profile_status = f"✓ Profile '{user_id}' has been saved/updated with your current preferences."

        return (
            str(output_image),
            detection_payload,
            ingredient_summary,
            recommendation_md,
            gr.Dropdown(choices=dropdown_choices, value=None),
            feedback_rows,
            profile_status,
        )
    except Exception as exc:
        # Surface the full traceback in the UI so Space users can report it.
        import traceback
        error_detail = traceback.format_exc()
        return (
            None,
            None,
            "",
            f"⚠️ Error: {exc}\n\nDetails:\n{error_detail}",
            gr.Dropdown(choices=[], value=None),
            [],
            f"⚠️ Error: {exc}",
        )
379
+
380
+
381
def record_feedback(selected_recipe: str, user_id: str, feedback_rows: List[Dict[str, Any]]):
    """Log positive feedback for the chosen recipe and bump the user's counter.

    Returns a human-readable status string for the UI in every case.
    """
    # Guard clauses for incomplete UI state.
    if not selected_recipe:
        return "Please select a recipe before submitting feedback."
    if not user_id:
        return "Please provide a valid user ID."
    if not feedback_rows:
        return "No recommendation data available. Run the pipeline first."

    # Dropdown labels look like "3. Recipe name" — recover the 1-based rank.
    try:
        index = int(selected_recipe.split(".")[0]) - 1
    except (ValueError, IndexError):
        return "Unable to parse the selected recipe."
    if not 0 <= index < len(feedback_rows):
        return "Selected recipe is out of range."

    chosen = feedback_rows[index]
    get_feedback(user_id, chosen)

    # Increment the persisted feedback counter when a profile exists on disk.
    profile_path = USER_DATA_DIR / user_id / "user_profile.json"
    if profile_path.exists():
        profile = json.loads(profile_path.read_text(encoding="utf-8"))
        profile["num_feedback"] = profile.get("num_feedback", 0) + 1
        save_user_profile(user_id, profile)

    return f"✓ Feedback recorded for {chosen.get('name', 'selected recipe')}!"
407
+
408
+
409
+ # ---------------------------------------------------------------------------
410
+ # Gradio UI definition
411
+ # ---------------------------------------------------------------------------
412
with gr.Blocks(title="Smart Fridge Recipe Assistant", theme=gr.themes.Soft()) as demo:
    # Header / usage instructions.
    gr.Markdown(
        """
        # Smart Fridge Recipe Assistant
        **How to use:**
        1. (Optional) Select an example profile and/or image from dropdowns
        2. Modify any preferences in the form - your profile will be saved automatically when you click Analyze
        3. Upload or select a fridge image
        4. Click "Analyze fridge & recommend recipes"
        """
    )

    with gr.Row():
        # Left column: quick-start examples, photo upload, detection outputs.
        with gr.Column(scale=1):
            gr.Markdown("### Quick Start Examples")
            profile_selector = gr.Dropdown(
                label="Choose a predefined user profile",
                choices=list(EXAMPLE_PROFILES.keys()),
                value=None,
            )
            image_selector = gr.Dropdown(
                label="Choose an example fridge image",
                choices=[f"Image {i+1}: {img}" for i, img in enumerate(EXAMPLE_IMAGES)],
                value=None,
            )
            image_input = gr.Image(
                label="Fridge photo (upload or use example)",
                type="pil",
                height=350,
            )
            detection_json = gr.JSON(label="Detection payload")
            annotated_output = gr.Image(label="Annotated detection", height=350)

        # Right column: preference form, run button, results, and feedback.
        with gr.Column(scale=1):
            gr.Markdown("### User Preferences (auto-saved on each run)")
            user_id_box = gr.Textbox(
                label="User ID (will create new profile if doesn't exist)",
                value="user_custom",
                placeholder="e.g. my_new_profile",
            )
            vegetarian_radio = gr.Radio(
                [
                    "flexible",
                    "flexible_vegetarian",
                    "ovo_vegetarian",
                    "lacto_vegetarian",
                    "vegan",
                    "non_vegetarian",
                ],
                label="Vegetarian preference",
                value="flexible",
            )
            allergies_box = gr.Textbox(
                label="Allergies (comma separated)",
                placeholder="peanut, shrimp",
            )
            regions_box = gr.Textbox(
                label="Preferred regions (comma separated)",
                placeholder="Asia, Europe",
            )
            calorie_min = gr.Slider(minimum=0, maximum=4000, value=400, label="Minimum Calories", step=50)
            calorie_max = gr.Slider(minimum=0, maximum=4000, value=2000, label="Maximum Calories", step=50)
            protein_min = gr.Slider(minimum=0, maximum=250, value=50, label="Minimum Protein (g)", step=5)
            protein_max = gr.Slider(minimum=0, maximum=250, value=160, label="Maximum Protein (g)", step=5)
            preferred_box = gr.Textbox(
                label="Preferred main ingredients",
                placeholder="chicken, tofu",
            )
            disliked_box = gr.Textbox(
                label="Disliked main ingredients",
                placeholder="lamb",
            )
            cooking_slider = gr.Slider(
                minimum=0,
                maximum=180,
                value=45,
                step=5,
                label="Max cooking time (minutes)",
            )
            run_button = gr.Button("Analyze fridge & recommend recipes", variant="primary")
            ingredient_md = gr.Markdown()
            recommendation_md = gr.Markdown()
            feedback_dropdown = gr.Dropdown(label="Select a recipe for positive feedback", choices=[])
            feedback_button = gr.Button("Save feedback")
            feedback_status = gr.Markdown()
            feedback_state = gr.State([])

    # The profile widgets, in the order load_example_profile returns them and
    # run_pipeline consumes them.
    _profile_fields = [
        user_id_box,
        vegetarian_radio,
        allergies_box,
        regions_box,
        calorie_min,
        calorie_max,
        protein_min,
        protein_max,
        preferred_box,
        disliked_box,
        cooking_slider,
    ]

    # Populate the form when an example profile is picked.
    profile_selector.change(
        fn=load_example_profile,
        inputs=[profile_selector],
        outputs=_profile_fields,
    )

    def select_image(choice):
        """Resolve an 'Image N: path' dropdown label back to its file path."""
        if choice:
            idx = int(choice.split(":")[0].replace("Image ", "")) - 1
            return EXAMPLE_IMAGES[idx]
        return None

    image_selector.change(
        fn=select_image,
        inputs=[image_selector],
        outputs=[image_input],
    )

    run_button.click(
        fn=run_pipeline,
        inputs=[image_input] + _profile_fields,
        outputs=[
            annotated_output,
            detection_json,
            ingredient_md,
            recommendation_md,
            feedback_dropdown,
            feedback_state,
            feedback_status,
        ],
    )

    feedback_button.click(
        fn=record_feedback,
        inputs=[feedback_dropdown, user_id_box, feedback_state],
        outputs=feedback_status,
    )

if __name__ == "__main__":
    # share=True exposes a public Gradio link when run locally.
    demo.launch(share=True)
frige_detect/__pycache__/detect.cpython-313.pyc ADDED
Binary file (8.28 kB). View file
 
frige_detect/annotated_image.jpg ADDED
frige_detect/demo/t1.jpg ADDED
frige_detect/demo/t2.jpg ADDED
frige_detect/demo/t3.jpg ADDED
frige_detect/demo/t4.jpg ADDED
frige_detect/detect.py ADDED
@@ -0,0 +1,208 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # -*- coding: utf-8 -*-
2
+ """
3
+ Detect ingredients using a Roboflow model with preprocessing:
4
+ - Resize images to 640x640 if needed.
5
+ - Perform detection.
6
+ - Classify object sizes via K-Means.
7
+ - Generate JSON and annotated image outputs.
8
+ """
9
+
10
+ import json
11
+ import os
12
+ import tempfile
13
+ from dataclasses import dataclass
14
+
15
+ import cv2
16
+ import numpy as np
17
+ from roboflow import Roboflow
18
+ from sklearn.cluster import KMeans
19
+ import supervision as sv
20
+
21
+
22
@dataclass
class RoboflowCredentials:
    """Connection settings for a Roboflow-hosted detection model.

    Attributes:
        api_key: Roboflow account API key.
        project_name: Roboflow project slug.
        version: Model version number (defaults to the first version).
    """

    api_key: str
    project_name: str
    version: int = 1
27
+
28
+
29
def load_roboflow_credentials(path: str) -> RoboflowCredentials:
    """Load Roboflow API credentials from a simple key=value text file.

    Blank lines, `#` comments, and lines without `=` are ignored.

    Raises:
        FileNotFoundError: If the credential file does not exist.
        ValueError: If api_key/project_name are missing or version is not an int.
    """
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"Roboflow credential file not found: {path}."
        )

    # Collected values; api_key and project_name are mandatory, version optional.
    fields = {"api_key": None, "project_name": None, "version": 1}

    with open(path, "r", encoding="utf-8") as fh:
        for raw in fh:
            entry = raw.strip()
            # Skip blanks, comments, and anything without a key=value shape.
            if not entry or entry.startswith("#") or "=" not in entry:
                continue
            key, _, value = entry.partition("=")
            key = key.strip().lower()
            value = value.strip()
            if key == "version":
                try:
                    fields["version"] = int(value)
                except ValueError:
                    raise ValueError("Version in credential file must be an integer") from None
            elif key in ("api_key", "project_name"):
                fields[key] = value

    if not fields["api_key"] or not fields["project_name"]:
        raise ValueError(
            "Credential file must contain api_key and project_name entries."
        )

    return RoboflowCredentials(
        api_key=fields["api_key"],
        project_name=fields["project_name"],
        version=fields["version"],
    )
66
+
67
def compute_area_ratios(predictions, img_shape):
    """Compute the bbox-area / image-area ratio for each detection.

    Args:
        predictions: Roboflow prediction dicts with "width"/"height" keys.
        img_shape: Image shape tuple; only the first two dims (H, W) are used.

    Returns:
        numpy array of shape (n, 1), suitable as K-Means input.
    """
    image_area = float(img_shape[0] * img_shape[1])
    ratios = [p["width"] * p["height"] / image_area for p in predictions]
    return np.array(ratios).reshape(-1, 1)
75
+
76
def cluster_sizes(area_ratios):
    """Cluster area ratios into "large"/"small" groups via 2-means.

    Bug fix: KMeans with n_clusters=2 raises when there are fewer than two
    samples, so images with zero or one detection crashed the pipeline.
    Zero detections now yield an empty list and a single detection is
    labelled "large" (there is no relative basis for a split).

    Args:
        area_ratios: (n, 1) array of bbox-area / image-area ratios.

    Returns:
        list[str]: "large" or "small" for each detection, in input order.
    """
    n_samples = len(area_ratios)
    if n_samples == 0:
        return []
    if n_samples == 1:
        # No basis for a relative size split; treat the lone item as "large".
        return ["large"]

    kmeans = KMeans(n_clusters=2, init="k-means++", random_state=0)
    labels = kmeans.fit_predict(area_ratios)
    centroids = kmeans.cluster_centers_.flatten()
    # The cluster whose centroid has the larger mean ratio is "large".
    large_cluster = np.argmax(centroids)
    return ["large" if lbl == large_cluster else "small" for lbl in labels]
83
+
84
def detect_and_generate(
    image_path: str,
    credentials: RoboflowCredentials,
    conf_threshold: float = 0.4,
    overlap_threshold: float = 0.3,
    conf_split: float = 0.7,
    output_json: str = "recipe_input.json",
    output_image: str = "annotated_image.jpg"
):
    """
    Resize the image if necessary, run detection, classify sizes via K-Means,
    and create both a JSON output and an annotated image.

    Fixes: the docstring previously documented api_key/project_name/version
    parameters that no longer exist (replaced by `credentials`); the temp
    resized file leaked if detection raised (now cleaned in a finally block);
    annotation is skipped when there are no predictions.

    Args:
        image_path (str): Path to the original image.
        credentials (RoboflowCredentials): API key, project name, and model
            version used to initialize the Roboflow model.
        conf_threshold (float): Minimum confidence threshold (0-1).
        overlap_threshold (float): NMS overlap threshold (0-1).
        conf_split (float): Threshold separating high/low confidence lists.
        output_json (str): Output JSON filename.
        output_image (str): Output annotated image filename.

    Returns:
        dict: {"recipe_json": ..., "output_json_path": ..., "annotated_image_path": ...}

    Raises:
        FileNotFoundError: If image_path cannot be read.
    """
    # Load original image
    original_img = cv2.imread(image_path)
    if original_img is None:
        raise FileNotFoundError(f"Image not found: {image_path}")

    height, width = original_img.shape[:2]

    # Preprocess: the model expects 640x640 input; resize via a temp file.
    tmp_path = None
    if (height, width) != (640, 640):
        resized_img = cv2.resize(original_img, (640, 640))
        # mkstemp + close keeps Windows from locking the file while cv2 writes.
        fd, tmp_path = tempfile.mkstemp(suffix=".jpg")
        os.close(fd)
        cv2.imwrite(tmp_path, resized_img)
        detection_path = tmp_path
        img_for_annotation = resized_img
    else:
        detection_path = image_path
        img_for_annotation = original_img

    try:
        # Initialize Roboflow model
        rf = Roboflow(api_key=credentials.api_key)
        model = rf.workspace().project(credentials.project_name).version(credentials.version).model

        # Roboflow's API expects percentages for confidence/overlap.
        response = model.predict(
            detection_path,
            confidence=int(conf_threshold * 100),
            overlap=int(overlap_threshold * 100)
        ).json()
        predictions = response["predictions"]

        # Classify sizes via K-Means (cluster_sizes handles <2 detections).
        area_ratios = compute_area_ratios(predictions, img_for_annotation.shape)
        size_labels = cluster_sizes(area_ratios)

        # Build JSON structure split by detection confidence.
        ingredients = []
        high_conf = []
        low_conf = []
        for pred, size_label in zip(predictions, size_labels):
            name = pred["class"]
            conf = pred["confidence"]
            ingredients.append({
                "name": name,
                "quantity": size_label,
                "confidence": round(conf, 2)
            })
            (high_conf if conf >= conf_split else low_conf).append(name)

        recipe_json = {
            "ingredients": ingredients,
            "high_confidence_ingredients": high_conf,
            "low_confidence_ingredients": low_conf
        }

        # Write JSON to file
        with open(output_json, "w", encoding="utf-8") as jf:
            json.dump(recipe_json, jf, indent=4)

        # Annotate image with bounding boxes and confidence labels.
        annotated_img = img_for_annotation.copy()
        if predictions:
            detections = sv.Detections.from_inference(response)
            labels_for_annotation = [
                f"{pred['class']} ({pred['confidence']:.2f})" for pred in predictions
            ]
            annotated_img = sv.BoxAnnotator().annotate(
                scene=annotated_img,
                detections=detections
            )
            annotated_img = sv.LabelAnnotator().annotate(
                scene=annotated_img,
                detections=detections,
                labels=labels_for_annotation
            )

        cv2.imwrite(output_image, annotated_img)
    finally:
        # Always remove the temp resized file, even if detection raised.
        if tmp_path is not None:
            try:
                os.remove(tmp_path)
            except OSError:
                # On Windows the file may still be locked; leave it for the OS.
                pass

    return {
        "recipe_json": recipe_json,
        "output_json_path": output_json,
        "annotated_image_path": output_image,
    }
frige_detect/recipe_input.json ADDED
@@ -0,0 +1,86 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "ingredients": [
3
+ {
4
+ "name": "sugar",
5
+ "quantity": "large",
6
+ "confidence": 0.91
7
+ },
8
+ {
9
+ "name": "chicken",
10
+ "quantity": "large",
11
+ "confidence": 0.91
12
+ },
13
+ {
14
+ "name": "milk",
15
+ "quantity": "large",
16
+ "confidence": 0.89
17
+ },
18
+ {
19
+ "name": "flour",
20
+ "quantity": "large",
21
+ "confidence": 0.88
22
+ },
23
+ {
24
+ "name": "eggs",
25
+ "quantity": "small",
26
+ "confidence": 0.88
27
+ },
28
+ {
29
+ "name": "apple",
30
+ "quantity": "large",
31
+ "confidence": 0.86
32
+ },
33
+ {
34
+ "name": "corn",
35
+ "quantity": "small",
36
+ "confidence": 0.85
37
+ },
38
+ {
39
+ "name": "blueberries",
40
+ "quantity": "small",
41
+ "confidence": 0.83
42
+ },
43
+ {
44
+ "name": "chicken_breast",
45
+ "quantity": "large",
46
+ "confidence": 0.82
47
+ },
48
+ {
49
+ "name": "ground_beef",
50
+ "quantity": "large",
51
+ "confidence": 0.81
52
+ },
53
+ {
54
+ "name": "beef",
55
+ "quantity": "large",
56
+ "confidence": 0.77
57
+ },
58
+ {
59
+ "name": "carrot",
60
+ "quantity": "large",
61
+ "confidence": 0.75
62
+ },
63
+ {
64
+ "name": "bread",
65
+ "quantity": "large",
66
+ "confidence": 0.51
67
+ }
68
+ ],
69
+ "high_confidence_ingredients": [
70
+ "sugar",
71
+ "chicken",
72
+ "milk",
73
+ "flour",
74
+ "eggs",
75
+ "apple",
76
+ "corn",
77
+ "blueberries",
78
+ "chicken_breast",
79
+ "ground_beef",
80
+ "beef",
81
+ "carrot"
82
+ ],
83
+ "low_confidence_ingredients": [
84
+ "bread"
85
+ ]
86
+ }
frige_detect/roboflow_credentials.txt ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ # Roboflow credentials used by the app and detector
2
+ # SECURITY: a real API key was previously committed here and is now public in
+ # git history — rotate it in the Roboflow dashboard. Never commit real secrets;
+ # prefer loading the key from an environment variable or a gitignored file.
+ api_key=REPLACE_WITH_ROTATED_ROBOFLOW_API_KEY
3
+ project_name=nutrition-object-detection
4
+ version=1
recipe_recommendation/__init__.py ADDED
File without changes
recipe_recommendation/__pycache__/__init__.cpython-313.pyc ADDED
Binary file (191 Bytes). View file
 
recipe_recommendation/__pycache__/main.cpython-313.pyc ADDED
Binary file (26.8 kB). View file
 
recipe_recommendation/data/ingredient_map.data ADDED
The diff for this file is too large to render. See raw diff
 
recipe_recommendation/main.py ADDED
@@ -0,0 +1,652 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # main.py
2
+ # -*- coding: utf-8 -*-
3
+ """
4
+ Entry point for the new pipeline:
5
+ 1) I/O init & parsing
6
+ 2) Load user parents from recipe_input.json via ingredient_map (children -> parent)
7
+ 3) Ensure cold-start features & trained ranker exist
8
+ 4) Step 2: Coarse ranking
9
+ 5) Step 3: ML reranking
10
+ 6) Pretty print top results
11
+ """
12
+
13
+ import os
14
+ import json
15
+ import ast
16
+ import pandas as pd
17
+ from pathlib import Path
18
+ import shutil
19
+
20
+ from recipe_recommendation.src.io import load_recipes_csv, load_ingredient_map, download_file
21
+ from recipe_recommendation.src.coldstart import cold_start_ranker
22
+ from recipe_recommendation.src.trainmodel import train_model_ranker
23
+ from recipe_recommendation.src.candidate import (
24
+ coarse_rank_candidates,
25
+ ml_generate_candidates,
26
+ hard_filter,
27
+ )
28
+ from recipe_recommendation.src.highlight import (
29
+ print_candidates,
30
+ diversify_topk_with_min_clusters,
31
+ )
32
+ from recipe_recommendation.src.feature import build_features, build_cluster_features
33
+ from recipe_recommendation.src.embedding import find_most_similar_user
34
+
35
+
36
+ BASE_DIR = Path(__file__).resolve().parent
37
+ USER_DATA_DIR = BASE_DIR / "user_data"
38
+
39
+
40
+
41
def load_recipes() -> pd.DataFrame:
    """
    Download recipes.csv (if not already cached) and return it as a DataFrame
    with a unique ``recipe_id`` column derived from the row index.
    Keeps io.py focused on downloading only.
    """
    csv_path = download_file("recipes.csv")
    recipes = pd.read_csv(csv_path).reset_index(drop=True)
    recipes["recipe_id"] = recipes.index
    return recipes
51
+
52
+ # ---------------------------
53
+ # Helpers: parsing utilities
54
+ # ---------------------------
55
def parse_list(x):
    """Parse a cell into a Python list; tolerant of str/NaN/set/tuple input.

    Handles values as they come back from CSV round-trips: real lists pass
    through, None/NaN become [], sets and tuples are cast to lists, and
    strings are parsed with ast.literal_eval, falling back to comma-splitting.

    Fix: tuples — both tuple values and string literals like "(a, b)" —
    previously fell through to the comma fallback and produced broken tokens
    such as ["(a", "b)"]; they are now converted to proper lists.
    """
    if isinstance(x, list):
        return x
    if x is None or (isinstance(x, float) and pd.isna(x)):
        return []
    if isinstance(x, (set, tuple)):
        return list(x)
    s = str(x).strip()
    if not s:
        return []
    # Try to interpret the string as a Python literal first.
    try:
        v = ast.literal_eval(s)
        if isinstance(v, (list, set, tuple)):
            return list(v)
    except Exception:
        pass
    # Fallback: treat as a comma-separated list, tolerating [] brackets.
    return [tok.strip() for tok in s.strip("[]").split(",") if tok.strip()]
79
+
80
+
81
def parse_set(x):
    """Parse a cell into a Python set, using the same tolerant rules as parse_list."""
    return set(parse_list(x))
84
+
85
+
86
+ # -------------------------------------
87
+ # Map user CV result -> parent set
88
+ # -------------------------------------
89
def load_user_parents_from_json(json_path, ingredient_map, conf_th=0.8):
    """
    Map raw detected ingredient names to parent categories.

    Uses ingredient_map["children"] (child -> parent, with an optional
    "fallback"); a name that is itself a key of ingredient_map["parents"] is
    kept as-is. Unknown or low-confidence terms are skipped and reported.

    Fix: an explicit ``"confidence": null`` in the JSON previously raised
    TypeError via ``float(None)``; it is now treated as 0.0.

    Args:
        json_path: Path to recipe_input.json produced by the detector.
        ingredient_map: Dict with "parents" and "children" sub-dicts.
        conf_th: Minimum detection confidence for a name to be kept.

    Returns:
        Sorted, de-duplicated list of parent ingredient names.

    Raises:
        FileNotFoundError: If json_path does not exist.
    """
    parents_map = ingredient_map.get("parents", {}) or {}
    children_map = ingredient_map.get("children", {}) or {}

    if not os.path.exists(json_path):
        raise FileNotFoundError(f"recipe_input.json not found at: {json_path}")

    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)

    out = []
    hi, lo = [], []
    for ing in data.get("ingredients", []):
        # Detector emits snake_case names; the map uses space-separated keys.
        name = (ing.get("name") or "").strip().lower().replace("_", " ")
        # `or 0.0` guards against an explicit null confidence in the JSON.
        conf = float(ing.get("confidence") or 0.0)
        parent = None
        if name in children_map:
            # Prefer "parent" field; fall back to "fallback" if present
            parent = children_map[name].get("parent") or children_map[name].get("fallback")
        elif name in parents_map:
            parent = name

        if parent and conf >= conf_th:
            out.append(parent)
            hi.append((name, parent))
        else:
            lo.append(name)

    if hi:
        print("High-confidence ingredients mapped to parents:")
        for child, p in hi:
            print(f" - {child} → {p}")
    if lo:
        print(f"Ignored (low confidence or no parent found): {sorted(set(lo))}")

    return sorted(set(out))
130
+
131
+
132
def normalize_user_profile(profile):
    """Fill missing keys with safe defaults so downstream code never sees None.

    Mutates and returns the same profile dict.
    """
    # Diet: collapse to the single key used downstream.
    profile["diet"] = {
        "vegetarian_type": profile.get("diet", {}).get("vegetarian_type", "flexible")
    }

    # Simple list-valued fields default to empty lists.
    for key in ("allergies", "region_preference"):
        if profile.get(key) is None:
            profile[key] = []

    # Nutritional goals: wide-open ranges when unspecified.
    goals = profile.get("nutritional_goals")
    if goals is None:
        goals = {}
    goals.setdefault("calories", {"min": 0, "max": 9999})
    goals.setdefault("protein", {"min": 0, "max": 999})
    profile["nutritional_goals"] = goals

    # Other preferences: ensure all three keys exist.
    other = profile.get("other_preferences") or {}
    other["preferred_main"] = other.get("preferred_main", [])
    other["disliked_main"] = other.get("disliked_main", [])
    other["cooking_time_max"] = other.get("cooking_time_max", None)
    profile["other_preferences"] = other

    return profile
167
+
168
def is_profile_empty(profile):
    """Return True if the profile carries almost no meaningful preferences."""
    veg = profile.get("diet", {}).get("vegetarian_type")
    if veg not in (None, "", "flexible"):
        return False
    if profile.get("allergies") or profile.get("region_preference"):
        return False

    goals = profile.get("nutritional_goals", {})
    if goals.get("calories") or goals.get("protein"):
        calories = goals.get("calories", {})
        protein = goals.get("protein", {})
        # Anything narrower than the wide-open defaults counts as a preference.
        if calories.get("min", 0) > 0 or calories.get("max", 0) < 9999:
            return False
        if protein.get("min", 0) > 0 or protein.get("max", 0) < 999:
            return False

    other = profile.get("other_preferences", {})
    has_other = (
        other.get("preferred_main")
        or other.get("disliked_main")
        or other.get("cooking_time_max")
    )
    return not has_other
191
+
192
def fill_default_preferences(profile):
    """
    Apply lightweight, neutral defaults so hard_filter and cold start work
    efficiently for brand-new users with no explicit preferences.

    Assumes the profile has already been normalized (nested keys exist).
    Mutates and returns the same dict.
    """
    profile["diet"]["vegetarian_type"] = "flexible"
    profile["region_preference"] = ["North America", "Europe"]
    profile["nutritional_goals"]["protein"] = {"min": 50, "max": 150}
    profile["nutritional_goals"]["calories"] = {"min": 400, "max": 2000}
    profile["other_preferences"]["cooking_time_max"] = 45
    return profile
203
+
204
def ensure_user_profile(user_id):
    """
    Load a user's profile JSON, normalize its structure, and fill neutral
    defaults when the profile is (near-)empty. Guarantees downstream code
    never breaks on missing keys and avoids an extremely slow cold start
    for users with no preferences.

    Fix: removed the redundant function-level ``import os, json`` that
    shadowed the module-level imports; switched to Path.exists().

    Raises:
        FileNotFoundError: If user_data/<user_id>/user_profile.json is missing.
    """
    profile_file = USER_DATA_DIR / user_id / "user_profile.json"
    if not profile_file.exists():
        raise FileNotFoundError(
            f"Missing profile: {profile_file}. Please create one first."
        )

    # Load profile
    with open(profile_file, "r", encoding="utf-8") as f:
        profile = json.load(f)

    # Normalize structure, then fill defaults if almost empty
    profile = normalize_user_profile(profile)
    if is_profile_empty(profile):
        print(f"[profile] User {user_id} has an empty or near-empty profile. Filling defaults...")
        profile = fill_default_preferences(profile)

    return profile
229
+
230
+
231
def save_user_profile(user_id, profile):
    """Persist a user's profile JSON, creating the user directory if needed."""
    profile_path = USER_DATA_DIR / user_id / "user_profile.json"
    profile_path.parent.mkdir(parents=True, exist_ok=True)
    profile_path.write_text(json.dumps(profile, indent=2), encoding="utf-8")
236
+
237
def collect_user_feedback(user_id: str, selected_recipe_row: dict, user_profile: dict, qid: int):
    """
    Append a single positive-feedback sample to the user's feedback.csv.

    - Uses build_features() so feedback features stay aligned with training.
    - Maintains a stable feature column order via feature_order.json,
      appending any newly-seen feature names at the end.
    - Older feedback rows are padded with 0 for newly added columns.
    """
    user_dir = USER_DATA_DIR / user_id
    user_dir.mkdir(parents=True, exist_ok=True)
    feedback_path = user_dir / "feedback.csv"
    feature_order_path = user_dir / "feature_order.json"

    user_parents = set(user_profile.get("user_parents", []))

    def _parents(key):
        # Parent-category set of the selected recipe (may be absent).
        return selected_recipe_row.get(key, set())

    recipe_dict = {
        "main": _parents("main_parent"),
        "staple": _parents("staple_parent"),
        "other": _parents("other_parent"),
        "seasoning": _parents("seasoning_parent"),
        "matched_main": len(_parents("main_parent") & user_parents),
        "matched_staple": len(_parents("staple_parent") & user_parents),
        "matched_other": len(_parents("other_parent") & user_parents),
        "calories": selected_recipe_row.get("calories", 0),
        "protein": selected_recipe_row.get("protein", 0),
        "fat": selected_recipe_row.get("fat", 0),
        "region": selected_recipe_row.get("region", ""),
        "cuisine_attr": selected_recipe_row.get("cuisine_attr", []),
        "ingredients": selected_recipe_row.get("ingredients", []),
        "minutes": selected_recipe_row.get("minutes", None),
    }
    features = build_features(recipe_dict, user_profile)

    # Load (or initialize) the canonical feature ordering.
    if os.path.exists(feature_order_path):
        with open(feature_order_path, "r", encoding="utf-8") as f:
            feature_order = json.load(f)
    else:
        feature_order = list(features.keys())
        with open(feature_order_path, "w", encoding="utf-8") as f:
            json.dump(feature_order, f, indent=2)

    # Append any brand-new feature names and persist the updated order.
    for feat in features.keys():
        if feat not in feature_order:
            feature_order.append(feat)
    with open(feature_order_path, "w", encoding="utf-8") as f:
        json.dump(feature_order, f, indent=2)

    row_data = {feat: features.get(feat, 0) for feat in feature_order}
    row_data["recipe_id"] = selected_recipe_row["recipe_id"]
    row_data["qid"] = qid
    row_data["relevance"] = 5  # positive feedback uses a fixed top relevance grade

    new_row_df = pd.DataFrame([row_data])

    if os.path.exists(feedback_path):
        old_df = pd.read_csv(feedback_path)
        # Align columns in both directions, padding missing ones with 0.
        for col in new_row_df.columns:
            if col not in old_df.columns:
                old_df[col] = 0
        for col in old_df.columns:
            if col not in new_row_df.columns:
                new_row_df[col] = 0
        df = pd.concat([old_df, new_row_df], ignore_index=True)
    else:
        df = new_row_df
    df.to_csv(feedback_path, index=False)
    print(f"[feedback] Saved user feedback to {feedback_path} ({len(df)} rows total)")
300
+
301
+
302
+ # def ensure_model(user_id):
303
+ # base_dir = USER_DATA_DIR / user_id
304
+ # base_dir.mkdir(parents=True, exist_ok=True)
305
+ # features_rank = base_dir / "user_features_rank.csv"
306
+ # model_file = base_dir / "ranker.pkl"
307
+
308
+ # if not os.path.exists(features_rank):
309
+ # print("[main] No cold-start features found; running cold_start_ranker() ...")
310
+ # cold_start_ranker(user_id=user_id)
311
+
312
+ # if not os.path.exists(model_file):
313
+ # print("[main] No model found; training ranker with train_model_ranker() ...")
314
+ # train_model_ranker(user_id=user_id)
315
+
316
+ # return model_file
317
+
318
def ensure_model(user_id):
    """
    Ensure cold-start features and a trained ranker exist for the user,
    generating whichever artifact is missing.

    Returns:
        Path to the user's ranker.pkl.
    """
    base_dir = USER_DATA_DIR / user_id
    base_dir.mkdir(parents=True, exist_ok=True)
    features_rank = base_dir / "user_features_rank.csv"
    model_file = base_dir / "ranker.pkl"

    if not features_rank.exists():
        print("[main] No cold-start features found; running cold_start_ranker() ...")
        # pass user_data_dir so cold start writes into the right per-user folder
        cold_start_ranker(user_id=user_id, user_data_dir=str(USER_DATA_DIR))

    if not model_file.exists():
        print("[main] No model found; training ranker with train_model_ranker() ...")
        train_model_ranker(user_id=user_id)

    return model_file
334
+
335
+
336
def prepare_recipes_df(df: pd.DataFrame) -> pd.DataFrame:
    """
    Normalize key columns into the list/set shapes expected by the
    candidate/feature modules. Works on a copy; the input is untouched.
    """
    out = df.copy()

    list_cols = ("staple", "main", "seasoning", "other", "ingredients")
    set_cols = ("staple_parent", "main_parent", "seasoning_parent", "other_parent", "cuisine_attr")

    for col in list_cols:
        if col in out.columns:
            out[col] = out[col].apply(parse_list)
    for col in set_cols:
        if col in out.columns:
            out[col] = out[col].apply(parse_set)

    # region may be a plain string or a serialized list/set: normalize to a
    # set when it is collection-like, otherwise keep it as a string.
    if "region" in out.columns:
        def _region_norm(x):
            if isinstance(x, (set, list)):
                return set(x)
            try:
                v = ast.literal_eval(str(x))
                if isinstance(v, (set, list)):
                    return set(v)
            except Exception:
                pass
            return str(x) if pd.notna(x) else ""
        out["region"] = out["region"].apply(_region_norm)

    return out
367
+
368
+
369
def maybe_retrain_model(user_id):
    """Retrain the ranker from scratch once every 20 pieces of feedback."""
    profile_path = USER_DATA_DIR / user_id / "user_profile.json"
    if not profile_path.exists():
        return

    num_feedback = json.loads(profile_path.read_text()).get("num_feedback", 0)
    if num_feedback <= 0 or num_feedback % 20 != 0:
        return

    print(f"[main] {num_feedback} feedback reached, retraining ranker...")

    model_path = USER_DATA_DIR / user_id / "ranker.pkl"
    if model_path.exists():
        model_path.unlink()  # remove the stale model so training rebuilds it

    train_model_ranker(user_id)
385
+
386
def get_next_qid(user_id: str) -> int:
    """Return a monotonically increasing query id for the user, persisted in qid.txt."""
    user_dir = USER_DATA_DIR / user_id
    user_dir.mkdir(parents=True, exist_ok=True)
    qid_path = user_dir / "qid.txt"

    # First ever query gets qid 0; afterwards increment the stored value.
    qid = int(qid_path.read_text()) + 1 if qid_path.exists() else 0
    qid_path.write_text(str(qid))
    return qid
397
+
398
def maybe_reuse_model(user_id, threshold=0.85):
    """Return the id of a sufficiently similar user whose model can be reused, else None."""
    match_uid, similarity = find_most_similar_user(user_id, threshold=threshold)
    if not match_uid:
        return None
    print(f"[model reuse] Reusing {match_uid}'s model for {user_id} (sim={similarity:.3f})")
    return match_uid
404
+
405
def main(user_id="user_1",
         recipe_input_json=None,
         topk=5,
         topn_coarse=20000):
    """
    CLI entry point: detection JSON -> profile -> coarse rank -> ML rerank
    -> diversification -> pretty print -> interactive feedback.

    Fixes: coarse_rank_candidates was given the UNFILTERED record list even
    though filtered_records had just been computed, so the hard dietary
    filter (allergies / vegetarian) was silently bypassed; the interactive
    selection is now validated instead of crashing on bad input.

    Args:
        user_id: Per-user directory name under user_data/.
        recipe_input_json: Path to the CV detection JSON; auto-discovered if None.
        topk: Number of recipes to show the user.
        topn_coarse: Candidate pool size kept after coarse ranking.
    """
    # 1) I/O init (may retrain the ranker if enough feedback accumulated)
    maybe_retrain_model(user_id)

    recipes_df = load_recipes()
    ingredient_map = load_ingredient_map()

    # 2) Load user_parents from recipe_input.json (fall back to /data if needed)
    if recipe_input_json is None:
        # prefer project root; then /data
        default_candidates = [
            os.path.join("data", "recipe_input.json"),
            "recipe_input.json",
            "/data/recipe_input.json",
        ]
        recipe_input_json = next((p for p in default_candidates if os.path.exists(p)), default_candidates[-1])

    user_parents = load_user_parents_from_json(recipe_input_json, ingredient_map, conf_th=0.8)

    # 3) Load user profile
    user_profile = ensure_user_profile(user_id)

    # Embedding similarity fallback: bootstrap from the closest existing user
    match_uid, sim = find_most_similar_user(user_id, threshold=0.85)
    if match_uid is not None:
        print(f"[main] Using model of similar user '{match_uid}' for '{user_id}' (sim={sim:.3f})")

        src_dir = USER_DATA_DIR / match_uid
        dst_dir = USER_DATA_DIR / user_id
        dst_dir.mkdir(parents=True, exist_ok=True)

        for fname in ["ranker.pkl", "user_features_rank.csv"]:
            src = src_dir / fname
            dst = dst_dir / fname
            if os.path.exists(src) and not os.path.exists(dst):
                shutil.copyfile(src, dst)
                print(f"[embedding] Copied {fname} from {match_uid} to {user_id}")

    # 4) Ensure cold-start features & model
    model_path = ensure_model(user_id)

    # 5) Prepare recipes & coarse rank (Step 2)
    df = prepare_recipes_df(recipes_df)
    recipes_records = df.to_dict(orient="records")

    filtered_records = [r for r in recipes_records if hard_filter(r, user_profile)]
    if not filtered_records:
        print("[main] No recipes after hard dietary filtering.")
        return

    # BUGFIX: rank only records that survived the hard dietary filter.
    coarse = coarse_rank_candidates(
        recipes=filtered_records,
        user_parents=user_parents,
        user_profile=user_profile,
        top_n=topn_coarse
    )

    if not coarse:
        print("[main] No coarse candidates. Please check user_parents or dataset.")
        return

    # 6) ML reranking (Step 3)
    ml_top = ml_generate_candidates(
        coarse_candidates=coarse,
        user_parents=user_parents,
        user_profile=user_profile,
        model_path=model_path,
        topk=200
    )

    if ml_top is None or len(ml_top) == 0:
        print("[main] No ML candidates returned.")
        return

    # 6.5) KMeans diversification of the top list
    candidates_list = ml_top.to_dict(orient="records")
    X_cluster = build_cluster_features(candidates_list)
    diversified = diversify_topk_with_min_clusters(
        ranked_candidates=candidates_list,
        feature_matrix=X_cluster,
        top_k=topk,
        n_clusters=10,
        min_clusters=3
    )

    ml_top = pd.DataFrame(diversified)

    # 7) Pretty print (print_candidates expects a 'match_score' column)
    ml_top = ml_top.copy()
    if "match_score" not in ml_top.columns and "ml_score" in ml_top.columns:
        ml_top["match_score"] = ml_top["ml_score"]

    print(f"\nFound {len(ml_top)} candidate recipes:\n")
    print_candidates(ml_top, user_parents, topk=topk)

    # 8) Collect user feedback, validating the interactive selection
    qid = get_next_qid(user_id)
    try:
        selected_idx = int(input(f"Select a recipe from 1-{len(ml_top)}: ")) - 1
    except ValueError:
        print("[main] Invalid selection; no feedback recorded.")
        return
    if not 0 <= selected_idx < len(ml_top):
        print("[main] Selection out of range; no feedback recorded.")
        return
    selected_row = ml_top.iloc[selected_idx].to_dict()
    collect_user_feedback(user_id, selected_row, user_profile, qid)
508
+
509
+
510
def recommend_recipes(detection_payload, user_id, recipes_df, topk=5):
    """
    Unified recommendation entry for the app.

    Handles ingredient->parent mapping, user profile loading, embedding-based
    model reuse, cold start, coarse ranking, ML reranking, and KMeans
    diversification internally.

    Fix: coarse_rank_candidates was given the UNFILTERED record list even
    though filtered_records had just been computed, silently bypassing the
    hard dietary filter; dead commented-out mapping code removed.

    Args:
        detection_payload: Detector dict with "ingredients" plus
            high/low-confidence ingredient lists.
        user_id: Per-user directory name under user_data/.
        recipes_df: Raw recipes DataFrame from load_recipes().
        topk: Number of diversified recipes to return.

    Returns:
        (ml_top, user_parents, high_conf, low_conf); ml_top is an empty
        DataFrame when no candidate survives any stage.
    """
    # 0) Check if retraining is needed (new feedback, updated features)
    maybe_retrain_model(user_id)

    # 1) Ingredient mapping - use existing high/low confidence fields
    ingredient_map = load_ingredient_map()
    ingredients = detection_payload.get("ingredients", [])
    high_conf = sorted(set(detection_payload.get("high_confidence_ingredients", [])))
    low_conf = sorted(set(detection_payload.get("low_confidence_ingredients", [])))

    parents_map = ingredient_map.get("parents", {}) or {}
    children_map = ingredient_map.get("children", {}) or {}

    user_parents = set()
    for item in ingredients:
        # Detector emits snake_case names; the map uses space-separated keys.
        name = (item.get("name") or "").strip().lower().replace("_", " ")
        if not name:
            continue
        if name in children_map:
            parent = children_map[name].get("parent") or children_map[name].get("fallback")
        elif name in parents_map:
            parent = name
        else:
            parent = None
        if parent:
            user_parents.add(parent)
    user_parents = sorted(user_parents)

    # 2) Load user profile internally
    user_profile = ensure_user_profile(user_id)

    # 3) Embedding fallback: copy artifacts from the most similar user
    match_uid, sim = find_most_similar_user(user_id, threshold=0.85)
    if match_uid is not None:
        print(f"[embedding] Using model of similar user '{match_uid}' for '{user_id}' (sim={sim:.3f})")
        src_dir = USER_DATA_DIR / match_uid
        dst_dir = USER_DATA_DIR / user_id
        dst_dir.mkdir(parents=True, exist_ok=True)
        for fname in ["ranker.pkl", "user_features_rank.csv"]:
            src = src_dir / fname
            dst = dst_dir / fname
            if os.path.exists(src) and not os.path.exists(dst):
                shutil.copyfile(src, dst)
                print(f"[embedding] Copied {fname} from {match_uid} to {user_id}")

    # 4) Coldstart / model ensure
    model_path = ensure_model(user_id)

    # 5) Coarse rank
    df = prepare_recipes_df(recipes_df)
    recipes_records = df.to_dict(orient="records")
    filtered_records = [r for r in recipes_records if hard_filter(r, user_profile)]
    if not filtered_records:
        return pd.DataFrame(), user_parents, high_conf, low_conf

    # BUGFIX: rank only records that survived the hard dietary filter.
    coarse = coarse_rank_candidates(
        recipes=filtered_records,
        user_parents=user_parents,
        user_profile=user_profile,
        top_n=20000
    )
    if not coarse:
        return pd.DataFrame(), user_parents, high_conf, low_conf

    # 6) ML rerank
    ml_top = ml_generate_candidates(
        coarse_candidates=coarse,
        user_parents=user_parents,
        user_profile=user_profile,
        model_path=model_path,
        topk=200
    )
    if ml_top is None or len(ml_top) == 0:
        return pd.DataFrame(), user_parents, high_conf, low_conf

    # 7) KMeans diversification
    candidates_list = ml_top.to_dict(orient="records")
    X_cluster = build_cluster_features(candidates_list)
    diversified = diversify_topk_with_min_clusters(
        ranked_candidates=candidates_list,
        feature_matrix=X_cluster,
        top_k=topk,
        n_clusters=10,
        min_clusters=3
    )

    return pd.DataFrame(diversified), user_parents, high_conf, low_conf
620
+
621
+
622
def get_feedback(user_id: str, recipe_row: dict, qid: int = None):
    """
    App-friendly wrapper around collect_user_feedback().

    Parameters
    ----------
    user_id : str
        The ID of the user submitting feedback.
    recipe_row : dict
        Recipe information dict (e.g. one row from ml_top.to_dict()).
    qid : int, optional
        Query id for ranking context; auto-generated (or 0 on failure)
        when omitted.
    """
    # Profile must exist and be normalized before features can be built.
    user_profile = ensure_user_profile(user_id)

    if qid is None:
        try:
            qid = get_next_qid(user_id)
        except Exception:
            # Fall back to a fixed qid rather than failing the feedback save.
            qid = 0

    collect_user_feedback(user_id, recipe_row, user_profile, qid)
    print(f"[app] Feedback collected for user '{user_id}', qid={qid}, recipe_id={recipe_row.get('id')}")
649
+
650
+
651
+ if __name__ == "__main__":
652
+ main("user_3")
recipe_recommendation/readme.txt ADDED
@@ -0,0 +1,142 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ readme_text = """\
2
+ ===========================
3
+ Recipe Recommendation System
4
+ ===========================
5
+
6
+ This project implements a complete recipe recommendation system, including cold start ranking, ML-based reranking, KMeans-based diversification, and user feedback collection.
7
+ All functions are fully encapsulated and can be easily called from external applications.
8
+
9
+ -------------------------------------
10
+ 1. Main Entry Functions for External Use
11
+ -------------------------------------
12
+
13
+ The three main functions for external usage are:
14
+
15
+ 1) recommend_recipes(detection_payload, user_id, recipes_df, topk=5)
16
+ - Input:
17
+ • detection_payload: dict or JSON object containing detected ingredients.
18
+ • user_id: str, unique user identifier.
19
+ • recipes_df: pandas.DataFrame loaded by `load_recipes()`.
20
+ • topk: int, number of final recipes to return (default = 5).
21
+ - Output:
22
+ • ml_top: pandas.DataFrame of top recommended recipes (with ml_score & metadata).
23
+ • user_parents: list of mapped parent ingredients.
24
+ • high_conf: list of high-confidence ingredient matches.
25
+ • low_conf: list of low-confidence or unmapped ingredients.
26
+
27
+ Internally, this function performs:
28
+ - Ingredient mapping from detection payload
29
+ - Embedding fallback (copy model/features from similar user)
30
+ - Cold start feature generation if needed
31
+ - Coarse ranking → ML reranking → KMeans diversification
32
+ - Returns the final diversified top-k recommendations.
33
+
34
+ 2) load_recipes()
35
+ - Input: None
36
+ - Output: pandas.DataFrame of all recipes (automatically downloaded from Hugging Face if not present).
37
+ - This function loads the full recipe dataset into memory.
38
+ If the dataset is not found locally, it will automatically download and cache it under `data/`.
39
+
40
+ 3) get_feedback(user_id, recipe_row, qid=None)
41
+ - Input:
42
+ • user_id: str, unique user identifier.
43
+ • recipe_row: dict, a single recipe row (e.g. one of the top-k recommendations).
44
+ • qid: int, optional query ID. Defaults to auto-generated or 0.
45
+ - Output: None
46
+ - Function:
47
+ • Loads user profile internally
48
+ • Appends the feedback (recipe metadata, user choice) into `user_data/{user_id}/feedback.csv`
49
+ • Does not retrain the model automatically (use `maybe_retrain_model` if needed)
50
+
51
+ ----------------------------------------
52
+ 2. User Profiles and Pretrained Models
53
+ ----------------------------------------
54
+
55
+ The `user_data` folder contains four example users:
56
+
57
+ - user_0 : Empty profile for testing the system’s ability to bootstrap from zero information.
58
+ - user_1 : A user with specific dietary habits.
59
+ - user_2 : A user with different dietary preferences.
60
+ - user_3 : Similar to user_2, used to test simple embedding-based model reuse.
61
+
62
+ For each user:
63
+ - Cold start features and ML models (`user_features_rank.csv` and `ranker.pkl`) have already been generated.
64
+ - You can add new users by creating a new folder under `user_data/` with a profile file `user_profile.json` in the following format:
65
+
66
+ {
67
+ "user_id": "user_001",
68
+ "num_feedback": 0,
69
+ "diet": {
70
+ "vegetarian_type": "flexible_vegetarian"
71
+ },
72
+ "allergies": ["peanut", "shrimp"],
73
+ "region_preference": ["Asia", "Europe"],
74
+ "nutritional_goals": {
75
+ "calories": { "min": 400, "max": 3000 },
76
+ "protein": { "min": 100, "max": 160 }
77
+ },
78
+ "other_preferences": {
79
+ "preferred_main": ["chicken", "tofu"],
80
+ "disliked_main": ["lamb"],
81
+ "cooking_time_max": 40
82
+ }
83
+ }
84
+
85
+ The cold start process will typically take **15–25 minutes**, depending on your system performance.
86
+
87
+ ----------------------------------------
88
+ 3. Dataset Download
89
+ ----------------------------------------
90
+
91
+ Large recipe and ingredient mapping files are stored on Hugging Face under the account:
92
+ → iris314
93
+
94
+ These files will be automatically downloaded the first time `load_recipes()` or related functions are called.
95
+ No manual setup is required.
96
+
97
+ ----------------------------------------
98
+ 4. Feedback Loop & Retraining
99
+ ----------------------------------------
100
+
101
+ User feedback is saved in `feedback.csv` files under each user's directory.
102
+ To trigger retraining after feedback collection, call:
103
+
104
+ from trainmodel import maybe_retrain_model
105
+ maybe_retrain_model(user_id)
106
+
107
+ This checks timestamps between `user_features_rank.csv` and `ranker.pkl` to decide if retraining is needed.
108
+
109
+ ----------------------------------------
110
+ 5. Cold Start & Embedding Fallback
111
+ ----------------------------------------
112
+
113
+ - If a user has no model or features, the system runs a cold start procedure to generate ranking features.
114
+ - If a similar user exists (cosine similarity > 0.85), the system copies their model and features to skip retraining.
115
+
116
+ ----------------------------------------
117
+ 6. Quick Start Example
118
+ ----------------------------------------
119
+
120
+ from main import recommend_recipes, load_recipes, get_feedback
121
+
122
+ # 1. Load dataset
123
+ recipes_df = load_recipes()
124
+
125
+ # 2. Prepare a fake detection payload
126
+ payload = {"detected_ingredients": ["chicken", "milk", "flour"]}
127
+
128
+ # 3. Recommend
129
+ top_recipes, user_parents, high_conf, low_conf = recommend_recipes(payload, "user_1", recipes_df, topk=5)
130
+
131
+ # 4. Feedback
132
+ get_feedback("user_1", top_recipes.iloc[0].to_dict())
133
+
134
+ ----------------------------------------
135
+ End of README
136
+ ----------------------------------------
137
+ """
138
+
139
+ with open("README.txt", "w", encoding="utf-8") as f:
140
+ f.write(readme_text)
141
+
142
+ "README.txt file created successfully."
recipe_recommendation/readme_cn.txt ADDED
@@ -0,0 +1,92 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ =============================
2
+ 菜谱推荐系统(Recipe Recommendation)
3
+ =============================
4
+
5
+ 本项目实现了一个完整的菜谱推荐系统,包括:
6
+ - 冷启动(Cold Start)排序
7
+ - 机器学习模型(ML)重排序
8
+ - KMeans 聚类多样化
9
+ - 用户反馈收集与自动重训
10
+
11
+ 所有功能都已封装好,外部调用只需要几个简单的接口。
12
+
13
+ ----------------------------------------
14
+ 1. 外部主要调用函数
15
+ ----------------------------------------
16
+
17
+ 1) recommend_recipes(detection_payload, user_id, recipes_df, topk=5)
18
+ - 输入:
19
+ • detection_payload:dict 或 JSON,表示检测到的食材
20
+ • user_id:str,用户 ID
21
+ • recipes_df:通过 `load_recipes()` 加载的菜谱 DataFrame
22
+ • topk:返回的推荐菜谱数量(默认 5)
23
+ - 输出:
24
+ • ml_top:推荐结果(DataFrame)
25
+ • user_parents:映射后的父食材列表
26
+ • high_conf:高置信度匹配
27
+ • low_conf:低置信度/未匹配食材
28
+
29
+ 功能包括:食材映射 → 相似用户模型复制 → 冷启动 → 粗排 → ML 重排 → KMeans 多样化。
30
+
31
+ 2) load_recipes()
32
+ - 自动从 Hugging Face(iris314)下载菜谱数据到 `data/`,并返回 DataFrame。
33
+
34
+ 3) get_feedback(user_id, recipe_row, qid=None)
35
+ - 收集用户反馈并写入 `user_data/{user_id}/feedback.csv`
36
+ - user_profile 自动加载,qid 缺省自动分配
37
+
38
+ ----------------------------------------
39
+ 2. 用户数据
40
+ ----------------------------------------
41
+
42
+ `user_data` 里包含四个示例用户:
43
+ - user_0:空 profile,用于测试零信息自启
44
+ - user_1 / user_2:有不同饮食偏好的真实用户
45
+ - user_3:与 user_2 类似,用于测试 embedding 复制功能
46
+
47
+ 每个用户目录下都有 `user_profile.json`、`user_features_rank.csv`、`ranker.pkl`。
48
+ 你可以新增用户,只需遵循以下 JSON 格式:
49
+
50
+ {
51
+ "user_id": "user_001",
52
+ "num_feedback": 0,
53
+ "diet": {"vegetarian_type": "flexible_vegetarian"},
54
+ "allergies": ["peanut", "shrimp"],
55
+ "region_preference": ["Asia", "Europe"],
56
+ "nutritional_goals": {
57
+ "calories": {"min": 400, "max": 3000},
58
+ "protein": {"min": 100, "max": 160}
59
+ },
60
+ "other_preferences": {
61
+ "preferred_main": ["chicken", "tofu"],
62
+ "disliked_main": ["lamb"],
63
+ "cooking_time_max": 40
64
+ }
65
+ }
66
+
67
+ 冷启动过程通常需要 15~25 分钟(视机器性能而定)。
68
+
69
+ ----------------------------------------
70
+ 3. 数据下载
71
+ ----------------------------------------
72
+
73
+ 菜谱和食材映射等大文件会自动从 Hugging Face(iris314)下载并缓存到 `data/`,无需手动设置。
74
+
75
+ ----------------------------------------
76
+ 4. 快速上手示例
77
+ ----------------------------------------
78
+
79
+ ```python
80
+ from main import recommend_recipes, load_recipes, get_feedback
81
+
82
+ # 加载菜谱
83
+ recipes_df = load_recipes()
84
+
85
+ # 准备模拟检测输入
86
+ payload = {"detected_ingredients": ["chicken", "milk", "flour"]}
87
+
88
+ # 获取推荐结果
89
+ top_recipes, user_parents, high_conf, low_conf = recommend_recipes(payload, "user_1", recipes_df, topk=5)
90
+
91
+ # 提交反馈
92
+ get_feedback("user_1", top_recipes.iloc[0].to_dict())
recipe_recommendation/src/__init__.py ADDED
File without changes
recipe_recommendation/src/__pycache__/__init__.cpython-313.pyc ADDED
Binary file (195 Bytes). View file
 
recipe_recommendation/src/__pycache__/candidate.cpython-313.pyc ADDED
Binary file (15 kB). View file
 
recipe_recommendation/src/__pycache__/coldstart.cpython-313.pyc ADDED
Binary file (13.7 kB). View file
 
recipe_recommendation/src/__pycache__/embedding.cpython-313.pyc ADDED
Binary file (5.94 kB). View file
 
recipe_recommendation/src/__pycache__/feature.cpython-313.pyc ADDED
Binary file (8.57 kB). View file
 
recipe_recommendation/src/__pycache__/highlight.cpython-313.pyc ADDED
Binary file (4.47 kB). View file
 
recipe_recommendation/src/__pycache__/io.cpython-313.pyc ADDED
Binary file (2.02 kB). View file
 
recipe_recommendation/src/__pycache__/trainmodel.cpython-313.pyc ADDED
Binary file (10.4 kB). View file
 
recipe_recommendation/src/candidate.py ADDED
@@ -0,0 +1,365 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import pandas as pd
2
+ import numpy as np
3
+ from .feature import build_features
4
+ from .io import load_ingredient_map
5
+ import joblib
6
+
7
+ # Load ingredient map globally to avoid repeated I/O
8
+ INGREDIENT_MAP = load_ingredient_map()
9
+ PARENTS = INGREDIENT_MAP["parents"]
10
+ CHILDREN = INGREDIENT_MAP["children"]
11
+
12
def extract_user_parents(user_ingredients):
    """
    Map raw user ingredient names to their parent categories.

    Each name is lower-cased and stripped, then looked up first as a child
    (mapped to its parent) and otherwise as a parent itself; unknown names
    are silently dropped. Returns a set of parent category names.
    """
    parents = set()
    for raw in user_ingredients:
        name = raw.lower().strip()
        if name in CHILDREN:
            # Child ingredient: record its parent category.
            parents.add(CHILDREN[name]["parent"])
        elif name in PARENTS:
            # Already a parent-level name.
            parents.add(name)
    return parents
23
+
24
+
25
+ # def hard_filter(recipe, user_profile):
26
+ # diet = user_profile.get("diet", {}).get("vegetarian_type", "").lower()
27
+ # if diet == "vegan" and not recipe.get("is_vegan_safe", True):
28
+ # return False
29
+ # if diet in ["vegetarian", "flexible_vegetarian"] and not recipe.get("is_vegetarian_safe", True):
30
+ # return False
31
+ # return True
32
+
33
def hard_filter(recipe: dict, user_profile: dict) -> bool:
    """
    Decide whether a recipe is admissible for a user under hard constraints.

    Checks, in order: dietary safety flags (vegan / vegetarian), calorie and
    protein windows from the user's nutritional goals, and disliked main
    ingredients. The first failing check rejects the recipe.

    Args:
        recipe (dict): Recipe attributes such as 'calories', 'protein',
            'is_vegan_safe', 'is_vegetarian_safe' and 'main_parent'.
        user_profile (dict): User preferences including diet type,
            nutritional goals, and disliked main ingredients.

    Returns:
        bool: True if the recipe passes all hard filters, False otherwise.
    """
    # --- Dietary safety flags (missing flags are treated as safe) ---
    diet_type = user_profile.get("diet", {}).get("vegetarian_type", "").lower()
    if diet_type == "vegan":
        if not recipe.get("is_vegan_safe", True):
            return False
    if diet_type in ("vegetarian", "flexible_vegetarian"):
        if not recipe.get("is_vegetarian_safe", True):
            return False

    goals = user_profile.get("nutritional_goals", {})

    # --- Calorie window (missing recipe value counts as 0) ---
    cal_goal = goals.get("calories", {})
    calories = recipe.get("calories", 0)
    if calories < cal_goal.get("min", 0) or calories > cal_goal.get("max", 9999):
        return False

    # --- Protein window ---
    protein_goal = goals.get("protein", {})
    protein = recipe.get("protein", 0)
    if protein < protein_goal.get("min", 0) or protein > protein_goal.get("max", 999):
        return False

    # --- Disliked main ingredients ---
    disliked = set(user_profile.get("other_preferences", {}).get("disliked_main", []))
    if disliked:
        mains = recipe.get("main_parent", set())
        if isinstance(mains, list):
            mains = set(mains)
        elif not isinstance(mains, set):
            # Unexpected types (e.g. a bare string) are ignored, not matched.
            mains = set()
        # Reject if any main ingredient appears in the disliked list.
        if mains & disliked:
            return False

    return True
87
+
88
+
89
+
90
# Linear weights for each coarse-ranking signal; keys match the feature
# names produced by the coarse ranker.
COARSE_WEIGHTS = {
    "main_match_ratio": 1.0,
    "staple_match_ratio": 0.3,
    "other_match_ratio": 0.6,
    "low_calorie_penalty": 0.2,
    "preferred_course_overlap": 0.1,
    "region_match": 0.8
}


def coarse_score(features, weights=COARSE_WEIGHTS):
    """Return the weighted sum of the coarse features present in *features*."""
    return sum(w * features[key] for key, w in weights.items() if key in features)
106
+
107
+
108
def coarse_rank_candidates(recipes, user_parents, user_profile, top_n=30000, weights=COARSE_WEIGHTS):
    """
    Stage 2: Coarse Ranking (NumPy vectorized implementation)
    ---------------------------------------------------------
    Quickly retrieves a subset of candidate recipes by computing
    ingredient coverage ratios (main / staple / other) between
    the user's pantry and the recipes using vectorized operations.

    This function replaces the original Python loop version
    for significant speedup during cold start and real-time ranking.

    Parameters
    ----------
    recipes : list[dict]
        Recipe records; each may carry main_parent / staple_parent /
        other_parent collections plus calories, cuisine_attr and region.
    user_parents : iterable[str]
        Parent-level ingredient names the user has on hand.
    user_profile : dict
        Reads calorie_threshold, preferred_course_types, region_preference.
    top_n : int
        Maximum number of candidates to return.
    weights : dict
        Linear weight per signal (see COARSE_WEIGHTS).

    Returns
    -------
    list[dict]
        Top-scoring recipe dicts, best first; [] when nothing scores > 0.
    """
    if not recipes:
        return []

    # === 1. Build parent vocabulary ===
    # Extract all unique parent ingredients across main/staple/other fields.
    all_parents = sorted({
        p for r in recipes
        for k in ["main_parent", "staple_parent", "other_parent"]
        for p in (r.get(k) or [])
    })
    parent_index = {p: i for i, p in enumerate(all_parents)}
    num_recipes = len(recipes)
    num_parents = len(all_parents)

    # === 2. Construct multi-hot matrices for main, staple, other ===
    # Each row corresponds to a recipe; each column to a parent ingredient.
    main_mat = np.zeros((num_recipes, num_parents), dtype=np.uint8)
    staple_mat = np.zeros((num_recipes, num_parents), dtype=np.uint8)
    other_mat = np.zeros((num_recipes, num_parents), dtype=np.uint8)

    for i, r in enumerate(recipes):
        for p in r.get("main_parent", []):
            if p in parent_index:
                main_mat[i, parent_index[p]] = 1
        for p in r.get("staple_parent", []):
            if p in parent_index:
                staple_mat[i, parent_index[p]] = 1
        for p in r.get("other_parent", []):
            if p in parent_index:
                other_mat[i, parent_index[p]] = 1

    # === 3. Encode user pantry as a binary mask ===
    user_mask = np.zeros(num_parents, dtype=np.uint8)
    for p in user_parents:
        if p in parent_index:
            user_mask[parent_index[p]] = 1

    # === 4. Compute ingredient match ratios in batch ===
    # main_ratio = (# of matched main ingredients) / (# of total main ingredients)
    main_total = main_mat.sum(axis=1)
    staple_total = staple_mat.sum(axis=1)
    other_total = other_mat.sum(axis=1)

    main_match = (main_mat @ user_mask)
    staple_match = (staple_mat @ user_mask)
    other_match = (other_mat @ user_mask)

    # np.maximum(..., 1) guards the division for recipes with no entries.
    main_ratio = main_match / np.maximum(main_total, 1)
    staple_ratio = staple_match / np.maximum(staple_total, 1)
    other_ratio = other_match / np.maximum(other_total, 1)

    # === 5. Additional coarse ranking signals ===
    # Low-calorie preference & preferred cuisine overlap
    # NOTE(review): despite its name, low_calorie_penalty is a *bonus*
    # (1.0 when calories are at or below the threshold), not a penalty.
    calories = np.array([r.get("calories", 0) for r in recipes], dtype=float)
    calorie_threshold = user_profile.get("calorie_threshold", 9999)
    low_calorie_penalty = (calories <= calorie_threshold).astype(float)

    preferred_course_types = set(user_profile.get("preferred_course_types", []))
    preferred_overlap = np.array([
        len(set(r.get("cuisine_attr", [])) & preferred_course_types)
        for r in recipes
    ], dtype=float)

    # Region preference matching (recipe region may be scalar or list/set)
    preferred_regions = set(user_profile.get("region_preference", []))
    region_match = np.array([
        1.0 if any(region in preferred_regions for region in
                   (r.get("region", []) if isinstance(r.get("region"), (list, set))
                    else [r.get("region", "")]))
        else 0.0
        for r in recipes
    ], dtype=float)

    # === 6. Compute coarse ranking scores ===
    scores = (
        weights["main_match_ratio"] * main_ratio +
        weights["staple_match_ratio"] * staple_ratio +
        weights["other_match_ratio"] * other_ratio +
        weights["low_calorie_penalty"] * low_calorie_penalty +
        weights["preferred_course_overlap"] * preferred_overlap +
        weights.get("region_match", 0) * region_match
    )

    # === 7. Select top-N candidates ===
    valid_idx = np.where(scores > 0)[0]
    if valid_idx.size == 0:
        return []

    scores_valid = scores[valid_idx]
    # topk is bounded by the count of positive-score candidates, computed
    # before the dynamic threshold below is applied.
    topk = min(top_n, valid_idx.size)

    # Optional dynamic thresholding: keep candidates with score >= 50% of max
    max_score = scores_valid.max()
    keep_mask = scores_valid >= max_score * 0.5
    keep_idx = valid_idx[keep_mask]

    if keep_idx.size == 0:
        return []

    order = np.argsort(scores[keep_idx])[::-1]
    top_idx = keep_idx[order[:topk]]

    # Return the original recipe dicts corresponding to the top candidates
    return [recipes[i] for i in top_idx]
223
+
224
+
225
def rule_generate_candidates(df, user_parents, user_profile):
    """
    Step 3: Rule-based reranking of coarse candidates.
    Uses all available features (except vegan/vegetarian filters, which were applied in Step 1)
    to compute a weighted rule-based score for each recipe.

    Parameters
    ----------
    df : pandas.DataFrame
        Coarse candidates; rows carry *_parent collections and nutrition fields.
    user_parents : iterable[str]
        Parent-level ingredients the user has on hand.
    user_profile : dict
        Preference flags read here: low_calorie, high_protein, low_fat.

    Returns
    -------
    pandas.DataFrame
        Candidates with a positive 'match_score', sorted descending;
        an empty frame when nothing scores above zero.
    """

    # NOTE(review): the inner accumulator variable shadows the function
    # name 'score'; harmless here, but worth renaming in a later pass.
    def score(row):
        # Build recipe_dict for feature extraction
        recipe_dict = {
            "main": row.get("main_parent", set()),
            "staple": row.get("staple_parent", set()),
            "other": row.get("other_parent", set()),
            "seasoning": row.get("seasoning_parent", set()),
            "matched_main": len(row.get("main_parent", set()) & set(user_parents)),
            "matched_staple": len(row.get("staple_parent", set()) & set(user_parents)),
            "matched_other": len(row.get("other_parent", set()) & set(user_parents)),
            "calories": row.get("calories", 0),
            "protein": row.get("protein", 0),
            "fat": row.get("fat", 0),
            "region": row.get("region", ""),
            "cuisine_attr": row.get("cuisine_attr", []),
            "ingredients": row.get("ingredients", []),
            "minutes": row.get("minutes", None),
        }

        # Extract rule features (keys below are produced by build_features —
        # presumed contract; verify against src/feature.py)
        feats = build_features(recipe_dict, user_profile)

        # Compute rule-based score
        score = 0.0

        # Ingredient match ratios
        # Main ingredients are weighted most heavily
        score += 2.0 * feats["main_match_ratio"]
        score += 1.0 * feats["staple_match_ratio"]
        score += 1.0 * feats["other_match_ratio"]

        # Nutrition preferences
        # Low calorie preference
        if user_profile.get("low_calorie", False):
            if feats["low_calorie_penalty"]:
                score += 0.5

        # High protein preference
        if user_profile.get("high_protein", False) and feats["protein_ratio"] > 0.25:
            score += 0.3

        # Low fat preference (penalty if fat ratio is too high)
        if user_profile.get("low_fat", False) and feats["fat_ratio"] > 0.35:
            score -= 0.3

        # Region / cuisine / main-type preferences
        score += 0.5 * feats["region_match"]
        score += 0.4 * feats["preferred_course_overlap"]
        score += 0.3 * feats["preferred_main_overlap"]

        # Cooking time preference
        score += 0.3 * feats["within_cooking_time"]

        # Missing ingredients penalty
        # Minor penalty for missing main ingredients (after coarse filtering this is usually small)
        score -= 0.2 * feats["missing_main_count"]

        # Negative totals are clamped so downstream filtering on > 0 is clean.
        return max(score, 0.0)

    # Apply scoring over the coarse candidate DataFrame
    df = df.copy()
    df["match_score"] = df.apply(score, axis=1)
    df = df[df["match_score"] > 0]
    if df.empty:
        return df
    df = df.sort_values("match_score", ascending=False).reset_index(drop=True)

    return df
300
+
301
+
302
def ml_generate_candidates(coarse_candidates, user_parents, user_profile, model_path, topk=5):
    """
    Step 3: ML-based reranking (directly after Step 2).
    Instead of rule-based prefiltering, use the coarse-ranked candidates (Step 2 output),
    build features in the same format as training, and apply the trained ML model to rerank.

    Parameters
    ----------
    coarse_candidates : list[dict] or pandas.DataFrame
        Output of the coarse ranking stage.
    user_parents : iterable[str]
        Parent-level ingredients the user has on hand.
    user_profile : dict
        User profile passed through to build_features.
    model_path : str
        Path to the joblib-serialized ranking model.
    topk : int
        Number of top candidates to return.

    Returns
    -------
    pandas.DataFrame
        Top-k candidates sorted by 'ml_score' (min-max normalized to [0, 1]
        when scores differ); empty frame when there is no input.
    """

    # Handle empty input
    if coarse_candidates is None or len(coarse_candidates) == 0:
        print("No candidates provided for ML reranking.")
        return pd.DataFrame()

    # If input is a list of dicts (from coarse_rank_candidates), convert to DataFrame
    if isinstance(coarse_candidates, list):
        df = pd.DataFrame(coarse_candidates)
    else:
        df = coarse_candidates.copy()

    if df.empty:
        print("Coarse candidates DataFrame is empty.")
        return df

    # Load trained model (joblib deserializes arbitrary objects — only load
    # model files produced by this project)
    model = joblib.load(model_path)

    # Build feature DataFrame mirroring the training-time layout
    feature_rows = []
    for _, row in df.iterrows():
        recipe_dict = {
            "main": row.get("main_parent", set()),
            "staple": row.get("staple_parent", set()),
            "other": row.get("other_parent", set()),
            "seasoning": row.get("seasoning_parent", set()),
            "matched_main": len(row.get("main_parent", set()) & set(user_parents)),
            "matched_staple": len(row.get("staple_parent", set()) & set(user_parents)),
            "matched_other": len(row.get("other_parent", set()) & set(user_parents)),
            "calories": row.get("calories", 0),
            "protein": row.get("protein", 0),
            "fat": row.get("fat", 0),
            "region": row.get("region", ""),
            "cuisine_attr": row.get("cuisine_attr", []),
            "ingredients": row.get("ingredients", []),
            "minutes": row.get("minutes", None),
        }
        feats = build_features(recipe_dict, user_profile)
        feature_rows.append(feats)

    feature_df = pd.DataFrame(feature_rows)

    # Predict ML scores (classifiers use P(class 1); rankers use raw output)
    if hasattr(model, "predict_proba"):
        df["ml_score"] = model.predict_proba(feature_df)[:, 1]
    else:
        df["ml_score"] = model.predict(feature_df)

    # normalize to 0-1 (skipped when all scores are equal — raw values remain)
    if len(df) > 0 and df["ml_score"].max() > df["ml_score"].min():
        df["ml_score"] = (df["ml_score"] - df["ml_score"].min()) / (df["ml_score"].max() - df["ml_score"].min())

    # Sort by ML score and return top-k candidates
    return df.sort_values("ml_score", ascending=False).head(topk).reset_index(drop=True)
363
+
364
+
365
+
recipe_recommendation/src/coldstart.py ADDED
@@ -0,0 +1,279 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import ast
3
+ import json
4
+ import random
5
+ import pandas as pd
6
+ import numpy as np
7
+ from tqdm import tqdm
8
+ import warnings
9
+
10
+ from .candidate import coarse_rank_candidates, hard_filter, rule_generate_candidates
11
+ from .feature import build_features
12
+ from .io import load_recipes_csv, load_ingredient_map
13
+
14
+ RECIPES_PATH = load_recipes_csv()
15
+ INGREDIENT_MAP = load_ingredient_map()
16
+ PARENTS = INGREDIENT_MAP["parents"]
17
+ CHILDREN = INGREDIENT_MAP["children"]
18
+
19
def parse_list(x):
    """
    Convert a stringified list into a Python list safely.

    Args:
        x: A list (returned as-is), a string such as "['a', 'b']", or a
           missing value (None / NaN / empty string).

    Returns:
        list: The parsed list, or [] when the value is missing or unparsable.
    """
    # Check for an actual list BEFORE pd.isna(): pd.isna on a list evaluates
    # element-wise and returns an ndarray, which raises ValueError inside
    # the boolean `or` below (the original ordering crashed on list input).
    if isinstance(x, list):
        return x
    if pd.isna(x) or x == "":
        return []
    try:
        return ast.literal_eval(x)
    except Exception:
        # Unparsable text is treated as "no data" rather than raising.
        return []
29
+
30
def parse_set(x):
    """
    Convert a stringified collection into a Python set safely.

    Args:
        x: A set (returned as-is), a list/tuple (converted), a string such
           as "['a', 'b']" or "a", a missing value (None / NaN / empty
           string), or any other scalar (wrapped in a singleton set).

    Returns:
        set: The parsed set; empty when the value is missing.
    """
    # Container checks must come BEFORE pd.isna(): pd.isna on a list/tuple
    # evaluates element-wise and returns an ndarray, which raises ValueError
    # inside the boolean `or` below (the original ordering crashed on them).
    if isinstance(x, set):
        return x
    if isinstance(x, (list, tuple)):
        return set(x)
    if pd.isna(x) or x == "":
        return set()
    if isinstance(x, str):
        try:
            v = ast.literal_eval(x)
            if isinstance(v, (list, tuple, set)):
                return set(v)
            return {v}
        except Exception:
            # Plain, unquoted text: treat the whole string as one element.
            return {x.strip()}
    # Any other scalar becomes a singleton set.
    return {x}
47
+
48
+ def _parents_pool_from_df(df: pd.DataFrame):
49
+ cols = ["main_parent", "staple_parent", "other_parent", "seasoning_parent"]
50
+ pool = set()
51
+ for c in cols:
52
+ if c in df.columns:
53
+ for s in df[c]:
54
+ pool |= set(s) if isinstance(s, (set, list, tuple)) else set()
55
+ return sorted(pool)
56
+
57
+
58
def sample_user_parents(parents_pool,
                        user_profile=None,
                        prev_inventory=None,
                        min_items=3, max_items=10,
                        keep_ratio=0.6, reset_interval=20, round_idx=0):
    """
    Sample a synthetic pantry (list of parent ingredients) for one cold-start
    round. Preferred mains are drawn with 3x weight, disliked/forbidden
    parents are excluded, and a fraction of the previous round's inventory is
    carried over unless this round is a periodic reset.
    """
    prefs = (user_profile or {}).get("other_preferences", {})
    liked = set(prefs.get("preferred_main", []))
    disliked = set(prefs.get("disliked_main", []))
    banned = set((user_profile or {}).get("forbidden_parents", [])) | disliked

    # Build the weighted sampling pool, up-weighting preferred mains.
    pool = []
    weights = []
    for parent in parents_pool:
        if parent in banned:
            continue
        pool.append(parent)
        weights.append(3.0 if parent in liked else 1.0)
    if not pool:
        # Everything was excluded: fall back to the full pool, unweighted.
        pool = parents_pool[:]
        weights = [1.0] * len(parents_pool)

    inventory = set()
    # Carry over part of the previous pantry except on periodic reset rounds.
    if prev_inventory and round_idx % reset_interval != 0:
        carried = list(prev_inventory)
        random.shuffle(carried)
        keep_count = max(0, int(len(carried) * keep_ratio))
        inventory.update(carried[:keep_count])

    # Top up the inventory to a random target size with weighted draws.
    target_size = random.randint(min_items, max_items)
    missing = max(0, target_size - len(inventory))
    for _ in range(min(missing, len(pool))):
        pick = random.choices(range(len(pool)), weights=weights, k=1)[0]
        inventory.add(pool[pick])
    return list(inventory)
89
+
90
+
91
+ def _weighted_pick3(indexes, scores, temperature=1.0):
92
+ idxs = list(indexes)
93
+ scs = np.array(scores, dtype=float)
94
+ if np.any(scs < 0):
95
+ scs = scs - scs.min()
96
+ if scs.sum() == 0:
97
+ scs = np.ones_like(scs)
98
+ picks = []
99
+ for _ in range(min(3, len(idxs))):
100
+ probs = np.exp(scs / max(temperature, 1e-6))
101
+ probs = probs / probs.sum()
102
+ choice = np.random.choice(len(idxs), p=probs)
103
+ picks.append(idxs[choice])
104
+ idxs.pop(choice)
105
+ scs = np.delete(scs, choice)
106
+ if len(idxs) == 0:
107
+ break
108
+ return picks
109
+
110
+
111
+ # ---------- Main cold-start ----------
112
+ # ---------- Main cold-start ----------
113
def cold_start_ranker(user_id: str,
                      n_rounds: int = 10000,
                      topn_coarse: int = 5000,
                      topk_rule: int = 5,
                      batch_size: int = 5000,
                      switch_interval: int = 100):
    """
    Cold-start data generation for learning-to-rank.
    Top-5 selection prioritizes user pantry coverage deterministically:
      1. Fully covered recipes first (missing_count == 0)
      2. Then few missing (esp. staple/other)
      3. Heavy penalty for missing main ingredients.

    Parameters
    ----------
    user_id : str
        User folder name under user_data/ (must contain user_profile.json).
    n_rounds : int
        Number of simulated pantry-sampling rounds (one qid each).
    topn_coarse : int
        Max candidates retained by the coarse ranking per round.
    topk_rule : int
        Candidates kept from rule reranking per round (rounds with fewer
        are skipped).
    batch_size : int
        Approximate recipes per chunk; chunks rotate every switch_interval
        rounds to vary the candidate pool.
    switch_interval : int
        Rounds between chunk switches.

    Returns
    -------
    str
        Path of the written user_features_rank.csv (returned early, without
        recomputation, if it already exists).
    """
    base_dir = os.path.join("user_data", user_id)
    os.makedirs(base_dir, exist_ok=True)
    profile_path = os.path.join(base_dir, "user_profile.json")
    features_path = os.path.join(base_dir, "user_features_rank.csv")

    # Idempotence: never regenerate an existing feature file.
    if os.path.exists(features_path):
        print(f"[cold_start] Features already exist at {features_path}")
        return features_path

    with open(profile_path, "r", encoding="utf-8") as f:
        user_profile = json.load(f)

    # Load and parse recipes (CSV stores collections as stringified literals)
    df_all = pd.read_csv(RECIPES_PATH)
    to_set = ["main_parent", "staple_parent", "other_parent", "seasoning_parent", "cuisine_attr"]
    to_list = ["ingredients"]
    for c in to_set:
        if c in df_all.columns:
            df_all[c] = df_all[c].apply(parse_set)
    for c in to_list:
        if c in df_all.columns:
            df_all[c] = df_all[c].apply(parse_list)

    # Step 1 hard filter (best-effort: a failure is logged and skipped,
    # leaving the unfiltered frame in place)
    if hard_filter is not None:
        try:
            before = len(df_all)
            mask = df_all.apply(lambda r: hard_filter(r.to_dict(), user_profile), axis=1)
            df_all = df_all[mask]
            after = len(df_all)
            print(f"[cold_start] Step1 hard filter applied: {before} -> {after}")
        except Exception as e:
            warnings.warn(f"[cold_start] hard_filter failed, skip. err={e}")

    # Split the catalog into rotating chunks to diversify candidates per round.
    n_chunks = (len(df_all) // batch_size) + 1
    chunks = np.array_split(df_all, n_chunks)
    parents_pool = _parents_pool_from_df(df_all)
    rows = []
    prev_inventory = None

    for i in tqdm(range(n_rounds), desc="Cold-start rounds"):
        chunk_id = (i // switch_interval) % n_chunks
        df_chunk = chunks[chunk_id].copy()

        # pantry sampling (carries over part of the previous round's pantry)
        user_parents = sample_user_parents(
            parents_pool,
            user_profile=user_profile,
            prev_inventory=prev_inventory,
            round_idx=i
        )
        prev_inventory = user_parents

        # Step 2: coarse recall
        coarse_list = coarse_rank_candidates(
            recipes=df_chunk.to_dict(orient="records"),
            user_parents=user_parents,
            user_profile=user_profile,
            top_n=min(topn_coarse, len(df_chunk))
        )
        if not coarse_list:
            continue

        coarse_df = pd.DataFrame(coarse_list)

        # Step 3: rule rerank → Top-5 candidates (just for selecting the 5)
        rule_df = rule_generate_candidates(
            coarse_df,
            user_parents=user_parents,
            user_profile=user_profile
        )
        if rule_df.empty or len(rule_df) < topk_rule:
            continue

        top5 = rule_df.head(topk_rule).copy()

        # ===== New deterministic scoring with main priority =====
        # Rank the top-5 by weighted missing-ingredient count: a missing
        # main costs 10, staple 2, other 1; ties break on total missing.
        user_set = set(user_parents)
        weighted_scores = []
        for idx, row in top5.iterrows():
            main_set = set(row.get("main_parent", set()))
            staple_set = set(row.get("staple_parent", set()))
            other_set = set(row.get("other_parent", set()))

            main_missing = len(main_set - user_set)
            staple_missing = len(staple_set - user_set)
            other_missing = len(other_set - user_set)

            weighted_missing = 10 * main_missing + 2 * staple_missing + 1 * other_missing
            total_missing = main_missing + staple_missing + other_missing

            weighted_scores.append((idx, weighted_missing, total_missing))

        sorted_pairs = sorted(weighted_scores, key=lambda x: (x[1], x[2]))
        picked_idxs = [idx for idx, _, _ in sorted_pairs[:3]]

        # relevance 3 / 2 / 1 for the best three; the rest stay 0
        labels = {idx: 0 for idx in top5.index}
        if len(picked_idxs) > 0:
            labels[picked_idxs[0]] = 3
        if len(picked_idxs) > 1:
            labels[picked_idxs[1]] = 2
        if len(picked_idxs) > 2:
            labels[picked_idxs[2]] = 1

        # build features for all 5 candidates (one training row each)
        for idx, row in top5.iterrows():
            up = set(user_parents)
            main_set = set(row.get("main_parent", set()))
            staple_set = set(row.get("staple_parent", set()))
            other_set = set(row.get("other_parent", set()))

            recipe_dict = {
                "main": main_set,
                "staple": staple_set,
                "other": other_set,
                "seasoning": set(row.get("seasoning_parent", set())),
                "matched_main": len(main_set & up),
                "matched_staple": len(staple_set & up),
                "matched_other": len(other_set & up),
                "calories": row.get("calories", 0),
                "protein": row.get("protein", 0),
                "fat": row.get("fat", 0),
                "region": row.get("region", ""),
                "cuisine_attr": row.get("cuisine_attr", []),
                "ingredients": row.get("ingredients", []),
                "minutes": row.get("minutes", None),
            }

            feats = build_features(recipe_dict, user_profile)
            feats["relevance"] = float(labels[idx])
            feats["qid"] = int(i)
            rows.append(feats)

    out = pd.DataFrame(rows)
    # Drop degenerate query groups (a single row cannot be ranked).
    valid_qids = out.groupby("qid").size()
    keep_qids = valid_qids[valid_qids > 1].index
    out = out[out["qid"].isin(keep_qids)].reset_index(drop=True)

    out_path = os.path.join("user_data", user_id, "user_features_rank.csv")
    out.to_csv(out_path, index=False)
    print(f"[cold_start] Saved {len(out)} rows to {out_path}")
    return out_path
269
+
270
+
271
if __name__ == "__main__":
    # Manual entry point: generate cold-start ranking features for user_1.
    # Fixed: the previous invocation passed coverage_penalty= and
    # temperature=, which are not parameters of cold_start_ranker() and
    # raised a TypeError before any work was done.
    cold_start_ranker(
        user_id="user_1",
        n_rounds=10000,
        topn_coarse=20000,
        topk_rule=5,
    )
recipe_recommendation/src/embedding.py ADDED
@@ -0,0 +1,100 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import json
3
+ import numpy as np
4
+ from sklearn.metrics.pairwise import cosine_similarity
5
+
6
def profile_to_embedding(profile):
    """
    Encode a normalized user profile as a fixed-length numeric vector.

    Layout (28 dims total):
        [diet one-hot (3)] + [allergies multi-hot (6)] + [regions multi-hot (6)]
        + [nutrition goals (4)] + [preferred mains multi-hot (8)] + [cooking time (1)]
    """
    diet_types = ["vegetarian", "flexible", "non_vegetarian"]
    allergy_vocab = ["milk", "gluten", "peanut", "shrimp", "egg", "soy"]
    region_vocab = ["North America", "Latin America", "Europe", "Asia", "Middle East", "Africa"]
    main_vocab = ["chicken", "tofu", "beef", "salmon", "eggs", "pork", "beans", "mushroom"]

    # 1. Diet one-hot ("flexible" is the fallback; unknown values map to all zeros).
    diet_value = profile.get("diet", {}).get("vegetarian_type", "flexible")
    diet_vec = np.zeros(len(diet_types))
    if diet_value in diet_types:
        diet_vec[diet_types.index(diet_value)] = 1

    # 2–3. Membership multi-hot vectors for allergies and preferred regions.
    allergies = set(profile.get("allergies", []))
    regions = set(profile.get("region_preference", []))
    allergy_vec = np.array([1 if item in allergies else 0 for item in allergy_vocab])
    region_vec = np.array([1 if item in regions else 0 for item in region_vocab])

    # 4. Calorie/protein goals squashed to [0, 1] by fixed caps (4000 kcal, 300 g).
    goals = profile.get("nutritional_goals", {})
    cal_goal = goals.get("calories", {})
    pro_goal = goals.get("protein", {})
    goal_vec = np.array([
        cal_goal.get("min", 0) / 4000,
        min(cal_goal.get("max", 9999), 4000) / 4000,
        pro_goal.get("min", 0) / 300,
        min(pro_goal.get("max", 999), 300) / 300,
    ])

    # 5. Preferred main ingredients multi-hot.
    mains = set(profile.get("other_preferences", {}).get("preferred_main", []))
    main_vec = np.array([1 if item in mains else 0 for item in main_vocab])

    # 6. Max cooking time, normalized against a 120-minute ceiling (0 when unset).
    limit = profile.get("other_preferences", {}).get("cooking_time_max")
    time_vec = np.array([min(limit / 120, 1)]) if limit is not None else np.array([0])

    return np.concatenate([diet_vec, allergy_vec, region_vec, goal_vec, main_vec, time_vec])
60
+
61
+
62
def profile_similarity(profile_a, profile_b):
    """Return the cosine similarity between the embeddings of two user profiles."""
    vec_a = profile_to_embedding(profile_a)
    vec_b = profile_to_embedding(profile_b)
    return cosine_similarity(vec_a.reshape(1, -1), vec_b.reshape(1, -1))[0, 0]
67
+
68
def find_most_similar_user(target_user_id, user_data_dir="recipe_recommendation/user_data", threshold=0.85):
    """
    Scan stored user profiles and return the closest match to the target user.

    Returns:
        (best_match_user_id, similarity_score) when the best cosine similarity
        meets `threshold`, otherwise (None, -1).

    Raises:
        FileNotFoundError: if the target user has no saved profile.
    """
    target_path = os.path.join(user_data_dir, target_user_id, "user_profile.json")
    if not os.path.exists(target_path):
        raise FileNotFoundError(f"[embedding] No profile found for user {target_user_id}")

    with open(target_path, "r", encoding="utf-8") as fh:
        target_emb = profile_to_embedding(json.load(fh)).reshape(1, -1)

    best_uid, best_sim = None, -1

    # Compare against every other user directory that has a saved profile.
    for candidate_uid in os.listdir(user_data_dir):
        if candidate_uid == target_user_id:
            continue
        candidate_path = os.path.join(user_data_dir, candidate_uid, "user_profile.json")
        if not os.path.exists(candidate_path):
            continue
        with open(candidate_path, "r", encoding="utf-8") as fh:
            candidate_profile = json.load(fh)
        candidate_emb = profile_to_embedding(candidate_profile).reshape(1, -1)
        score = cosine_similarity(target_emb, candidate_emb)[0, 0]
        if score > best_sim:
            best_uid, best_sim = candidate_uid, score

    if best_uid and best_sim >= threshold:
        print(f"[embedding] Found similar user: {best_uid} (similarity={best_sim:.3f})")
        return best_uid, best_sim

    return None, -1
recipe_recommendation/src/feature.py ADDED
@@ -0,0 +1,176 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ from .io import load_ingredient_map
3
+ import numpy as np
4
+
5
+ # Load ingredient map globally to avoid repeated I/O
6
+ INGREDIENT_MAP = load_ingredient_map()
7
+ PARENTS = INGREDIENT_MAP["parents"]
8
+ CHILDREN = INGREDIENT_MAP["children"]
9
+
10
+
11
def is_recipe_vegetarian_safe(ingredients: list[str], veg_type: str) -> bool:
    """
    Return True when every mapped ingredient is allowed for the given diet.

    Supported veg_type values: "vegan", "vegetarian", "flexible_vegetarian",
    or "" (no restriction). Ingredients absent from the ingredient map are
    treated as safe by default.
    """
    for raw in ingredients:
        name = raw.strip().lower()
        # Prefer the specific (child) entry; fall back to the parent entry.
        info = CHILDREN.get(name)
        if info is None:
            info = PARENTS.get(name)
        if info is None:
            # Unknown ingredient: assume safe.
            continue

        if veg_type == "vegan":
            if not info.get("vegan_safe", True):
                return False
        elif veg_type in ("vegetarian", "flexible_vegetarian"):
            # Flexible vegetarians use vegetarian_safe as a proxy: anything
            # flagged non-vegetarian (explicit meat) is rejected.
            if not info.get("vegetarian_safe", True):
                return False
    return True
36
+
37
+
38
def build_features(recipe: dict, user_profile: dict) -> dict:
    """
    Derive numeric features for one (recipe, user) pair.

    Used by both the ML ranker and the rule-based scorer; every value is a
    scalar (ratio, count, or 0/1 flag). Key insertion order is preserved so
    downstream consumers relying on column order are unaffected.
    """
    other_prefs = user_profile.get("other_preferences", {})

    # Ingredient coverage: how much of each group the user's pantry matches.
    n_main = len(recipe.get("main", []))
    n_other = len(recipe.get("other", []))
    n_staple = len(recipe.get("staple", []))
    hit_main = recipe.get("matched_main", 0)
    hit_other = recipe.get("matched_other", 0)
    hit_staple = recipe.get("matched_staple", 0)

    calories = recipe.get("calories", 0)
    protein = recipe.get("protein", 0)
    fat = recipe.get("fat", 0)

    features = {
        "main_match_ratio": hit_main / max(n_main, 1),
        "other_match_ratio": hit_other / max(n_other, 1),
        "staple_match_ratio": hit_staple / max(n_staple, 1),
        "missing_main_count": n_main - hit_main,
        "missing_other_count": n_other - hit_other,
        "missing_staple_count": n_staple - hit_staple,
        # Raw nutrition plus per-calorie densities.
        "calories": calories,
        "protein": protein,
        "fat": fat,
        "protein_ratio": protein / max(calories, 1),
        "fat_ratio": fat / max(calories, 1),
    }

    # Regional preference: recipe region may be a single string or a set of tags.
    recipe_region = recipe.get("region", "")
    preferred_regions = user_profile.get("preferred_regions", [])
    if isinstance(recipe_region, set):
        features["region_match"] = int(any(r in preferred_regions for r in recipe_region))
    else:
        features["region_match"] = int(recipe_region in preferred_regions)

    # Diet-safety flags: three absolute checks plus one tied to this user's diet.
    ingredients_all = recipe.get("ingredients", [])
    features["is_vegan_safe"] = int(is_recipe_vegetarian_safe(ingredients_all, "vegan"))
    features["is_vegetarian_safe_absolute"] = int(
        is_recipe_vegetarian_safe(ingredients_all, "vegetarian")
    )
    features["is_flexible_safe_absolute"] = int(
        is_recipe_vegetarian_safe(ingredients_all, "flexible_vegetarian")
    )
    veg_type = (user_profile.get("diet", {}).get("vegetarian_type", "") or "").lower()
    features["is_user_diet_safe"] = int(is_recipe_vegetarian_safe(ingredients_all, veg_type))

    # 1 when the recipe stays at or below the user's calorie threshold.
    features["low_calorie_penalty"] = int(calories <= user_profile.get("calorie_threshold", 9999))

    # Overlap with the user's preferred main ingredients and course types.
    features["preferred_main_overlap"] = len(
        set(recipe.get("main", [])) & set(other_prefs.get("preferred_main", []))
    )
    features["preferred_course_overlap"] = len(
        set(recipe.get("cuisine_attr", [])) & set(user_profile.get("preferred_course_types", []))
    )

    # Cooking-time fit; a falsy limit (None/0) disables the check.
    time_limit = other_prefs.get("cooking_time_max", None)
    if time_limit:
        features["within_cooking_time"] = int(recipe.get("minutes", 9999) <= time_limit)
    else:
        features["within_cooking_time"] = 1

    return features
122
+
123
def build_cluster_features(candidates):
    """
    Build a binary (multi-hot) feature matrix for KMeans clustering.

    Columns are the union of main/staple/other parent ingredients plus
    cuisine attributes observed across `candidates`, concatenated in that
    order with each vocabulary sorted alphabetically. This is independent
    of the features used for model training.

    Args:
        candidates (list[dict]): recipe dicts.

    Returns:
        np.ndarray: uint8 matrix of shape (len(candidates), num_features).
    """
    field_names = ("main_parent", "staple_parent", "other_parent", "cuisine_attr")

    # 1. Collect a sorted vocabulary per field.
    vocabs = []
    for field in field_names:
        values = set()
        for recipe in candidates:
            values.update(recipe.get(field, []) or [])
        vocabs.append(sorted(values))

    # 2. Assign each (field, value) pair a distinct column index.
    index_maps = []
    offset = 0
    for vocab in vocabs:
        index_maps.append({value: offset + i for i, value in enumerate(vocab)})
        offset += len(vocab)

    # 3. Fill the multi-hot matrix.
    X = np.zeros((len(candidates), offset), dtype=np.uint8)
    for row, recipe in enumerate(candidates):
        for field, col_of in zip(field_names, index_maps):
            for value in recipe.get(field, []) or []:
                if value in col_of:
                    X[row, col_of[value]] = 1

    return X
recipe_recommendation/src/highlight.py ADDED
@@ -0,0 +1,91 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import pandas as pd
2
+ from sklearn.cluster import KMeans
3
+ from sklearn.preprocessing import StandardScaler
4
+ import numpy as np
5
+
6
+
7
def print_candidates(candidates, user_parents, topk=10):
    """
    Pretty-print the top-k candidate recipes with pantry-match annotations.

    Each recipe shows its score (scaled so the best candidate reads ~100%),
    region/cuisine when available, calories, and its ingredient lists marked
    ✅/❌ depending on whether the parent ingredient is in `user_parents`.

    Args:
        candidates (pd.DataFrame): ranked recipes; must contain 'match_score',
            'name', and the ingredient/metadata columns referenced below.
        user_parents: collection of parent-ingredient names the user has.
        topk (int): number of rows to print.
    """
    # Fix: removed the unused `min_score` and `shown` locals (dead code).
    max_score = candidates['match_score'].max()

    for _, row in candidates.head(topk).iterrows():
        # Scale against the best score; epsilon guards a zero max.
        scaled_score = 100 * row['match_score'] / (max_score + 1e-9)
        print(f"{row['name']} (score {scaled_score:.1f}%)")

        # ----- Region (skip missing/placeholder values) -----
        region = row.get("region", None)
        if pd.notna(region) and isinstance(region, str) and region.strip() and region.lower() != "unavailable":
            print(f"  region: {region}")

        # ----- Cuisine attributes (may arrive as set, list, or single string) -----
        cuisine = row.get("cuisine_attr", None)
        if cuisine is not None and not (isinstance(cuisine, float) and pd.isna(cuisine)):
            if isinstance(cuisine, set):
                cuisine = list(cuisine)
            elif isinstance(cuisine, str):
                cuisine = [cuisine]

            if isinstance(cuisine, list) and len(cuisine) > 0:
                print(f"  cuisine: {', '.join(cuisine)}")

        # ----- Nutrition -----
        print(f"  calories: {row.get('calories', 'N/A')}")

        # ----- Ingredients, marked by pantry availability -----
        def mark_list(lst):
            return [("✅ " + ing) if ing in user_parents else ("❌ " + ing) for ing in lst]

        print(f"  staple: {mark_list(row.get('staple_parent', []))}")
        print(f"  main: {mark_list(row.get('main_parent', []))}")
        print(f"  seasoning: {row.get('seasoning_parent', [])}")
        print(f"  other: {mark_list(row.get('other_parent', []))}")
        print("-" * 40)
47
+
48
def diversify_topk_with_min_clusters(
    ranked_candidates,
    feature_matrix,
    top_k=5,
    n_clusters=20,
    min_clusters=3,
    random_state=42
):
    """
    Select up to `top_k` recipes while guaranteeing cluster diversity.

    Candidates are clustered with KMeans on `feature_matrix` (rows aligned
    with `ranked_candidates`). A first pass walks the ranked list keeping one
    representative per newly-seen cluster until `min_clusters` distinct
    clusters (or `top_k` picks) are reached; the remainder is then filled
    purely by rank order.
    """
    if len(ranked_candidates) == 0:
        return []

    effective_clusters = min(n_clusters, len(ranked_candidates))
    standardized = StandardScaler().fit_transform(feature_matrix)
    labels = KMeans(
        n_clusters=effective_clusters, n_init='auto', random_state=random_state
    ).fit_predict(standardized)

    # Pass 1: one pick per distinct cluster, in rank order.
    selection = []
    seen_clusters = set()
    for pos, label in enumerate(labels):
        if label in seen_clusters:
            continue
        selection.append(ranked_candidates[pos])
        seen_clusters.add(label)
        if len(seen_clusters) >= min_clusters or len(selection) >= top_k:
            break

    # Pass 2: top up with the highest-ranked candidates not yet chosen.
    if len(selection) < top_k:
        for pos in range(len(labels)):
            candidate = ranked_candidates[pos]
            if candidate not in selection:
                selection.append(candidate)
                if len(selection) >= top_k:
                    break

    return selection
90
+
91
+
recipe_recommendation/src/io.py ADDED
@@ -0,0 +1,37 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import json
3
+ from huggingface_hub import hf_hub_download
4
+
5
+ # Hugging Face ID
6
+ REPO_ID = "Iris314/recipe-cleaned"
7
+
8
+ ROOT_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
9
+ DATA_DIR = os.path.join(ROOT_DIR, "data")
10
+ os.makedirs(DATA_DIR, exist_ok=True)
11
+
12
+
13
def download_file(filename: str) -> str:
    """
    Ensure `filename` from the Hugging Face dataset repo exists locally.

    Downloads the file into DATA_DIR on first use; later calls reuse the
    cached copy.

    Args:
        filename: file name inside the dataset repo (e.g. "recipes.csv").

    Returns:
        The local path of the file under DATA_DIR.
    """
    local_path = os.path.join(DATA_DIR, filename)
    if not os.path.exists(local_path):
        # Bug fix: these were constant f-strings that printed the literal
        # text "(unknown)" — interpolate the actual file name instead.
        print(f"Downloading {filename} from Hugging Face Hub...")
        hf_hub_download(
            repo_id=REPO_ID,
            filename=filename,
            repo_type="dataset",
            local_dir=DATA_DIR,
            local_dir_use_symlinks=False
        )
    else:
        print(f"{filename} already exists locally.")
    return local_path
28
+
29
+
30
def load_recipes_csv() -> str:
    """Return the local path of "recipes.csv", downloading it on first use."""
    return download_file("recipes.csv")
32
+
33
+
34
def load_ingredient_map() -> dict:
    """Download "ingredient_map.data" if needed and parse it as JSON."""
    map_path = download_file("ingredient_map.data")
    with open(map_path, "r", encoding="utf-8") as handle:
        data = json.load(handle)
    return data
recipe_recommendation/src/trainmodel.py ADDED
@@ -0,0 +1,237 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import joblib
3
+ import warnings
4
+ import numpy as np
5
+ import pandas as pd
6
+ from typing import List, Tuple, Sequence, Optional
7
+ from xgboost import XGBRanker
8
+ from sklearn.model_selection import train_test_split
9
+ from sklearn.metrics import ndcg_score
10
+ from pandas.api.types import is_numeric_dtype
11
+
12
+
13
+ # ----------------------------- Helpers -----------------------------
14
+ def _pick_feature_cols(df: pd.DataFrame, drop_cols: Sequence[str]) -> List[str]:
15
+ """
16
+ Pick numeric feature columns robustly, excluding drop_cols.
17
+ Uses pandas is_numeric_dtype to correctly include nullable ints/floats/bools.
18
+ """
19
+ cols = []
20
+ for c in df.columns:
21
+ if c in drop_cols:
22
+ continue
23
+ if is_numeric_dtype(df[c]):
24
+ cols.append(c)
25
+ return cols
26
+
27
+
28
+ def _sort_and_pack_by_qid(
29
+ X: pd.DataFrame, y: pd.Series, qid: pd.Series, feature_cols: List[str]
30
+ ) -> Tuple[pd.DataFrame, np.ndarray, List[int], np.ndarray]:
31
+ """
32
+ Sort rows by qid so that group sizes match the sample order.
33
+ Returns:
34
+ X_sorted, y_sorted, groups, qid_sorted (aligned with X_sorted/y_sorted)
35
+ """
36
+ packed = X.copy()
37
+ packed["_label"] = y.values
38
+ packed["_qid"] = qid.values
39
+ packed = packed.sort_values("_qid").reset_index(drop=True)
40
+
41
+ groups = packed.groupby("_qid").size().tolist()
42
+ X_sorted = packed[feature_cols].copy()
43
+ y_sorted = packed["_label"].astype(float).values
44
+ qid_sorted = packed["_qid"].values
45
+ return X_sorted, y_sorted, groups, qid_sorted
46
+
47
+
48
def _eval_mean_ndcg(
    model: XGBRanker,
    X_val: pd.DataFrame,
    y_val,  # np.ndarray or pd.Series
    qid_val,  # aligned with X_val/y_val
    ks: Sequence[int] = (5, 10),
) -> dict:
    """
    Compute the mean NDCG@k over validation queries for each k in `ks`.

    Queries with fewer than two rows are skipped (NDCG is undefined there).
    Accepts numpy arrays or pandas Series for labels/qids.
    """
    try:
        # xgboost >= 2.0: respect the early-stopping best iteration if present.
        scores = model.predict(X_val, iteration_range=(0, model.best_iteration + 1))
    except Exception:
        scores = model.predict(X_val)

    labels = np.asarray(y_val)
    queries = np.asarray(qid_val)

    metrics = {}
    for k in ks:
        per_query = []
        for query in np.unique(queries):
            sel = (queries == query)
            if sel.sum() >= 2:
                per_query.append(ndcg_score([labels[sel]], [scores[sel]], k=k))
        metrics[f"NDCG@{k}"] = float(np.mean(per_query)) if per_query else 0.0
    return metrics
78
+
79
+
80
+
81
+ # ----------------------------- Main Trainer -----------------------------
82
def train_model_ranker(
    user_id: str = "user_1",
    features_path: Optional[str] = None,
    save_model: bool = True,
    model_params: Optional[dict] = None,
    val_ratio: float = 0.2,
    random_state: int = 42,
    max_rows: Optional[int] = None,
):
    """
    Train an XGBoost Learning-to-Rank model (XGBRanker) on cold-start generated data.

    Expected input CSV (from cold_start.py):
        - qid: query id (one round of pantry sampling = one query)
        - relevance: graded relevance label (e.g., 3/2/1/0)
        - features: numeric columns produced by build_features (and any extra numeric signals)

    The function:
        1) Reads the CSV
        2) Selects numeric feature columns robustly
        3) Splits train/val by qid to avoid leakage
        4) Sorts each split by qid and builds group sizes aligned to sample order
        5) Trains XGBRanker and reports NDCG@5/10
        6) Saves model to user_data/<user_id>/ranker.pkl

    Args:
        user_id: sub-directory under user_data/ holding this user's files.
        features_path: explicit CSV path; defaults to
            user_data/<user_id>/user_features_rank.csv.
        save_model: persist the fitted ranker to ranker.pkl when True.
        model_params: overrides merged into the default XGBRanker params.
        val_ratio: fraction of unique qids held out for validation.
        random_state: seed for the qid split, sub-sampling, and the model.
        max_rows: optional row cap (random sample) for quick iterations.

    Returns:
        (model, metrics, feature_cols): the fitted XGBRanker, the NDCG
        metrics dict from _eval_mean_ndcg, and the feature column names used.

    Raises:
        FileNotFoundError: when the cold-start feature CSV is missing.
        ValueError: when required columns or numeric features are absent.
    """
    base_dir = os.path.join("user_data", user_id)
    os.makedirs(base_dir, exist_ok=True)

    # Resolve features path
    if features_path is None:
        features_path = os.path.join(base_dir, "user_features_rank.csv")
    if not os.path.exists(features_path):
        raise FileNotFoundError(
            f"[train_model_ranker] Cold-start features not found at: {features_path}\n"
            f"Please run cold_start_ranker(user_id='{user_id}') first."
        )

    # Load data
    df = pd.read_csv(features_path)
    if max_rows is not None and len(df) > max_rows:
        df = df.sample(max_rows, random_state=random_state).reset_index(drop=True)

    # Basic validation
    if "qid" not in df.columns or "relevance" not in df.columns:
        raise ValueError("Input CSV must contain 'qid' and 'relevance' columns.")

    # Fill NaNs in label/qid (should not happen, but defensive)
    df["qid"] = pd.to_numeric(df["qid"], errors="coerce").fillna(-1).astype(int)
    df["relevance"] = pd.to_numeric(df["relevance"], errors="coerce").fillna(0).astype(float)

    # Pick numeric feature columns robustly
    drop_cols = {"qid", "relevance"}
    feature_cols = _pick_feature_cols(df, drop_cols)
    if not feature_cols:
        raise ValueError("No numeric feature columns found in dataset.")

    # Ensure numeric + finite values only (replace inf/nan with 0)
    df[feature_cols] = df[feature_cols].apply(pd.to_numeric, errors="coerce")
    df[feature_cols] = df[feature_cols].replace([np.inf, -np.inf], np.nan).fillna(0.0)

    # Split by qid to avoid leakage across queries: a query never straddles
    # the train/val boundary, so its rows are always ranked together.
    unique_qids = df["qid"].unique()
    if len(unique_qids) < 2:
        warnings.warn("Only one unique qid found — ranking training may be ineffective.")
    train_qids, val_qids = train_test_split(
        unique_qids, test_size=val_ratio, random_state=random_state
    )
    train_mask = df["qid"].isin(train_qids)
    val_mask = df["qid"].isin(val_qids)

    # Split dataframes
    X_train_raw = df.loc[train_mask, feature_cols]
    y_train_raw = df.loc[train_mask, "relevance"]
    qid_train = df.loc[train_mask, "qid"]

    X_val_raw = df.loc[val_mask, feature_cols]
    y_val_raw = df.loc[val_mask, "relevance"]
    qid_val = df.loc[val_mask, "qid"]

    # Sort by qid and build group sizes aligned with sample order (CRITICAL for XGBRanker)
    X_train, y_train, group_train, _ = _sort_and_pack_by_qid(
        X_train_raw, y_train_raw, qid_train, feature_cols
    )
    X_val, y_val, group_val, qid_val_sorted = _sort_and_pack_by_qid(
        X_val_raw, y_val_raw, qid_val, feature_cols
    )

    print(f"[ranker] #Train groups: {len(group_train)} | #Val groups: {len(group_val)}")
    print(f"[ranker] Train rows: {len(X_train)} | Val rows: {len(X_val)} | #Features: {len(feature_cols)}")

    # Default model params
    default_params = dict(
        objective="rank:ndcg",
        eval_metric="ndcg",
        n_estimators=400,
        learning_rate=0.08,
        max_depth=6,
        subsample=0.8,
        colsample_bytree=0.8,
        random_state=random_state,
        tree_method="hist",
        reg_lambda=1.0,
        reg_alpha=0.0,
    )
    if model_params:
        default_params.update(model_params)

    model = XGBRanker(**default_params)

    # Fit model (XGBRanker requires group/group for eval_set as well)
    fit_kwargs = dict(
        X=X_train,
        y=y_train,
        group=group_train,
        eval_set=[(X_val, y_val)],
        eval_group=[group_val],
        verbose=False,
    )

    # Early stopping support differs across xgboost versions, so try the
    # direct kwarg first, then the callback API, then train without it.
    try:
        # Newer xgboost versions (some builds) support early_stopping_rounds on Ranker
        model.fit(early_stopping_rounds=50, **fit_kwargs)  # maximize=True is inferred by 'ndcg'
    except TypeError:
        # Fallback to callback API (older versions)
        try:
            from xgboost.callback import EarlyStopping
            model.fit(callbacks=[EarlyStopping(rounds=50, save_best=True, maximize=True)], **fit_kwargs)
        except Exception:
            # Last resort: train without early stopping
            model.fit(**fit_kwargs)

    # Evaluate mean NDCG@5/10
    metrics = _eval_mean_ndcg(model, X_val, y_val, qid_val_sorted, ks=(5, 10))

    print("[ranker] Validation metrics:", " ".join(f"{k}={v:.4f}" for k, v in metrics.items()))

    # Save model
    if save_model:
        model_path = os.path.join(base_dir, "ranker.pkl")
        joblib.dump(model, model_path)
        print(f"[ranker] Model saved to {model_path}")

    return model, metrics, feature_cols
226
+
227
+
228
if __name__ == "__main__":
    # Example run: trains a ranker from user_data/user_1/user_features_rank.csv
    # (the default resolved inside train_model_ranker).
    train_model_ranker(
        user_id="user_1",
        save_model=True,
        val_ratio=0.2,
        random_state=42,
        max_rows=None,  # or set an upper bound for quick iterations, e.g., 200_000
        model_params=None,  # override defaults if desired
    )
recipe_recommendation/user_data/demo_user_1/user_profile.json ADDED
@@ -0,0 +1,28 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "user_id": "demo_user_1",
3
+ "num_feedback": 0,
4
+ "diet": {
5
+ "vegetarian_type": "flexible"
6
+ },
7
+ "allergies": [],
8
+ "region_preference": [
9
+ "North America"
10
+ ],
11
+ "nutritional_goals": {
12
+ "calories": {
13
+ "min": 200,
14
+ "max": 800
15
+ },
16
+ "protein": {
17
+ "min": 20,
18
+ "max": 100
19
+ }
20
+ },
21
+ "other_preferences": {
22
+ "preferred_main": [
23
+ "chicken"
24
+ ],
25
+ "disliked_main": [],
26
+ "cooking_time_max": 30
27
+ }
28
+ }
recipe_recommendation/user_data/user_0/feature_order.json ADDED
@@ -0,0 +1,22 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ "main_match_ratio",
3
+ "other_match_ratio",
4
+ "staple_match_ratio",
5
+ "missing_main_count",
6
+ "missing_other_count",
7
+ "missing_staple_count",
8
+ "calories",
9
+ "protein",
10
+ "fat",
11
+ "protein_ratio",
12
+ "fat_ratio",
13
+ "region_match",
14
+ "is_vegan_safe",
15
+ "is_vegetarian_safe_absolute",
16
+ "is_flexible_safe_absolute",
17
+ "is_user_diet_safe",
18
+ "low_calorie_penalty",
19
+ "preferred_main_overlap",
20
+ "preferred_course_overlap",
21
+ "within_cooking_time"
22
+ ]
recipe_recommendation/user_data/user_0/feedback.csv ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ main_match_ratio,other_match_ratio,staple_match_ratio,missing_main_count,missing_other_count,missing_staple_count,calories,protein,fat,protein_ratio,fat_ratio,region_match,is_vegan_safe,is_vegetarian_safe_absolute,is_flexible_safe_absolute,is_user_diet_safe,low_calorie_penalty,preferred_main_overlap,preferred_course_overlap,within_cooking_time,recipe_id,qid,relevance
2
+ 0.0,0.0,0.0,1,3,1,123.9,0,0,0.0,0.0,0,0,0,0,1,1,0,0,1,73148,0,5
recipe_recommendation/user_data/user_0/qid.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ 0
recipe_recommendation/user_data/user_0/ranker.pkl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:72a3361c05b69d3627a69983ee1460730b304b1a4c562be6fc75001ef9bd887f
3
+ size 1598006
recipe_recommendation/user_data/user_0/user_features_rank.csv ADDED
The diff for this file is too large to render. See raw diff
 
recipe_recommendation/user_data/user_0/user_profile.json ADDED
@@ -0,0 +1,26 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "user_id": "user_0",
3
+ "num_feedback": 0,
4
+ "diet": {
5
+ "vegetarian_type": "non_vegetarian"
6
+ },
7
+ "allergies": [],
8
+ "region_preference": [
9
+ "Asia"
10
+ ],
11
+ "nutritional_goals": {
12
+ "calories": {
13
+ "min": 250,
14
+ "max": 4000
15
+ },
16
+ "protein": {
17
+ "min": 20,
18
+ "max": 160
19
+ }
20
+ },
21
+ "other_preferences": {
22
+ "preferred_main": [],
23
+ "disliked_main": [],
24
+ "cooking_time_max": 180
25
+ }
26
+ }
recipe_recommendation/user_data/user_1/feature_order.json ADDED
@@ -0,0 +1,22 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ "main_match_ratio",
3
+ "other_match_ratio",
4
+ "staple_match_ratio",
5
+ "missing_main_count",
6
+ "missing_other_count",
7
+ "missing_staple_count",
8
+ "calories",
9
+ "protein",
10
+ "fat",
11
+ "protein_ratio",
12
+ "fat_ratio",
13
+ "region_match",
14
+ "is_vegan_safe",
15
+ "is_vegetarian_safe_absolute",
16
+ "is_flexible_safe_absolute",
17
+ "is_user_diet_safe",
18
+ "low_calorie_penalty",
19
+ "preferred_main_overlap",
20
+ "preferred_course_overlap",
21
+ "within_cooking_time"
22
+ ]
recipe_recommendation/user_data/user_1/feedback.csv ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ main_match_ratio,other_match_ratio,staple_match_ratio,missing_main_count,missing_other_count,missing_staple_count,calories,protein,fat,protein_ratio,fat_ratio,region_match,is_vegan_safe,is_vegetarian_safe_absolute,is_flexible_safe_absolute,is_user_diet_safe,low_calorie_penalty,preferred_main_overlap,preferred_course_overlap,within_cooking_time,recipe_id,qid,relevance
2
+ 0.0,0.0,0.0,1,3,1,320.2,0,0,0.0,0.0,0,0,0,0,0,1,1,0,1,44939,0,5
3
+ 0.0,0.0,0.0,1,3,1,123.9,0,0,0.0,0.0,0,0,0,0,0,1,0,0,1,73148,1,5
recipe_recommendation/user_data/user_1/qid.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ 2
recipe_recommendation/user_data/user_1/ranker.pkl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c8f305ad668b45ca0bb8d6f6cb1b87ca68d26a5c495622d2df4ac38e546b2787
3
+ size 1638981
recipe_recommendation/user_data/user_1/user_features_rank.csv ADDED
The diff for this file is too large to render. See raw diff
 
recipe_recommendation/user_data/user_1/user_profile.json ADDED
@@ -0,0 +1,26 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "user_id": "user_1",
3
+ "num_feedback": 0,
4
+ "diet": {
5
+ "vegetarian_type": "flexible"
6
+ },
7
+ "allergies": [],
8
+ "region_preference": [
9
+ "North America"
10
+ ],
11
+ "nutritional_goals": {
12
+ "calories": {
13
+ "min": 250,
14
+ "max": 2000
15
+ },
16
+ "protein": {
17
+ "min": 50,
18
+ "max": 160
19
+ }
20
+ },
21
+ "other_preferences": {
22
+ "preferred_main": [],
23
+ "disliked_main": [],
24
+ "cooking_time_max": 45
25
+ }
26
+ }
recipe_recommendation/user_data/user_2/feature_order.json ADDED
@@ -0,0 +1,22 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ "main_match_ratio",
3
+ "other_match_ratio",
4
+ "staple_match_ratio",
5
+ "missing_main_count",
6
+ "missing_other_count",
7
+ "missing_staple_count",
8
+ "calories",
9
+ "protein",
10
+ "fat",
11
+ "protein_ratio",
12
+ "fat_ratio",
13
+ "region_match",
14
+ "is_vegan_safe",
15
+ "is_vegetarian_safe_absolute",
16
+ "is_flexible_safe_absolute",
17
+ "is_user_diet_safe",
18
+ "low_calorie_penalty",
19
+ "preferred_main_overlap",
20
+ "preferred_course_overlap",
21
+ "within_cooking_time"
22
+ ]
recipe_recommendation/user_data/user_2/feedback.csv ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ main_match_ratio,other_match_ratio,staple_match_ratio,missing_main_count,missing_other_count,missing_staple_count,calories,protein,fat,protein_ratio,fat_ratio,region_match,is_vegan_safe,is_vegetarian_safe_absolute,is_flexible_safe_absolute,is_user_diet_safe,low_calorie_penalty,preferred_main_overlap,preferred_course_overlap,within_cooking_time,recipe_id,qid,relevance
2
+ 0.0,0.0,0.0,1,2,1,1640.1,0,0,0.0,0.0,0,0,0,0,1,1,0,0,1,106901,0,5