---
language: en
license: mit
tags:
- recommendation
- ranking
- personalization
- xgboost
- xgbranker
- recipe
- cold-start
datasets:
- your-username/recipe-cleaned-dataset
model-index:
- name: Personalized Recipe Ranking Models
results:
- task:
type: recommendation
name: Personalized Recipe Ranking
dataset:
name: Food.com (Cleaned)
type: your-username/recipe-cleaned-dataset
metrics:
- type: ndcg@5
value: 0.44
- type: ndcg@10
value: 0.44
---
# Model Card: Personalized Recipe Ranking Models
## Overview
This project implements a personalized recipe recommendation system using two model categories:
1. **Scratch-trained baseline**: A simple rule-based + embedding matching ranker trained on a synthetic preference dataset (no user-specific rules).
2. **Rule-enhanced cold-start models**: Five separate XGBRanker models trained with more complex rule-based preference signals and user-specific interaction patterns (user1–user5).
The goal is to evaluate how different user profiles affect ranking behavior and recommendation diversity, even when their overall NDCG scores fall below the baseline's.
---
## Model Category 1: Scratch-trained Baseline
### Purpose
Provide a simple cold-start recommendation baseline that matches ingredients and ranks recipes without personalization. It uses parent–child ingredient overlap and a few numeric features (e.g., protein, cost, cooking time).
### Data Sources
- Cleaned Food.com dataset (~180k recipes)
- 10,000 synthetic preference samples generated via uniform random selection
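A minimal sketch of how the 10,000 uniform-random preference samples could be generated. The grouping scheme and one-positive-per-group labeling are assumptions for illustration; the card does not specify the exact sampling rule.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_synthetic_preferences(recipe_ids, n_samples=10_000, group_size=10):
    """Uniformly sample candidate groups of recipes and mark one recipe
    per group as 'preferred' (hypothetical labeling scheme)."""
    groups = []
    for gid in range(n_samples // group_size):
        # Uniform random selection of candidates, no repeats within a group
        candidates = rng.choice(recipe_ids, size=group_size, replace=False)
        labels = np.zeros(group_size, dtype=int)
        labels[rng.integers(group_size)] = 1  # exactly one positive per group
        groups.append((gid, candidates, labels))
    return groups

groups = sample_synthetic_preferences(np.arange(180_000))
```

Because candidates are drawn uniformly, no recipe attribute influences the labels, which is what makes the resulting preference structure "unrealistically clean" (see Limitations below).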
### Training Details
- Model type: **XGBRanker** (`objective='rank:pairwise'`)
- Features: ~1000 numeric ingredient-parent ratio features + basic nutrition/time features
- Train/test split: 80/20 (by recipe ID)
- Evaluation metric: NDCG@5, NDCG@10
### Evaluation
The baseline achieves **very high NDCG scores (95%+)** because its training and evaluation rely on synthetic signals that align perfectly with the ranking structure.
### Intended Use
Serve as a **sanity check** and upper bound for ranking performance, not for deployment.
### Limitations
- Unrealistically clean preference structure
- No user differentiation
- Inflated metrics due to synthetic evaluation
---
## Model Category 2: Rule-enhanced Cold Start Models (User1–User5)
### Purpose
Capture user-specific dietary preferences and ranking heuristics using richer rule sets, leading to more diverse recommendation patterns across different users.
### Data Sources
- Cleaned Food.com dataset (~180k recipes)
- 5,000 cold-start synthetic interactions per user profile
- Additional unselected (negative) samples included to simulate realistic cold-start scenarios
### Model
- Model type: **XGBRanker** (scratch-trained)
- Training objective: `rank:pairwise`
- Feature space:
- Ingredient-parent coverage ratios (~1000 parent nodes)
- Nutrition features: protein, calories, cost, cooking time
- User preference weights: protein/time/cost
- Dietary tag filters and exclusion rules
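The ingredient-parent coverage ratios can be illustrated with a small sketch. The toy taxonomy below is hypothetical (the real one has ~1,000 parent nodes); each feature is the fraction of a parent's child ingredients that appear in the recipe.

```python
# Hypothetical child -> parent ingredient taxonomy for illustration.
PARENT_OF = {"basil": "herb", "thyme": "herb",
             "chicken": "poultry", "turkey": "poultry"}
PARENTS = sorted(set(PARENT_OF.values()))

def parent_coverage(ingredients):
    """Fraction of each parent node's children present in the recipe."""
    counts = {p: 0 for p in PARENTS}
    totals = {p: 0 for p in PARENTS}
    for child, parent in PARENT_OF.items():
        totals[parent] += 1
        if child in ingredients:
            counts[parent] += 1
    return [counts[p] / totals[p] for p in PARENTS]

vec = parent_coverage({"basil", "chicken", "turkey"})
# herb: 1 of 2 children present; poultry: 2 of 2 -> [0.5, 1.0]
```

In the full models this vector is concatenated with the nutrition features and user preference weights to form each recipe's feature row.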
### Training Setup
- Train/valid/test split: 70/15/15 by recipe ID per profile
- No fine-tuning between profiles; each profile trained independently
- Evaluation metric: NDCG@5 and NDCG@10
### Evaluation Results
| User Profile | NDCG@5 | NDCG@10 |
|-------------|--------|---------|
| user1 | 0.4400 | 0.4400 |
| user2 | 0.4342 | 0.4342 |
| user3 | 0.4179 | 0.4179 |
| user4 | 0.1651 | 0.1651 |
| user5 | 0.4607 | 0.4607 |
**Note:** User4 has very restrictive dietary preferences, resulting in very few matching recipes and inherently lower achievable NDCG.
Although these NDCG values are lower than the baseline, this is expected for several reasons:
- The cold-start datasets contain a large proportion of unselected recipes, leading to sparse positive signals.
- More complex preference rules increase variability and reduce alignment with NDCG’s single-label relevance assumptions.
- The models now produce more differentiated ranking behaviors across user profiles, which aligns with the intended personalization goals.
---
## Model Selection Justification
- **XGBRanker** was chosen for all models due to its effectiveness on structured tabular data, fast training time, and compatibility with large feature spaces (1,000+ ingredient-parent features).
- The **baseline model** acts as a clean control, providing an upper bound on achievable NDCG under idealized preferences.
- The **rule-enhanced models** trade some raw NDCG performance for greater personalization fidelity, which is critical in multi-user recommendation contexts.
---
## Evaluation Methodology
- Metric: NDCG@5 and NDCG@10 on held-out cold-start samples
- Each user model evaluated independently
- Negative samples retained to approximate real-world recommendation class imbalance
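A minimal sketch of the NDCG@5/NDCG@10 computation for a single held-out candidate group, using scikit-learn's `ndcg_score`. The relevance labels and model scores below are illustrative, not drawn from the card's test sets.

```python
import numpy as np
from sklearn.metrics import ndcg_score

# One candidate group: binary true relevance vs. model scores.
true_relevance = np.asarray([[1, 0, 0, 1, 0, 0, 0, 0, 0, 0]])
model_scores = np.asarray([[0.9, 0.8, 0.7, 0.3, 0.6,
                            0.5, 0.4, 0.2, 0.1, 0.05]])

ndcg5 = ndcg_score(true_relevance, model_scores, k=5)
ndcg10 = ndcg_score(true_relevance, model_scores, k=10)
```

Here the second relevant recipe falls outside the top 5, so NDCG@5 (≈0.61) is lower than NDCG@10 (≈0.82). Per-group scores are averaged across all held-out groups to produce the table values above.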
---
## Intended Uses and Limitations
**Intended Uses**
- Multi-profile recipe recommendation
- Studying personalization behaviors under sparse feedback
- Cold-start scenarios for new users
**Limitations**
- Synthetic user interactions do not perfectly reflect real-world feedback
- NDCG is not well aligned with multi-rule personalization behavior
- User4 performance is limited by scarcity of relevant recipes
---
## Risks and Bias
The models are trained on the Food.com dataset, which has known biases:
- **Regional bias**: Western and American cuisines dominate the dataset, leading to potential under-representation of other regions.
- **Popularity bias**: Highly rated or frequently interacted recipes are over-represented.
- **Cold-start leakage risk**: Although user interactions are synthetic, overlapping ingredient-parent structures between train/test may create mild information leakage, potentially inflating baseline metrics.
These biases may affect recommendation diversity and fairness across different cuisines or dietary groups.
---
## Cost and Latency
All models are based on **XGBRanker**, which runs efficiently on CPU:
- **Inference latency**: Approximately 1–5 ms per recipe for ranking (measured on a laptop CPU, single thread).
- **Training cost**: Training each user profile model on 5,000 interactions takes less than 2 minutes on CPU.
The approach is designed for real-time personalization in lightweight interfaces (e.g., Hugging Face Spaces).
---
## Usage Disclosure
**Intended Uses**
- Academic and educational research on personalized recommendation
- Cold-start personalization experiments
- Recipe recommendation for diverse dietary profiles
**Not Intended For**
- Medical or dietary decision-making
- Real-world deployment without additional bias mitigation
- High-stakes personalization where fairness across demographic groups is critical
---
## Citation
Tang, Xinxuan. Personalized Recipe Ranking Models. 2025.