---
language: en
license: mit
tags:
- recommendation
- ranking
- personalization
- xgboost
- xgbranker
- recipe
- cold-start
datasets:
- your-username/recipe-cleaned-dataset
model-index:
- name: Personalized Recipe Ranking Models
  results:
  - task:
      type: recommendation
      name: Personalized Recipe Ranking
    dataset:
      name: Food.com (Cleaned)
      type: your-username/recipe-cleaned-dataset
    metrics:
    - type: ndcg@5
      value: 0.44
    - type: ndcg@10
      value: 0.44
---
|
|
|
|
|
# Model Card: Personalized Recipe Ranking Models |
|
|
|
|
|
## Overview |
|
|
|
|
|
This project implements a personalized recipe recommendation system using two model categories: |
|
|
|
|
|
1. **Scratch-trained baseline**: A single XGBRanker trained on a synthetic preference dataset built from simple rule-based and embedding-similarity ingredient matching (no user-specific rules).
|
|
2. **Rule-enhanced cold-start models**: Five separate XGBRanker models trained with more complex rule-based preference signals and user-specific interaction patterns (user1–user5). |
|
|
|
|
|
The goal is to evaluate how different user profiles affect ranking behavior and recommendation diversity, even when overall NDCG scores are lower than the baseline. |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Category 1: Scratch-trained Baseline |
|
|
|
|
|
### Purpose |
|
|
Provide a simple cold-start recommendation baseline that matches ingredients and ranks recipes without personalization. It uses parent–child ingredient overlap and a few numeric features (e.g., protein, cost, cooking time). |
|
|
|
|
|
### Data Sources |
|
|
- Cleaned Food.com dataset (~180k recipes) |
|
|
- 10,000 synthetic preference samples generated via uniform random selection |
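
The exact generation procedure is not included in this card; a minimal sketch of uniform random preference sampling (the function name, group size, and one-positive-per-group convention are illustrative assumptions, not the project's actual code) could look like:

```python
import random

def sample_preferences(recipe_ids, n_samples=10_000, group_size=5, seed=42):
    """Uniformly sample ranking groups: each group holds `group_size`
    candidate recipes, one of which is randomly marked as selected."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_samples // group_size):
        candidates = rng.sample(recipe_ids, group_size)
        positive = rng.randrange(group_size)  # index of the "selected" recipe
        labels = [1 if i == positive else 0 for i in range(group_size)]
        groups.append(list(zip(candidates, labels)))
    return groups

groups = sample_preferences(list(range(180_000)))
```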
|
|
|
|
|
### Training Details |
|
|
- Model type: **XGBRanker** (`objective='rank:pairwise'`) |
|
|
- Features: ~1000 numeric ingredient-parent ratio features + basic nutrition/time features |
|
|
- Train/test split: 80/20 (by recipe ID) |
|
|
- Evaluation metric: NDCG@5, NDCG@10 |
|
|
|
|
|
### Evaluation |
|
|
The baseline achieves **very high NDCG scores (95%+)**, because training and evaluation rely on synthetic signals that align perfectly with the ranking structure. |
|
|
|
|
|
### Intended Use |
|
|
Serve as a **sanity check** and upper bound for ranking performance, not for deployment. |
|
|
|
|
|
### Limitations |
|
|
- Unrealistically clean preference structure |
|
|
- No user differentiation |
|
|
- Inflated metrics due to synthetic evaluation |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Category 2: Rule-enhanced Cold Start Models (User1–User5) |
|
|
|
|
|
### Purpose |
|
|
Capture user-specific dietary preferences and ranking heuristics using richer rule sets, leading to more diverse recommendation patterns across different users. |
|
|
|
|
|
### Data Sources |
|
|
- Cleaned Food.com dataset (~180k recipes) |
|
|
- 5,000 cold-start synthetic interactions per user profile |
|
|
- Additional unselected (negative) samples included to simulate realistic cold-start scenarios |
|
|
|
|
|
### Model |
|
|
- Model type: **XGBRanker** (scratch-trained) |
|
|
- Training objective: `rank:pairwise` |
|
|
- Feature space: |
|
|
- Ingredient-parent coverage ratios (~1000 parent nodes) |
|
|
- Nutrition features: protein, calories, cost, cooking time |
|
|
- User preference weights: protein/time/cost |
|
|
- Dietary tag filters and exclusion rules |
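
The ingredient-parent coverage ratios can be sketched as below; the taxonomy entries shown are hypothetical examples, not taken from the project's actual parent map:

```python
def parent_coverage(recipe_ingredients, parent_map, parents):
    """For each parent node, the fraction of the recipe's ingredients
    that map to that parent (0.0 for an empty recipe)."""
    n = len(recipe_ingredients)
    counts = {p: 0 for p in parents}
    for ing in recipe_ingredients:
        parent = parent_map.get(ing)
        if parent in counts:
            counts[parent] += 1
    return {p: (c / n if n else 0.0) for p, c in counts.items()}

# Hypothetical taxonomy: "chicken breast" -> "poultry", etc.
parent_map = {"chicken breast": "poultry", "thyme": "herb", "basil": "herb"}
features = parent_coverage(["chicken breast", "thyme", "basil"],
                           parent_map, ["poultry", "herb", "dairy"])
# features == {"poultry": 1/3, "herb": 2/3, "dairy": 0.0}
```

In the real pipeline this yields one ratio column per parent node (~1000 columns), concatenated with the nutrition and preference-weight features.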
|
|
|
|
|
### Training Setup |
|
|
- Train/valid/test split: 70/15/15 by recipe ID per profile |
|
|
- No fine-tuning between profiles; each profile trained independently |
|
|
- Evaluation metric: NDCG@5 and NDCG@10 |
|
|
|
|
|
### Evaluation Results |
|
|
|
|
|
| User Profile | NDCG@5 | NDCG@10 |
|--------------|--------|---------|
| user1        | 0.4400 | 0.4400  |
| user2        | 0.4342 | 0.4342  |
| user3        | 0.4179 | 0.4179  |
| user4        | 0.1651 | 0.1651  |
| user5        | 0.4607 | 0.4607  |
|
|
|
|
|
**Note:** User4's highly restrictive dietary preferences leave few matching recipes, which inherently lowers the achievable NDCG.
|
|
|
|
|
|
|
|
|
|
Although these NDCG values are lower than the baseline, this is expected for several reasons: |
|
|
|
|
|
- The cold-start datasets contain a large proportion of unselected recipes, leading to sparse positive signals. |
|
|
- More complex preference rules increase variability and reduce alignment with NDCG’s single-label relevance assumptions. |
|
|
- The models now produce more differentiated ranking behaviors across user profiles, which aligns with the intended personalization goals. |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Selection Justification |
|
|
|
|
|
- **XGBRanker** was chosen for all models due to its effectiveness on structured tabular data, fast training time, and compatibility with large feature spaces (1000+ ingredients). |
|
|
- The **baseline model** acts as a clean control, providing an upper bound on achievable NDCG under idealized preferences. |
|
|
- The **rule-enhanced models** trade some raw NDCG performance for greater personalization fidelity, which is critical in multi-user recommendation contexts. |
|
|
|
|
|
--- |
|
|
|
|
|
## Evaluation Methodology |
|
|
|
|
|
- Metric: NDCG@5 and NDCG@10 on held-out cold-start samples |
|
|
- Each user model evaluated independently |
|
|
- Negative samples retained to approximate real-world recommendation class imbalance |
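
For reference, NDCG@k on a single ranked list can be computed as below; this is the standard definition, not necessarily the exact implementation used in this project:

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k for one ranked list: DCG over the top-k relevance scores,
    normalized by the DCG of the ideal (descending-sorted) ordering."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# One positive ranked second out of five candidates:
score = ndcg_at_k([0, 1, 0, 0, 0], k=5)   # 1 / log2(3) ≈ 0.6309
```

Note that with a single binary positive per group, NDCG@5 and NDCG@10 coincide whenever the positive lands in the top 5, which may explain the identical @5 and @10 values reported above.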
|
|
|
|
|
--- |
|
|
|
|
|
## Intended Uses and Limitations |
|
|
|
|
|
**Intended Uses** |
|
|
- Multi-profile recipe recommendation |
|
|
- Studying personalization behaviors under sparse feedback |
|
|
- Cold-start scenarios for new users |
|
|
|
|
|
**Limitations** |
|
|
- Synthetic user interactions do not perfectly reflect real-world feedback |
|
|
- NDCG is not well aligned with multi-rule personalization behavior |
|
|
- User4 performance is limited by scarcity of relevant recipes |
|
|
|
|
|
--- |
|
|
## Risks and Bias |
|
|
|
|
|
The models are trained on the Food.com dataset, which has known biases: |
|
|
- **Regional bias**: Western and American cuisines dominate the dataset, leading to potential under-representation of other regions. |
|
|
- **Popularity bias**: Highly rated or frequently interacted recipes are over-represented. |
|
|
- **Cold-start leakage risk**: Although user interactions are synthetic, overlapping ingredient-parent structures between train/test may create mild information leakage, potentially inflating baseline metrics. |
|
|
|
|
|
These biases may affect recommendation diversity and fairness across different cuisines or dietary groups. |
|
|
|
|
|
--- |
|
|
|
|
|
## Cost and Latency |
|
|
|
|
|
All models are based on **XGBRanker**, which runs efficiently on CPU: |
|
|
- **Inference latency**: Approximately 1–5 ms per recipe for ranking (measured on a laptop CPU, single thread). |
|
|
- **Training cost**: Training each user profile model on 5,000 interactions takes less than 2 minutes on CPU. |
|
|
|
|
|
The approach is designed for real-time personalization in lightweight interfaces (e.g., Hugging Face Spaces). |
|
|
|
|
|
--- |
|
|
|
|
|
## Usage Disclosure |
|
|
|
|
|
**Intended Uses** |
|
|
- Academic and educational research on personalized recommendation |
|
|
- Cold-start personalization experiments |
|
|
- Recipe recommendation for diverse dietary profiles |
|
|
|
|
|
**Not Intended For** |
|
|
- Medical or dietary decision-making |
|
|
- Real-world deployment without additional bias mitigation |
|
|
- High-stakes personalization where fairness across demographic groups is critical |
|
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
Tang, Xinxuan. Personalized Recipe Ranking Models. 2025. |
|
|
|