---
language: en
license: mit
tags:
- recommendation
- ranking
- personalization
- xgboost
- xgbranker
- recipe
- cold-start
datasets:
- your-username/recipe-cleaned-dataset
model-index:
- name: Personalized Recipe Ranking Models
results:
- task:
type: recommendation
name: Personalized Recipe Ranking
dataset:
name: Food.com (Cleaned)
type: your-username/recipe-cleaned-dataset
metrics:
- type: ndcg@5
value: 0.44
- type: ndcg@10
value: 0.44
---
# Model Card: Personalized Recipe Ranking Models
## Overview
This project implements a personalized recipe recommendation system using two model categories:
1. **Scratch-trained baseline**: A simple rule-based + embedding matching ranker trained on a synthetic preference dataset (no user-specific rules).
2. **Rule-enhanced cold-start models**: Five separate XGBRanker models trained with more complex rule-based preference signals and user-specific interaction patterns (user1–user5).
The goal is to evaluate how different user profiles affect ranking behavior and recommendation diversity, even though the personalized models' overall NDCG scores fall below the baseline's.
---
## Model Category 1: Scratch-trained Baseline
### Purpose
Provide a simple cold-start recommendation baseline that matches ingredients and ranks recipes without personalization. It uses parent–child ingredient overlap and a few numeric features (e.g., protein, cost, cooking time).
### Data Sources
- Cleaned Food.com dataset (~180k recipes)
- 10,000 synthetic preference samples generated via uniform random selection
### Training Details
- Model type: **XGBRanker** (`objective='rank:pairwise'`)
- Features: ~1000 numeric ingredient-parent ratio features + basic nutrition/time features
- Train/test split: 80/20 (by recipe ID)
- Evaluation metric: NDCG@5, NDCG@10
### Evaluation
The baseline achieves **very high NDCG scores (95%+)**, because training and evaluation rely on synthetic signals that align perfectly with the ranking structure.
### Intended Use
Serve as a **sanity check** and upper bound for ranking performance, not for deployment.
### Limitations
- Unrealistically clean preference structure
- No user differentiation
- Inflated metrics due to synthetic evaluation
---
## Model Category 2: Rule-enhanced Cold Start Models (User1–User5)
### Purpose
Capture user-specific dietary preferences and ranking heuristics using richer rule sets, leading to more diverse recommendation patterns across different users.
### Data Sources
- Cleaned Food.com dataset (~180k recipes)
- 5,000 cold-start synthetic interactions per user profile
- Additional unselected (negative) samples included to simulate realistic cold-start scenarios
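A minimal sketch of how unselected recipes might be drawn as negatives, assuming a per-user set of selected recipe IDs. All IDs, counts, and the 4:1 negative ratio below are hypothetical:

```python
import random

random.seed(0)

# Hypothetical IDs: one user's selected (positive) recipes vs. the full catalog.
catalog = list(range(1000))
selected = {3, 41, 97, 200, 512}

# Sample unselected recipes as negatives, e.g. 4 negatives per positive,
# to mimic the cold-start class imbalance described above.
pool = [r for r in catalog if r not in selected]
negatives = random.sample(pool, k=4 * len(selected))
```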
### Model
- Model type: **XGBRanker** (scratch-trained)
- Training objective: `rank:pairwise`
- Feature space:
- Ingredient-parent coverage ratios (~1000 parent nodes)
- Nutrition features: protein, calories, cost, cooking time
- User preference weights: protein/time/cost
- Dietary tag filters and exclusion rules
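The ingredient-parent coverage ratios can be illustrated with a toy mapping. The `PARENT_OF` table and parent names below are hypothetical; the real feature space spans ~1000 parent nodes:

```python
# Hypothetical mapping from raw ingredients to parent nodes.
PARENT_OF = {
    "chicken breast": "poultry",
    "chicken thigh": "poultry",
    "cheddar": "cheese",
    "parmesan": "cheese",
    "basil": "herbs",
}
PARENTS = sorted(set(PARENT_OF.values()))  # fixed feature order

def coverage_features(ingredients):
    """Fraction of a recipe's ingredients that fall under each parent node."""
    total = max(len(ingredients), 1)
    counts = {p: 0 for p in PARENTS}
    for ing in ingredients:
        parent = PARENT_OF.get(ing)
        if parent is not None:
            counts[parent] += 1
    return [counts[p] / total for p in PARENTS]

# "salt" has no parent node here, so it contributes to no coverage ratio.
feats = coverage_features(["chicken breast", "cheddar", "basil", "salt"])
```

In the real pipeline these ratios are concatenated with the nutrition and preference-weight features before training.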
### Training Setup
- Train/valid/test split: 70/15/15 by recipe ID per profile
- No fine-tuning between profiles; each profile trained independently
- Evaluation metric: NDCG@5 and NDCG@10
### Evaluation Results
| User Profile | NDCG@5 | NDCG@10 |
|-------------|--------|---------|
| user1 | 0.4400 | 0.4400 |
| user2 | 0.4342 | 0.4342 |
| user3 | 0.4179 | 0.4179 |
| user4 | 0.1651 | 0.1651 |
| user5 | 0.4607 | 0.4607 |
**Note:** User4's highly restrictive dietary preferences leave few matching recipes, which inherently caps the achievable NDCG.
Although these NDCG values are lower than the baseline, this is expected for several reasons:
- The cold-start datasets contain a large proportion of unselected recipes, leading to sparse positive signals.
- More complex preference rules increase variability and reduce alignment with NDCG’s single-label relevance assumptions.
- The models now produce more differentiated ranking behaviors across user profiles, which aligns with the intended personalization goals.
---
## Model Selection Justification
- **XGBRanker** was chosen for all models due to its effectiveness on structured tabular data, fast training time, and compatibility with large feature spaces (1000+ ingredients).
- The **baseline model** acts as a clean control, providing an upper bound on achievable NDCG under idealized preferences.
- The **rule-enhanced models** trade some raw NDCG performance for greater personalization fidelity, which is critical in multi-user recommendation contexts.
---
## Evaluation Methodology
- Metric: NDCG@5 and NDCG@10 on held-out cold-start samples
- Each user model evaluated independently
- Negative samples retained to approximate real-world recommendation class imbalance
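For reference, NDCG@k as reported above can be computed per query group as follows. This is the standard formulation, not necessarily the project's exact evaluation code:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k):
    """NDCG@k: DCG of the model's ordering divided by the ideal ordering's DCG."""
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# One held-out query group: graded relevance of recipes in model rank order.
score = ndcg_at_k([3, 2, 0, 1, 2], 5)
```

The per-user numbers in the table are the mean of this quantity over all held-out query groups for that profile.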
---
## Intended Uses and Limitations
**Intended Uses**
- Multi-profile recipe recommendation
- Studying personalization behaviors under sparse feedback
- Cold-start scenarios for new users
**Limitations**
- Synthetic user interactions do not perfectly reflect real-world feedback
- NDCG is not well aligned with multi-rule personalization behavior
- User4 performance is limited by scarcity of relevant recipes
---
## Risks and Bias
The models are trained on the Food.com dataset, which has known biases:
- **Regional bias**: Western and American cuisines dominate the dataset, leading to potential under-representation of other regions.
- **Popularity bias**: Highly rated or frequently interacted recipes are over-represented.
- **Cold-start leakage risk**: Although user interactions are synthetic, overlapping ingredient-parent structures between train/test may create mild information leakage, potentially inflating baseline metrics.
These biases may affect recommendation diversity and fairness across different cuisines or dietary groups.
---
## Cost and Latency
All models are based on **XGBRanker**, which runs efficiently on CPU:
- **Inference latency**: Approximately 1–5 ms per recipe for ranking (measured on a laptop CPU, single thread).
- **Training cost**: Training each user profile model on 5,000 interactions takes less than 2 minutes on CPU.
The approach is designed for real-time personalization in lightweight interfaces (e.g., Hugging Face Spaces).
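A per-recipe latency figure like the one above can be reproduced with a simple timing harness. The scorer below is a placeholder standing in for the trained ranker's `predict` call:

```python
import time

def score_recipe(features):
    # Placeholder for model.predict on a single recipe's feature vector.
    return sum(f * w for f, w in zip(features, [0.3, -0.1, 0.5]))

features = [1.0, 2.0, 0.5]
n_runs = 10_000
start = time.perf_counter()
for _ in range(n_runs):
    score_recipe(features)
elapsed_ms = (time.perf_counter() - start) * 1000 / n_runs  # ms per call
```

In practice, batching all candidate recipes into one `predict` call is far cheaper than per-recipe calls, so the 1–5 ms figure is a conservative per-item bound.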
---
## Usage Disclosure
**Intended Uses**
- Academic and educational research on personalized recommendation
- Cold-start personalization experiments
- Recipe recommendation for diverse dietary profiles
**Not Intended For**
- Medical or dietary decision-making
- Real-world deployment without additional bias mitigation
- High-stakes personalization where fairness across demographic groups is critical
---
## Citation
Tang, Xinxuan. Personalized Recipe Ranking Models. 2025.