---
language: en
license: mit
tags:
- recommendation
- ranking
- personalization
- xgboost
- xgbranker
- recipe
- cold-start
datasets:
- your-username/recipe-cleaned-dataset
model-index:
- name: Personalized Recipe Ranking Models
  results:
  - task:
      type: recommendation
      name: Personalized Recipe Ranking
    dataset:
      name: Food.com (Cleaned)
      type: your-username/recipe-cleaned-dataset
    metrics:
    - type: ndcg@5
      value: 0.44
    - type: ndcg@10
      value: 0.44
---
|
|
|
|
|
# Model Card: Personalized Recipe Ranking Models |
|
|
|
|
|
## Overview |
|
|
|
|
|
This project implements a personalized recipe recommendation system using two model categories: |
|
|
|
|
|
1. **Scratch-trained baseline**: A single XGBRanker trained on a synthetic preference dataset built from simple rule-based and embedding-similarity ingredient matching (no user-specific rules).
|
|
2. **Rule-enhanced cold-start models**: Five separate XGBRanker models trained with more complex rule-based preference signals and user-specific interaction patterns (user1–user5). |
|
|
|
|
|
The goal is to evaluate how different user profiles affect ranking behavior and recommendation diversity, even when overall NDCG scores are lower than the baseline. |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Category 1: Scratch-trained Baseline |
|
|
|
|
|
### Purpose |
|
|
Provide a simple cold-start recommendation baseline that matches ingredients and ranks recipes without personalization. It uses parent–child ingredient overlap and a few numeric features (e.g., protein, cost, cooking time). |
|
|
|
|
|
### Data Sources |
|
|
- Cleaned Food.com dataset (~180k recipes) |
|
|
- 10,000 synthetic preference samples generated via uniform random selection |
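
The exact generation procedure is not included in this card; a minimal sketch of uniform random preference sampling (the function name, group size, and one-positive-per-group convention are illustrative assumptions, not the project's actual code) could look like:

```python
import random

def sample_preferences(recipe_ids, n_samples=10_000, group_size=5, seed=42):
    """Uniformly sample ranking groups: each group holds `group_size`
    candidate recipes, one of which is randomly marked as selected."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_samples // group_size):
        candidates = rng.sample(recipe_ids, group_size)
        positive = rng.randrange(group_size)  # index of the "selected" recipe
        labels = [1 if i == positive else 0 for i in range(group_size)]
        groups.append(list(zip(candidates, labels)))
    return groups

groups = sample_preferences(list(range(180_000)))
```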
|
|
|
|
|
### Training Details |
|
|
- Model type: **XGBRanker** (`objective='rank:pairwise'`) |
|
|
- Features: ~1000 numeric ingredient-parent ratio features + basic nutrition/time features |
|
|
- Train/test split: 80/20 (by recipe ID) |
|
|
- Evaluation metric: NDCG@5, NDCG@10 |
|
|
|
|
|
### Evaluation |
|
|
The baseline achieves **very high NDCG scores (95%+)**, because training and evaluation rely on synthetic signals that align perfectly with the ranking structure. |
|
|
|
|
|
### Intended Use |
|
|
Serve as a **sanity check** and upper bound for ranking performance, not for deployment. |
|
|
|
|
|
### Limitations |
|
|
- Unrealistically clean preference structure |
|
|
- No user differentiation |
|
|
- Inflated metrics due to synthetic evaluation |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Category 2: Rule-enhanced Cold Start Models (User1–User5) |
|
|
|
|
|
### Purpose |
|
|
Capture user-specific dietary preferences and ranking heuristics using richer rule sets, leading to more diverse recommendation patterns across different users. |
|
|
|
|
|
### Data Sources |
|
|
- Cleaned Food.com dataset (~180k recipes) |
|
|
- 5,000 cold-start synthetic interactions per user profile |
|
|
- Additional unselected (negative) samples included to simulate realistic cold-start scenarios |
|
|
|
|
|
### Model |
|
|
- Model type: **XGBRanker** (scratch-trained) |
|
|
- Training objective: `rank:pairwise` |
|
|
- Feature space: |
|
|
- Ingredient-parent coverage ratios (~1000 parent nodes) |
|
|
- Nutrition features: protein, calories, cost, cooking time |
|
|
- User preference weights: protein/time/cost |
|
|
- Dietary tag filters and exclusion rules |
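
The ingredient-parent coverage ratios can be sketched as below; the taxonomy entries shown are hypothetical examples, not taken from the project's actual parent map:

```python
def parent_coverage(recipe_ingredients, parent_map, parents):
    """For each parent node, the fraction of the recipe's ingredients
    that map to that parent (0.0 for an empty recipe)."""
    n = len(recipe_ingredients)
    counts = {p: 0 for p in parents}
    for ing in recipe_ingredients:
        parent = parent_map.get(ing)
        if parent in counts:
            counts[parent] += 1
    return {p: (c / n if n else 0.0) for p, c in counts.items()}

# Hypothetical taxonomy: "chicken breast" -> "poultry", etc.
parent_map = {"chicken breast": "poultry", "thyme": "herb", "basil": "herb"}
features = parent_coverage(["chicken breast", "thyme", "basil"],
                           parent_map, ["poultry", "herb", "dairy"])
# features == {"poultry": 1/3, "herb": 2/3, "dairy": 0.0}
```

In the real pipeline this yields one ratio column per parent node (~1000 columns), concatenated with the nutrition and preference-weight features.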
|
|
|
|
|
### Training Setup |
|
|
- Train/valid/test split: 70/15/15 by recipe ID per profile |
|
|
- No fine-tuning between profiles; each profile trained independently |
|
|
- Evaluation metric: NDCG@5 and NDCG@10 |
|
|
|
|
|
### Evaluation Results |
|
|
|
|
|
| User Profile | NDCG@5 | NDCG@10 |
|--------------|--------|---------|
| user1        | 0.4400 | 0.4400  |
| user2        | 0.4342 | 0.4342  |
| user3        | 0.4179 | 0.4179  |
| user4        | 0.1651 | 0.1651  |
| user5        | 0.4607 | 0.4607  |
|
|
|
|
|
**Note:** User4's highly restrictive dietary preferences leave few matching recipes, which inherently lowers the achievable NDCG.
|
|
|
|
|
|
|
|
|
|
Although these NDCG values are lower than the baseline, this is expected for several reasons: |
|
|
|
|
|
- The cold-start datasets contain a large proportion of unselected recipes, leading to sparse positive signals. |
|
|
- More complex preference rules increase variability and reduce alignment with NDCG’s single-label relevance assumptions. |
|
|
- The models now produce more differentiated ranking behaviors across user profiles, which aligns with the intended personalization goals. |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Selection Justification |
|
|
|
|
|
- **XGBRanker** was chosen for all models due to its effectiveness on structured tabular data, fast training time, and compatibility with large feature spaces (1000+ ingredients). |
|
|
- The **baseline model** acts as a clean control, providing an upper bound on achievable NDCG under idealized preferences. |
|
|
- The **rule-enhanced models** trade some raw NDCG performance for greater personalization fidelity, which is critical in multi-user recommendation contexts. |
|
|
|
|
|
--- |
|
|
|
|
|
## Evaluation Methodology |
|
|
|
|
|
- Metric: NDCG@5 and NDCG@10 on held-out cold-start samples |
|
|
- Each user model evaluated independently |
|
|
- Negative samples retained to approximate real-world recommendation class imbalance |
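
For reference, NDCG@k on a single ranked list can be computed as below; this is the standard definition, not necessarily the exact implementation used in this project:

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k for one ranked list: DCG over the top-k relevance scores,
    normalized by the DCG of the ideal (descending-sorted) ordering."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# One positive ranked second out of five candidates:
score = ndcg_at_k([0, 1, 0, 0, 0], k=5)   # 1 / log2(3) ≈ 0.6309
```

Note that with a single binary positive per group, NDCG@5 and NDCG@10 coincide whenever the positive lands in the top 5, which may explain the identical @5 and @10 values reported above.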
|
|
|
|
|
--- |
|
|
|
|
|
## Intended Uses and Limitations |
|
|
|
|
|
**Intended Uses** |
|
|
- Multi-profile recipe recommendation |
|
|
- Studying personalization behaviors under sparse feedback |
|
|
- Cold-start scenarios for new users |
|
|
|
|
|
**Limitations** |
|
|
- Synthetic user interactions do not perfectly reflect real-world feedback |
|
|
- NDCG is not well aligned with multi-rule personalization behavior |
|
|
- User4 performance is limited by scarcity of relevant recipes |
|
|
|
|
|
--- |
|
|
## Risks and Bias |
|
|
|
|
|
The models are trained on the Food.com dataset, which has known biases: |
|
|
- **Regional bias**: Western and American cuisines dominate the dataset, leading to potential under-representation of other regions. |
|
|
- **Popularity bias**: Highly rated or frequently interacted recipes are over-represented. |
|
|
- **Cold-start leakage risk**: Although user interactions are synthetic, overlapping ingredient-parent structures between train/test may create mild information leakage, potentially inflating baseline metrics. |
|
|
|
|
|
These biases may affect recommendation diversity and fairness across different cuisines or dietary groups. |
|
|
|
|
|
--- |
|
|
|
|
|
## Cost and Latency |
|
|
|
|
|
All models are based on **XGBRanker**, which runs efficiently on CPU: |
|
|
- **Inference latency**: Approximately 1–5 ms per recipe for ranking (measured on a laptop CPU, single thread). |
|
|
- **Training cost**: Training each user profile model on 5,000 interactions takes less than 2 minutes on CPU. |
|
|
|
|
|
The approach is designed for real-time personalization in lightweight interfaces (e.g., Hugging Face Spaces). |
|
|
|
|
|
--- |
|
|
|
|
|
## Usage Disclosure |
|
|
|
|
|
**Intended Uses** |
|
|
- Academic and educational research on personalized recommendation |
|
|
- Cold-start personalization experiments |
|
|
- Recipe recommendation for diverse dietary profiles |
|
|
|
|
|
**Not Intended For** |
|
|
- Medical or dietary decision-making |
|
|
- Real-world deployment without additional bias mitigation |
|
|
- High-stakes personalization where fairness across demographic groups is critical |
|
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
Tang, Xinxuan. Personalized Recipe Ranking Models. 2025. |
|
|
|