Iris314 committed on
Commit
26d29db
·
verified ·
1 Parent(s): 87effb8

Update README.md

  1. README.md +149 -0
README.md CHANGED
@@ -0,0 +1,149 @@
1
+ ---
2
+ language: en
3
+ license: mit
4
+ tags:
5
+ - recommendation
6
+ - ranking
7
+ - personalization
8
+ - xgboost
9
+ - xgbranker
10
+ - recipe
11
+ - cold-start
12
+ datasets:
13
+ - your-username/recipe-cleaned-dataset
14
+ model-index:
15
+ - name: Personalized Recipe Ranking Models
16
+ results:
17
+ - task:
18
+ type: recommendation
19
+ name: Personalized Recipe Ranking
20
+ dataset:
21
+ name: Food.com (Cleaned)
22
+ type: your-username/recipe-cleaned-dataset
23
+ metrics:
24
+ - type: ndcg@5
25
+ value: 0.44
26
+ - type: ndcg@10
27
+ value: 0.44
28
+ ---
29
+
30
+ # Model Card: Personalized Recipe Ranking Models
31
+
32
+ ## Overview
33
+
34
+ This project implements a personalized recipe recommendation system using two model categories:
35
+
36
+ 1. **Scratch-trained baseline**: A single XGBRanker trained from scratch on a synthetic preference dataset, using ingredient-overlap and basic numeric features with no user-specific rules.
37
+ 2. **Rule-enhanced cold-start models**: Five separate XGBRanker models trained with more complex rule-based preference signals and user-specific interaction patterns (user1–user5).
38
+
39
+ The goal is to evaluate how different user profiles affect ranking behavior and recommendation diversity, even when overall NDCG scores are lower than the baseline.
40
+
41
+ ---
42
+
43
+ ## Model Category 1: Scratch-trained Baseline
44
+
45
+ ### Purpose
46
+ Provide a simple cold-start recommendation baseline that matches ingredients and ranks recipes without personalization. It uses parent–child ingredient overlap and a few numeric features (e.g., protein, cost, cooking time).
47
+
48
+ ### Data Sources
49
+ - Cleaned Food.com dataset (~180k recipes)
50
+ - 10,000 synthetic preference samples generated via uniform random selection
51
+
52
+ ### Training Details
53
+ - Model type: **XGBRanker** (`objective='rank:pairwise'`)
54
+ - Features: ~1000 numeric ingredient-parent ratio features + basic nutrition/time features
55
+ - Train/test split: 80/20 (by recipe ID)
56
+ - Evaluation metrics: NDCG@5 and NDCG@10
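
To make the `rank:pairwise` objective above more concrete, here is a minimal sketch of a pairwise logistic (RankNet-style) loss in plain Python. It only illustrates the idea of penalizing mis-ordered pairs within a query group; it is not XGBoost's implementation, and the function name `pairwise_logistic_loss` and the `groups` layout (a list of consecutive group sizes) are assumptions made for this example.

```python
import math
from itertools import combinations


def pairwise_logistic_loss(scores, labels, groups):
    """Mean logistic loss over within-group pairs that have different labels.

    scores: model scores, one per candidate recipe
    labels: relevance labels aligned with scores (higher = more relevant)
    groups: sizes of consecutive query groups (hypothetical layout)
    """
    total, n_pairs, start = 0.0, 0, 0
    for size in groups:
        for i, j in combinations(range(start, start + size), 2):
            if labels[i] == labels[j]:
                continue  # ties carry no ordering information
            hi, lo = (i, j) if labels[i] > labels[j] else (j, i)
            margin = scores[hi] - scores[lo]
            # Small when the more relevant item is scored higher.
            total += math.log1p(math.exp(-margin))
            n_pairs += 1
        start += size
    return total / n_pairs if n_pairs else 0.0


labels = [1, 0, 1, 0]
groups = [2, 2]
well_ordered = pairwise_logistic_loss([2.0, 0.0, 1.0, -1.0], labels, groups)
mis_ordered = pairwise_logistic_loss([0.0, 2.0, -1.0, 1.0], labels, groups)
```

A ranker that scores selected recipes above unselected ones within each group yields a lower loss (`well_ordered`) than one that inverts them (`mis_ordered`).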
57
+
58
+ ### Evaluation
59
+ The baseline achieves **very high NDCG scores (above 0.95)**, because both training and evaluation rely on the same synthetic preference signals, which the ranker can reproduce almost exactly.
60
+
61
+ ### Intended Use
62
+ Serve as a **sanity check** and upper bound for ranking performance, not for deployment.
63
+
64
+ ### Limitations
65
+ - Unrealistically clean preference structure
66
+ - No user differentiation
67
+ - Inflated metrics due to synthetic evaluation
68
+
69
+ ---
70
+
71
+ ## Model Category 2: Rule-enhanced Cold-start Models (User1–User5)
72
+
73
+ ### Purpose
74
+ Capture user-specific dietary preferences and ranking heuristics using richer rule sets, leading to more diverse recommendation patterns across different users.
75
+
76
+ ### Data Sources
77
+ - Cleaned Food.com dataset (~180k recipes)
78
+ - 5,000 cold-start synthetic interactions per user profile
79
+ - Additional unselected (negative) samples included to simulate realistic cold-start scenarios
80
+
81
+ ### Model
82
+ - Model type: **XGBRanker** (scratch-trained)
83
+ - Training objective: `rank:pairwise`
84
+ - Feature space:
85
+ - Ingredient-parent coverage ratios (~1000 parent nodes)
86
+ - Nutrition features: protein, calories, cost, cooking time
87
+ - User preference weights: protein/time/cost
88
+ - Dietary tag filters and exclusion rules
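
As an illustration of the ingredient-parent coverage features listed above, the sketch below computes one ratio per parent node for a single recipe. The model card does not define the ratio, so this assumes it is the share of a recipe's ingredients that map to each parent node; the function name, the `parent_of` mapping, and the toy ingredient names are all hypothetical.

```python
from collections import Counter


def parent_coverage_ratios(ingredients, parent_of, parent_nodes):
    """Return one ratio per parent node: the fraction of the recipe's
    ingredients that map to that node (assumed definition)."""
    counts = Counter(parent_of[ing] for ing in ingredients if ing in parent_of)
    total = len(ingredients)
    return [counts[p] / total if total else 0.0 for p in parent_nodes]


# Toy mapping from ingredient to parent node (illustrative only).
parent_of = {"cheddar": "cheese", "mozzarella": "cheese", "basil": "herb"}
parent_nodes = ["cheese", "herb", "meat"]

features = parent_coverage_ratios(
    ["cheddar", "mozzarella", "basil", "salt"], parent_of, parent_nodes
)
```

With roughly 1000 parent nodes, the same function would produce a feature vector of about 1000 entries per recipe, most of them zero.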
89
+
90
+ ### Training Setup
91
+ - Train/valid/test split: 70/15/15 by recipe ID per profile
92
+ - No fine-tuning between profiles; each profile trained independently
93
+ - Evaluation metrics: NDCG@5 and NDCG@10
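
One way to realize a 70/15/15 split by recipe ID is to shuffle the unique IDs once and slice them, so that every row for a given recipe lands in exactly one partition. This is a sketch under that assumption, not the project's actual split code; `split_by_recipe_id`, the `fractions` argument, and the seed are hypothetical.

```python
import random


def split_by_recipe_id(recipe_ids, fractions=(0.70, 0.15, 0.15), seed=0):
    """Assign each unique recipe ID to train, valid, or test."""
    unique_ids = sorted(set(recipe_ids))  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(unique_ids)
    n = len(unique_ids)
    n_train = round(n * fractions[0])
    n_valid = round(n * fractions[1])
    return {
        "train": set(unique_ids[:n_train]),
        "valid": set(unique_ids[n_train:n_train + n_valid]),
        "test": set(unique_ids[n_train + n_valid:]),  # remainder
    }


# 100 distinct recipes, each appearing in two interaction rows.
interaction_recipe_ids = [rid for rid in range(100) for _ in range(2)]
splits = split_by_recipe_id(interaction_recipe_ids)
```

Interaction rows are then routed by looking up their recipe ID in `splits`, which keeps the same recipe out of both training and evaluation data.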
94
+
95
+ ### Evaluation Results
96
+
97
+ | User Profile | NDCG@5 | NDCG@10 |
98
+ |-------------|--------|---------|
99
+ | user1 | 0.4400 | 0.4400 |
100
+ | user2 | 0.4342 | 0.4342 |
101
+ | user3 | 0.4179 | 0.4179 |
102
+ | user4 | 0.1651 | 0.1651 |
103
+ | user5 | 0.4607 | 0.4607 |
104
+
105
+ **Note:** User4 has highly restrictive dietary preferences, so few recipes match the profile and the achievable NDCG is lower.
106
+
109
+ Although these NDCG values are lower than the baseline, this is expected for several reasons:
110
+
111
+ - The cold-start datasets contain a large proportion of unselected recipes, leading to sparse positive signals.
112
+ - More complex preference rules increase variability and reduce alignment with NDCG’s single-label relevance assumptions.
113
+ - The models now produce more differentiated ranking behaviors across user profiles, which aligns with the intended personalization goals.
114
+
115
+ ---
116
+
117
+ ## Model Selection Justification
118
+
119
+ - **XGBRanker** was chosen for all models due to its effectiveness on structured tabular data, fast training time, and compatibility with large feature spaces (~1000 ingredient-parent features).
120
+ - The **baseline model** acts as a clean control, providing an upper bound on achievable NDCG under idealized preferences.
121
+ - The **rule-enhanced models** trade some raw NDCG performance for greater personalization fidelity, which is critical in multi-user recommendation contexts.
122
+
123
+ ---
124
+
125
+ ## Evaluation Methodology
126
+
127
+ - Metric: NDCG@5 and NDCG@10 on held-out cold-start samples
128
+ - Each user model evaluated independently
129
+ - Negative samples retained to approximate real-world recommendation class imbalance
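
For reference, NDCG@k for a single query can be computed as in the sketch below. It uses linear gain and a log2 rank discount; some libraries use an exponential gain (2^rel - 1) instead, which gives the same result for binary selected/unselected labels. The function names are illustrative and this is not necessarily the evaluation code used for the reported numbers.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevances (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(scores, labels, k):
    """NDCG@k for one query: rank items by score, compare against the ideal order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    idcg = dcg_at_k(sorted(labels, reverse=True), k)
    # With no relevant items the ideal DCG is zero; report 0.0 by convention.
    return dcg_at_k(ranked, k) / idcg if idcg > 0 else 0.0


# One selected recipe among three candidates, ranked last by the model.
worst_case = ndcg_at_k(scores=[3.0, 2.0, 1.0], labels=[0, 0, 1], k=3)
perfect = ndcg_at_k(scores=[1.0, 2.0, 3.0], labels=[0, 0, 1], k=3)
```

A per-profile score would then be the mean of `ndcg_at_k` over that profile's held-out queries.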
130
+
131
+ ---
132
+
133
+ ## Intended Uses and Limitations
134
+
135
+ **Intended Uses**
136
+ - Multi-profile recipe recommendation
137
+ - Studying personalization behaviors under sparse feedback
138
+ - Cold-start scenarios for new users
139
+
140
+ **Limitations**
141
+ - Synthetic user interactions do not perfectly reflect real-world feedback
142
+ - NDCG is not well aligned with multi-rule personalization behavior
143
+ - User4 performance is limited by scarcity of relevant recipes
144
+
145
+ ---
146
+
147
+ ## Citation
148
+
149
+ Tang, Xinxuan. Personalized Recipe Ranking Models. 2025.