NoahMeissner committed on
Commit
3cd9598
·
verified ·
1 Parent(s): ab600a8

Update README.md

Files changed (1)
  1. README.md +130 -3
README.md CHANGED
@@ -1,8 +1,135 @@
  ---
- license: mit
  language:
  - en
+ license: mit
+ library_name: xgboost
+ pipeline_tag: text-classification
+ tags:
+ - xgboost
+ - multiclass
+ - cuisine
+ - region-classification
+ - kaggle
  metrics:
- - f1
  - accuracy
- ---
+ - f1
+ model-index:
+ - name: CuisineClassifier
+   results:
+   - task:
+       type: text-classification
+       name: Cuisine (20 classes)
+     dataset:
+       name: What's Cooking? (Kaggle)
+       type: whats-
+       url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
+       split: test
+     metrics:
+     - type: accuracy
+       value: 0.77
+     - type: f1
+       value: 0.69
+   - task:
+       type: text-classification
+       name: Region (5 classes)
+     dataset:
+       name: What's Cooking? (Kaggle) — aggregated to regions
+       type: whats-cooking
+       split: test
+     metrics:
+     - type: accuracy
+       value: 0.89
+ ---
+
+ # 🍽 Cuisine Classifier (XGBoost)
+ This model classifies dishes from their ingredients, assigning each dish to either a **Cuisine (20 classes)** or a **Region (5 classes)**.
+ It uses an **XGBoost classifier** trained on normalized ingredient data.
+
+ ---
+
+ ## 📊 Model Overview
+
+ - **Task**: Multiclass Classification (Cuisines & Regions)
+ - **Input**: List of ingredients (`["salt", "flour", "sugar", ...]`)
+ - **Output**: Cuisine class (e.g. `"italian"`) or Region (e.g. `"Central Europe"`)
+ - **Algorithm**: [XGBoost](https://xgboost.ai/)
+ - **Training Data**: Kaggle *What’s Cooking?* dataset, with ingredients normalized using the AllRecipes dataset
+ - **Train/Test Split**: 80 / 20, stratified
+ - **Cross-Validation**: 5-fold CV with `random_state=42`
+
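The split and cross-validation settings listed above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the author's training script: the toy recipes, the choice of `train_test_split` and `StratifiedKFold`, and reusing `random_state=42` for the split are all assumptions made for the example.

```python
from collections import Counter

from sklearn.model_selection import StratifiedKFold, train_test_split

# Toy data (hypothetical): 10 recipes per cuisine, ingredients joined into one string.
recipes = ["tomato basil pasta"] * 10 + ["corn tortilla salsa"] * 10
labels = ["italian"] * 10 + ["mexican"] * 10

# 80/20 split that preserves the class proportions in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    recipes, labels, test_size=0.2, stratify=labels, random_state=42
)

# 5-fold stratified CV on the training part; random_state requires shuffle=True.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_sizes = [len(val_idx) for _, val_idx in cv.split(X_train, y_train)]

test_counts = Counter(y_test)
print("test class counts:", dict(test_counts))
print("validation fold sizes:", fold_sizes)
```

Stratifying keeps rare classes represented in the held-out data, which matters when per-class metrics such as macro F1 are reported.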
+ ---
+
+ ## 🧪 Performance
+
+ ### Model Comparison
+
+ | Metric | Stratified Baseline | Logistic Regression | XGBoost |
+ |--------|---------------------|---------------------|---------|
+ | **Precision (20 cuisines)** | 0.05 | 0.65 | **0.75** |
+ | **Recall (20 cuisines)** | 0.05 | **0.69** | 0.66 |
+ | **Macro F1 (20 cuisines)** | 0.05 | 0.67 | **0.69** |
+ | **Accuracy (20 cuisines)** | 0.10 | 0.75 | **0.77** |
+ | **Accuracy (5 regions)** | 0.27 | **0.89** | **0.89** |
+
+ ✅ **Conclusion:**
+ XGBoost achieves the best precision, macro F1, and accuracy for the 20-class cuisine classification (Logistic Regression has slightly higher recall) and clearly outperforms the stratified baseline.
+ For the 5-region setting, Logistic Regression and XGBoost reach the same accuracy (0.89); however, XGBoost provides more consistent results across classes.
+
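One way to read the Stratified Baseline column: assuming it refers to a classifier that guesses labels at random in proportion to the class frequencies (as scikit-learn's `DummyClassifier(strategy="stratified")` does), its expected accuracy is the sum of the squared class proportions. The sketch below shows only that arithmetic. The helper name and both class distributions are hypothetical and are not taken from the dataset.

```python
def expected_stratified_accuracy(class_proportions):
    """Expected accuracy when labels are guessed at random with these proportions."""
    total = sum(class_proportions)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("class proportions must sum to 1")
    # A guess is correct when the true class and the guessed class coincide.
    return sum(p * p for p in class_proportions)


# Hypothetical distributions (not the dataset's actual class frequencies).
balanced = [0.2] * 5
skewed = [0.4, 0.3, 0.1, 0.1, 0.1]

balanced_acc = expected_stratified_accuracy(balanced)  # 5 * 0.2**2
skewed_acc = expected_stratified_accuracy(skewed)      # 0.16 + 0.09 + 3 * 0.01
print(f"balanced: {balanced_acc:.2f}, skewed: {skewed_acc:.2f}")
```

An imbalanced class distribution pushes such a baseline above 1/K, which would be consistent with the reported baseline accuracies (0.10 for 20 classes, 0.27 for 5 classes) sitting above uniform chance.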
+ ---
+
+ ### Per-Region Metrics (5 Classes)
+
+ | Region          | Precision (XGB) | Recall (XGB) | F1 (XGB) |
+ |-----------------|-----------------|--------------|----------|
+ | Asia            | 0.94            | 0.92         | 0.93     |
+ | Central Europe  | 0.85            | **0.93**     | 0.89     |
+ | Latin America   | 0.92            | 0.88         | 0.90     |
+ | Middle East     | **0.88**        | 0.74         | 0.81     |
+ | North America   | **0.87**        | 0.76         | 0.81     |
+
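The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the rounded values in the per-region table as a sanity check. Because the inputs are already rounded to two decimals, the recomputed value can differ from the reported one in the last digit (Middle East comes out as 0.80 rather than 0.81). The helper name `f1` is illustrative only.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the per-region table.
regions = {
    "Asia": (0.94, 0.92),
    "Central Europe": (0.85, 0.93),
    "Latin America": (0.92, 0.88),
    "Middle East": (0.88, 0.74),
    "North America": (0.87, 0.76),
}

recomputed = {name: round(f1(p, r), 2) for name, (p, r) in regions.items()}
for name, value in recomputed.items():
    print(f"{name}: {value:.2f}")
```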
+ ---
+
+ ## 🚀 How to Use
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ import joblib
+
+ class CuisineClassifier:
+
+     def __init__(self, dataset=None):
+         # NOTE: `dataset` is accepted but not used in this snippet.
+         print("Initializing CuisineClassifier...")
+
+         components = ["cuisine_pipeline", "label_encoder"]
+         paths = {}
+
+         print("Downloading files from Hugging Face Hub...")
+         for name in components:
+             print(f"Downloading {name}.joblib ...")
+             try:
+                 # hf_hub_download returns the local path of the cached file.
+                 paths[name] = hf_hub_download(
+                     repo_id="NoahMeissner/CuisineClassifier",
+                     filename=f"{name}.joblib"
+                 )
+                 print(f"{name} downloaded.")
+             except Exception as e:
+                 print(f"Failed to download {name}: {e}")
+                 raise
+
+         print("📦 Loading model components with joblib...")
+         try:
+             self.model = joblib.load(paths["cuisine_pipeline"])
+             print("Model loaded.")
+             self.label_encoder = joblib.load(paths["label_encoder"])
+             print("Label encoder loaded.")
+         except Exception as e:
+             print(f"Failed to load components: {e}")
+             raise
+
+         print("All components loaded successfully.")
+
+     def classify(self, text_input):
+         # Join the ingredient list into a single space-separated string.
+         data = " ".join(text_input)
+         predicted_class = self.model.predict([data])
+         # Map the encoded class back to its label (array-like with one entry).
+         predicted_label = self.label_encoder.inverse_transform(predicted_class)
+         return predicted_label