---
language:
- en
license: mit
library_name: xgboost
pipeline_tag: text-classification
tags:
- xgboost
- multiclass
- cuisine
- region-classification
- kaggle
metrics:
- accuracy
- f1
model-index:
- name: CuisineClassifier
  results:
  - task:
      type: text-classification
      name: Cuisine (20 classes)
    dataset:
      name: What's Cooking? (Kaggle)
      type: whats-cooking
      url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
      split: test
    metrics:
    - type: accuracy
      value: 0.77
    - type: f1
      value: 0.69
  - task:
      type: text-classification
      name: Region (5 classes)
    dataset:
      name: What's Cooking? (Kaggle), aggregated to regions
      type: whats-cooking
      url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
      split: test
    metrics:
    - type: accuracy
      value: 0.89
---

# 🍽 Cuisine Classifier (XGBoost)
This model classifies dishes by their ingredients, assigning each either a **Cuisine (20 classes)** or a **Region (5 classes)**.  
It uses an **XGBoost classifier** trained on normalized ingredient data.

---

## 📊 Model Overview

- **Task**: Multiclass Classification (Cuisines & Regions)  
- **Input**: List of ingredients (`["salt", "flour", "sugar", ...]`)  
- **Output**: Cuisine class (e.g. `"italian"`) or Region (e.g. `"Central Europe"`)  
- **Algorithm**: [XGBoost](https://xgboost.ai/)  
- **Training Data**: Kaggle [*What’s Cooking?*](https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset) dataset, with ingredients normalized using the AllRecipes dataset  
- **Train/Test Split**: 80 / 20, stratified  
- **Cross Validation**: 5-fold CV with `random_state=42`
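The evaluation protocol above (stratified 80/20 split, 5-fold CV with `random_state=42`) can be sketched as follows. This is a toy illustration, not the original training script: the recipes are made up, and Logistic Regression (one of the compared baselines) stands in for the XGBoost pipeline to keep the example dependency-light.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline

# Toy recipes: each dish is its ingredient list joined into one string.
recipes = [
    "soy_sauce ginger rice", "tortilla beans chili", "pasta tomato basil",
    "soy_sauce noodles sesame", "beans corn avocado", "tomato olive_oil garlic",
    "rice ginger scallion", "chili lime cilantro", "basil parmesan pasta",
    "sesame rice soy_sauce", "avocado lime tortilla", "garlic tomato oregano",
] * 3
labels = ["asian", "latin", "european"] * 12

# Stratified 80/20 split, as described in the model card.
X_train, X_test, y_train, y_test = train_test_split(
    recipes, labels, test_size=0.2, stratify=labels, random_state=42
)

# 5-fold CV on the training portion, again with a fixed random_state.
pipeline = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X_train, y_train, cv=cv, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.2f} ± {scores.std():.2f}")

# Final fit on the full training split, evaluated on the held-out 20%.
pipeline.fit(X_train, y_train)
test_acc = pipeline.score(X_test, y_test)
print(f"Test accuracy: {test_acc:.2f}")
```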

### 🌍 Region Mapping
| Region          | Cuisines                                                   |
|-----------------|-----------------------------------------------------------|
| Central Europe  | british, french, greek, irish, italian, russian, spanish  |
| North America   | cajun_creole, southern_us                                 |
| Asia            | chinese, filipino, indian, japanese, korean, thai, vietnamese |
| Middle East     | moroccan                                                  |
| Latin America   | mexican, jamaican, brazilian                              |
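In code, this mapping is just a lookup table. The dict below is transcribed from the table above (an illustration for aggregating the 20 cuisine labels into the 5 regions, not a file shipped with the model):

```python
# Cuisine → region lookup, transcribed from the mapping table above.
REGION_MAP = {
    # Central Europe
    "british": "Central Europe", "french": "Central Europe",
    "greek": "Central Europe", "irish": "Central Europe",
    "italian": "Central Europe", "russian": "Central Europe",
    "spanish": "Central Europe",
    # North America
    "cajun_creole": "North America", "southern_us": "North America",
    # Asia
    "chinese": "Asia", "filipino": "Asia", "indian": "Asia",
    "japanese": "Asia", "korean": "Asia", "thai": "Asia",
    "vietnamese": "Asia",
    # Middle East
    "moroccan": "Middle East",
    # Latin America
    "mexican": "Latin America", "jamaican": "Latin America",
    "brazilian": "Latin America",
}

# Sanity check: all 20 cuisines collapse onto the 5 regions.
print(len(REGION_MAP), len(set(REGION_MAP.values())))
```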



---

## 🧪 Performance

### Model Comparison

| Metric | Stratified Baseline | Logistic Regression | XGBoost |
|-------|----------------------|---------------------|---------|
| **Precision (20 cuisines)** | 0.05 | 0.65 | **0.75** |
| **Recall (20 cuisines)**    | 0.05 | **0.69** | 0.66 |
| **Macro F1 (20 cuisines)**  | 0.05 | 0.67 | **0.69** |
| **Accuracy (20 cuisines)**  | 0.10 | 0.75 | **0.77** |
| **Accuracy (5 regions)**    | 0.27 | **0.89** | **0.89** |

✅ **Conclusion:**  
XGBoost achieves the best results for the 20-class cuisine classification and clearly outperforms the baseline.  
For the 5-region setting, Logistic Regression and XGBoost perform nearly identically — however, XGBoost provides more consistent results across classes.

---

### Per-Region Metrics (5 Classes)

| Region          | Precision (XGB) | Recall (XGB) | F1 (XGB) |
|-----------------|------------------|--------------|----------|
| Asia           | 0.94 | 0.92 | 0.93 |
| Central Europe | 0.85 | **0.93** | 0.89 |
| Latin America  | 0.92 | 0.88 | 0.90 |
| Middle East    | **0.88** | 0.74 | 0.81 |
| North America  | **0.87** | 0.76 | 0.81 |
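The macro-averaged scores in the tables above weight every class equally, regardless of how many test samples it has. A minimal sketch of how macro precision/recall/F1 are computed with scikit-learn, on made-up labels (not the model's actual predictions):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy ground truth and predictions over three region-style labels.
y_true = ["asia", "asia", "europe", "europe", "latin", "latin"]
y_pred = ["asia", "europe", "europe", "europe", "latin", "asia"]

# Macro averaging: compute the metric per class, then take the unweighted
# mean, so small classes count as much as large ones.
macro_p = precision_score(y_true, y_pred, average="macro")
macro_r = recall_score(y_true, y_pred, average="macro")
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(f"precision={macro_p:.2f} recall={macro_r:.2f} f1={macro_f1:.2f}")
```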

---

## 🚀 How to Use

```python
from huggingface_hub import hf_hub_download
import joblib

class CuisineClassifier:

    def __init__(self, classifier="region"):
        print("Initializing CuisineClassifier...")

        components = ["cuisine_pipeline", "label_encoder"]
        paths = {}
        # Pick the Hub subfolder that matches the requested classifier.
        folder = "cuisine_classifier" if classifier == "cuisine" else "region_classifier"

        print("Downloading files from Hugging Face Hub...")
        for name in components:
            print(f"Downloading {name}.joblib ...")
            try:
                paths[name] = hf_hub_download(
                    repo_id="NoahMeissner/CuisineClassifier",
                    filename=f"{folder}/{name}.joblib",
                )
                print(f"{name} downloaded.")
            except Exception as e:
                print(f"Failed to download {name}: {e}")
                raise

        print("Loading model components with joblib...")
        try:
            self.model = joblib.load(paths["cuisine_pipeline"])
            print("Model loaded.")
            self.label_encoder = joblib.load(paths["label_encoder"])
            print("Label encoder loaded.")
        except Exception as e:
            print(f"Failed to load components: {e}")
            raise

        print("All components loaded successfully.")

    def classify(self, text_input):
        data = " ".join(text_input)
        predicted_class = self.model.predict([data])
        predicted_label = self.label_encoder.inverse_transform(predicted_class)
        return predicted_label
```