---
language:
- en
license: mit
library_name: xgboost
pipeline_tag: text-classification
tags:
- xgboost
- multiclass
- cuisine
- region-classification
- kaggle
metrics:
- accuracy
- f1
model-index:
- name: CuisineClassifier
  results:
  - task:
      type: text-classification
      name: Cuisine (20 classes)
    dataset:
      name: What's Cooking? (Kaggle)
      type: whats-
      url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
      split: test
    metrics:
    - type: accuracy
      value: 0.77
    - type: f1
      value: 0.69
  - task:
      type: text-classification
      name: Region (5 classes)
    dataset:
      name: What's Cooking? (Kaggle) - aggregated to regions
      type: whats-cooking
      url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
      split: test
    metrics:
    - type: accuracy
      value: 0.89
---
# 🍽 Cuisine Classifier (XGBoost)

This model classifies dishes by their ingredients, assigning each dish either to a **cuisine (20 classes)** or to a **region (5 classes)**.
It uses an **XGBoost classifier** trained on normalized ingredient data.

---
## 📊 Model Overview
- **Task**: Multiclass Classification (Cuisines & Regions)
- **Input**: List of ingredients (`["salt", "flour", "sugar", ...]`)
- **Output**: Cuisine class (e.g. `"italian"`) or Region (e.g. `"Central Europe"`)
- **Algorithm**: [XGBoost](https://xgboost.ai/)
- **Training Data**: Kaggle [*What’s Cooking?*](https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset) dataset, with ingredients normalized using the AllRecipes dataset
- **Train/Test Split**: 80 / 20, stratified
- **Cross Validation**: 5-fold CV with `random_state=42`
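
A stratified split keeps each class's share roughly equal in the training and test sets. The following is a minimal standard-library sketch of an 80 / 20 stratified split, not the author's actual code (which is not shown in this card). The function name `stratified_split` and the toy labels are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) so each class is split at about the same ratio."""
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed for reproducibility
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels, invented for illustration only.
labels = ["italian"] * 10 + ["thai"] * 5
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 12 3
```

In practice a library routine such as scikit-learn's `train_test_split(..., stratify=y)` would likely be used instead; the sketch only shows the idea.
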
### 🌍 Region Mapping
| Region | Cuisines |
|-----------------|-----------------------------------------------------------|
| Central Europe | british, french, greek, irish, italian, russian, spanish |
| North America | cajun_creole, southern_us |
| Asia | chinese, filipino, indian, japanese, korean, thai, vietnamese |
| Middle East | moroccan |
| Latin America | mexican, jamaican, brazilian |
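
The mapping in the table can be expressed as a lookup that turns a cuisine label into its region label. This is a sketch of how the aggregation could be done; the names `REGION_TO_CUISINES`, `CUISINE_TO_REGION`, and `to_region` are hypothetical and not taken from the model repository.

```python
# Region -> cuisines, copied from the Region Mapping table.
REGION_TO_CUISINES = {
    "Central Europe": ["british", "french", "greek", "irish", "italian", "russian", "spanish"],
    "North America": ["cajun_creole", "southern_us"],
    "Asia": ["chinese", "filipino", "indian", "japanese", "korean", "thai", "vietnamese"],
    "Middle East": ["moroccan"],
    "Latin America": ["mexican", "jamaican", "brazilian"],
}

# Invert the mapping so each cuisine label points to its region.
CUISINE_TO_REGION = {
    cuisine: region
    for region, cuisines in REGION_TO_CUISINES.items()
    for cuisine in cuisines
}


def to_region(cuisine):
    """Map one of the 20 cuisine labels to one of the 5 region labels."""
    return CUISINE_TO_REGION[cuisine]


print(to_region("thai"))       # Asia
print(len(CUISINE_TO_REGION))  # 20
```
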
---
## 🧪 Performance
### Model Comparison
| Metric | Stratified Baseline | Logistic Regression | XGBoost |
|-------|----------------------|---------------------|---------|
| **Precision (20 cuisines)** | 0.05 | 0.65 | **0.75** |
| **Recall (20 cuisines)** | 0.05 | **0.69** | 0.66 |
| **Macro F1 (20 cuisines)** | 0.05 | 0.67 | **0.69** |
| **Accuracy (20 cuisines)** | 0.10 | 0.75 | **0.77** |
| **Accuracy (5 regions)** | 0.27 | **0.89** | **0.89** |

✅ **Conclusion:**
XGBoost achieves the best precision, macro F1, and accuracy for the 20-class cuisine classification (Logistic Regression has slightly higher recall) and clearly outperforms the stratified baseline.
For the 5-region setting, Logistic Regression and XGBoost reach the same accuracy (0.89), though XGBoost provides more consistent results across classes.

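
For reference, macro F1 is usually computed as the unweighted mean of the per-class F1 scores, which is why it does not have to equal the harmonic mean of the macro precision and macro recall in the table. The sketch below assumes that definition (the card does not state how the metric was computed) and uses invented toy labels.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only.
y_true = ["italian", "italian", "thai", "thai", "mexican", "mexican"]
y_pred = ["italian", "thai", "thai", "thai", "mexican", "italian"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.656
```
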
---
### Per-Region Metrics (5 Classes)
| Region | Precision (XGB) | Recall (XGB) | F1 (XGB) |
|-----------------|------------------|--------------|----------|
| Asia | 0.94 | 0.92 | 0.93 |
| Central Europe | 0.85 | **0.93** | 0.89 |
| Latin America | 0.92 | 0.88 | 0.90 |
| Middle East | **0.88** | 0.74 | 0.81 |
| North America | **0.87** | 0.76 | 0.81 |
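
Each F1 value in the table is the harmonic mean of that row's precision and recall. As a quick consistency check, the sketch below recomputes F1 from the rounded table values; small differences in the last digit are expected because the inputs are already rounded. The helper name `f1_from` is hypothetical.

```python
# (precision, recall, reported F1) per region, copied from the table.
REPORTED = {
    "Asia": (0.94, 0.92, 0.93),
    "Central Europe": (0.85, 0.93, 0.89),
    "Latin America": (0.92, 0.88, 0.90),
    "Middle East": (0.88, 0.74, 0.81),
    "North America": (0.87, 0.76, 0.81),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for region, (precision, recall, reported) in REPORTED.items():
    recomputed = f1_from(precision, recall)
    print(f"{region}: recomputed {recomputed:.3f}, reported {reported:.2f}")
```
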
---
## 🚀 How to Use
```python
from huggingface_hub import hf_hub_download
import joblib


class CuisineClassifier:
    def __init__(self, classifier="region"):
        print("Initializing CuisineClassifier...")
        components = ["cuisine_pipeline", "label_encoder"]
        # Pick the repository subfolder for the requested classifier
        # ("region" is the default; pass "cuisine" for the 20-class model).
        subfolder = "cuisine_classifier" if classifier == "cuisine" else "region_classifier"
        paths = {}

        print("Downloading files from Hugging Face Hub...")
        for name in components:
            print(f"Downloading {name}.joblib ...")
            try:
                paths[name] = hf_hub_download(
                    repo_id="NoahMeissner/CuisineClassifier",
                    filename=f"{subfolder}/{name}.joblib",
                )
                print(f"{name} downloaded.")
            except Exception as e:
                print(f"Failed to download {name}: {e}")
                raise

        print("Loading model components with joblib...")
        try:
            self.model = joblib.load(paths["cuisine_pipeline"])
            print("Model loaded.")
            self.label_encoder = joblib.load(paths["label_encoder"])
            print("Label encoder loaded.")
        except Exception as e:
            print(f"Failed to load components: {e}")
            raise
        print("All components loaded successfully.")

    def classify(self, text_input):
        # Join the ingredient list into one space-separated string for the pipeline.
        data = " ".join(text_input)
        predicted_class = self.model.predict([data])
        # Convert the numeric class index back to its string label.
        predicted_label = self.label_encoder.inverse_transform(predicted_class)
        return predicted_label


if __name__ == "__main__":
    # Example call; downloads the model files on first use.
    clf = CuisineClassifier(classifier="cuisine")
    print(clf.classify(["salt", "flour", "sugar"]))
```