---
language:
- en
license: mit
library_name: xgboost
pipeline_tag: text-classification
tags:
- xgboost
- multiclass
- cuisine
- region-classification
- kaggle
metrics:
- accuracy
- f1
model-index:
- name: CuisineClassifier
  results:
  - task:
      type: text-classification
      name: Cuisine (20 classes)
    dataset:
      name: What's Cooking? (Kaggle)
      type: whats-
      url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
      split: test
    metrics:
    - type: accuracy
      value: 0.77
    - type: f1
      value: 0.69
  - task:
      type: text-classification
      name: Region (5 classes)
    dataset:
      name: What's Cooking? (Kaggle), aggregated to regions
      type: whats-cooking
      url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
      split: test
    metrics:
    - type: accuracy
      value: 0.89
---

# 🍽 Cuisine Classifier (XGBoost)

This model classifies dishes based on their ingredients and assigns them either to a **Cuisine (20 classes)** or a **Region (5 classes)**.
It uses an **XGBoost classifier** trained on normalized ingredient data.

---

## 📊 Model Overview

- **Task**: Multiclass classification (cuisines and regions)
- **Input**: List of ingredients (`["salt", "flour", "sugar", ...]`)
- **Output**: Cuisine class (e.g. `"italian"`) or region (e.g. `"Central Europe"`)
- **Algorithm**: [XGBoost](https://xgboost.ai/)
- **Training Data**: Kaggle [*What’s Cooking?*](https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset) dataset, with ingredients normalized using the AllRecipes dataset
- **Train/Test Split**: 80 / 20, stratified
- **Cross-Validation**: 5-fold CV with `random_state=42`
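
The stratified 80 / 20 split listed above can be illustrated with a small dependency-free sketch. The function name `stratified_split`, the toy labels, and the reuse of 42 as the seed are assumptions made for illustration; the card does not say how the split was implemented. In scikit-learn, `train_test_split(..., stratify=y)` gives the same kind of split.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices) with per-class proportions preserved."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Each class contributes about `test_fraction` of its samples to the test set.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy labels (hypothetical, not the real class distribution).
labels = ["italian"] * 50 + ["mexican"] * 30 + ["thai"] * 20
train_idx, test_idx = stratified_split(labels)

print(Counter(labels[i] for i in train_idx))
print(Counter(labels[i] for i in test_idx))
```

Splitting per class keeps rare cuisines represented in both sets, which a plain random split does not guarantee.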

### 🌍 Region Mapping

| Region         | Cuisines                                                      |
|----------------|---------------------------------------------------------------|
| Central Europe | british, french, greek, irish, italian, russian, spanish      |
| North America  | cajun_creole, southern_us                                     |
| Asia           | chinese, filipino, indian, japanese, korean, thai, vietnamese |
| Middle East    | moroccan                                                      |
| Latin America  | mexican, jamaican, brazilian                                  |
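
The table can be written as a lookup that turns a cuisine label into its region. This is a minimal sketch: the names `REGION_TO_CUISINES`, `CUISINE_TO_REGION`, and `to_region` are hypothetical and not taken from the model's code; only the mapping itself comes from the table.

```python
# Region -> cuisines, copied from the Region Mapping table.
REGION_TO_CUISINES = {
    "Central Europe": ["british", "french", "greek", "irish", "italian", "russian", "spanish"],
    "North America": ["cajun_creole", "southern_us"],
    "Asia": ["chinese", "filipino", "indian", "japanese", "korean", "thai", "vietnamese"],
    "Middle East": ["moroccan"],
    "Latin America": ["mexican", "jamaican", "brazilian"],
}

# Invert the table so each cuisine label points at its region.
CUISINE_TO_REGION = {
    cuisine: region
    for region, cuisines in REGION_TO_CUISINES.items()
    for cuisine in cuisines
}


def to_region(cuisine):
    """Map a cuisine label to its region; raises KeyError for unknown labels."""
    return CUISINE_TO_REGION[cuisine]


print(to_region("thai"))       # Asia
print(len(CUISINE_TO_REGION))  # 20
```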

---

## 🧪 Performance

### Model Comparison

| Metric                      | Stratified Baseline | Logistic Regression | XGBoost  |
|-----------------------------|---------------------|---------------------|----------|
| **Precision (20 cuisines)** | 0.05                | 0.65                | **0.75** |
| **Recall (20 cuisines)**    | 0.05                | **0.69**            | 0.66     |
| **Macro F1 (20 cuisines)**  | 0.05                | 0.67                | **0.69** |
| **Accuracy (20 cuisines)**  | 0.10                | 0.75                | **0.77** |
| **Accuracy (5 regions)**    | 0.27                | **0.89**            | **0.89** |

✅ **Conclusion:**
XGBoost achieves the best precision, macro F1, and accuracy for the 20-class cuisine classification (Logistic Regression has slightly higher recall, 0.69 vs. 0.66) and clearly outperforms the baseline.
For the 5-region setting, Logistic Regression and XGBoost reach the same accuracy (0.89), but XGBoost provides more consistent results across classes.
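
The comparison reports both accuracy and macro F1, and for XGBoost the macro F1 (0.69) sits below the accuracy (0.77). Macro averaging weights every class equally, so weaker scores on rare classes pull it down; that is one plausible reason for the gap, not something the card states. The sketch below computes both metrics from scratch on hypothetical toy labels; the function name `accuracy_and_macro_f1` is an assumption for illustration.

```python
from collections import Counter


def accuracy_and_macro_f1(y_true, y_pred):
    """Return (accuracy, macro_f1) for two equally long label sequences."""
    classes = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)
    actual = Counter(y_true)

    f1_scores = []
    for c in classes:
        precision = true_pos[c] / predicted[c] if predicted[c] else 0.0
        recall = true_pos[c] / actual[c] if actual[c] else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)

    accuracy = sum(true_pos.values()) / len(y_true)
    # Macro average: unweighted mean of the per-class F1 scores.
    macro_f1 = sum(f1_scores) / len(classes)
    return accuracy, macro_f1


# Toy, imbalanced example (hypothetical labels, not model output).
y_true = ["italian"] * 8 + ["moroccan"] * 2
y_pred = ["italian"] * 8 + ["italian", "moroccan"]

acc, f1 = accuracy_and_macro_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, macro F1={f1:.2f}")  # accuracy=0.90, macro F1=0.80
```

One missed sample in the rare class costs 0.10 of accuracy here but lowers that class's F1 to about 0.67, which is what drags the macro average down.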

---

### Per-Region Metrics (5 Classes)

| Region         | Precision (XGB) | Recall (XGB) | F1 (XGB) |
|----------------|-----------------|--------------|----------|
| Asia           | 0.94            | 0.92         | 0.93     |
| Central Europe | 0.85            | **0.93**     | 0.89     |
| Latin America  | 0.92            | 0.88         | 0.90     |
| Middle East    | **0.88**        | 0.74         | 0.81     |
| North America  | **0.87**        | 0.76         | 0.81     |
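
As a sanity check, each F1 value in the table should be the harmonic mean of its precision and recall, $F_1 = 2PR / (P + R)$. The sketch below recomputes it from the rounded table values; the helper name `f1_from` is hypothetical. Because precision and recall are already rounded to two decimals, the recomputed values can differ from the reported ones by up to about 0.01.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the Per-Region Metrics table.
rows = {
    "Asia": (0.94, 0.92, 0.93),
    "Central Europe": (0.85, 0.93, 0.89),
    "Latin America": (0.92, 0.88, 0.90),
    "Middle East": (0.88, 0.74, 0.81),
    "North America": (0.87, 0.76, 0.81),
}

for region, (p, r, reported) in rows.items():
    print(f"{region:15s} recomputed={f1_from(p, r):.3f} reported={reported:.2f}")
```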

---

## 🚀 How to Use

```python
from huggingface_hub import hf_hub_download
import joblib


class CuisineClassifier:
    """Loads the cuisine or region pipeline from the Hugging Face Hub."""

    def __init__(self, classifier="region"):
        print("Initializing CuisineClassifier...")

        components = ["cuisine_pipeline", "label_encoder"]
        paths = {}

        # Choose the repo sub-folder: "cuisine" selects the 20-class model,
        # any other value falls back to the 5-class region model.
        folder = "cuisine_classifier" if classifier == "cuisine" else "region_classifier"

        print("Downloading files from Hugging Face Hub...")
        for name in components:
            print(f"Downloading {name}.joblib ...")
            try:
                paths[name] = hf_hub_download(
                    repo_id="NoahMeissner/CuisineClassifier",
                    filename=f"{folder}/{name}.joblib",
                )
                print(f"{name} downloaded.")
            except Exception as e:
                print(f"Failed to download {name}: {e}")
                raise

        # joblib unpickles the stored objects, so the libraries they were
        # built with (e.g. xgboost) must be installed.
        print("Loading model components with joblib...")
        try:
            self.model = joblib.load(paths["cuisine_pipeline"])
            print("Model loaded.")
            self.label_encoder = joblib.load(paths["label_encoder"])
            print("Label encoder loaded.")
        except Exception as e:
            print(f"Failed to load components: {e}")
            raise

        print("All components loaded successfully.")

    def classify(self, text_input):
        # Join the ingredient list into one space-separated string for the pipeline.
        data = " ".join(text_input)
        predicted_class = self.model.predict([data])
        # Map the encoded class id back to its label (an array with one entry).
        predicted_label = self.label_encoder.inverse_transform(predicted_class)
        return predicted_label