---
language:
- en
license: mit
library_name: xgboost
pipeline_tag: text-classification
tags:
- xgboost
- multiclass
- cuisine
- region-classification
- kaggle
metrics:
- accuracy
- f1
model-index:
- name: CuisineClassifier
  results:
  - task:
      type: text-classification
      name: Cuisine (20 classes)
    dataset:
      name: What's Cooking? (Kaggle)
      type: whats-
      url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
      split: test
    metrics:
    - type: accuracy
      value: 0.77
    - type: f1
      value: 0.69
  - task:
      type: text-classification
      name: Region (5 classes)
    dataset:
      name: What's Cooking? (Kaggle) aggregated to regions
      type: whats-cooking
      url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
      split: test
    metrics:
    - type: accuracy
      value: 0.89
---
# 🍽 Cuisine Classifier (XGBoost)
This model classifies a dish from its list of ingredients into either a **Cuisine (20 classes)** or a **Region (5 classes)**.
It uses an **XGBoost classifier** trained on normalized ingredient data.

---
## 📊 Model Overview
- **Task**: Multiclass Classification (Cuisines & Regions)
- **Input**: List of ingredients (`["salt", "flour", "sugar", ...]`)
- **Output**: Cuisine class (e.g. `"italian"`) or Region (e.g. `"Central Europe"`)
- **Algorithm**: [XGBoost](https://xgboost.ai/)
- **Training Data**: Kaggle [*What’s Cooking?*](https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset) dataset, with ingredients normalized using the AllRecipes dataset
- **Train/Test Split**: 80 / 20, stratified
- **Cross Validation**: 5-fold CV with `random_state=42`
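
The stratified 80 / 20 split listed above can be illustrated with a minimal, standard-library sketch. The `random_state` naming suggests scikit-learn was used for the actual split, but that is an assumption; the `stratified_split` helper and the toy labels below are hypothetical and only show what "stratified" means: each class contributes roughly the same fraction of its samples to the test set.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), splitting every class at about test_fraction."""
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (hypothetical, not the real class distribution).
labels = ["italian"] * 50 + ["mexican"] * 30 + ["thai"] * 20
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```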
### 🌍 Region Mapping
| Region | Cuisines |
|-----------------|-----------------------------------------------------------|
| Central Europe | british, french, greek, irish, italian, russian, spanish |
| North America | cajun_creole, southern_us |
| Asia | chinese, filipino, indian, japanese, korean, thai, vietnamese |
| Middle East | moroccan |
| Latin America | mexican, jamaican, brazilian |
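
The mapping in the table can be written as a plain dictionary, which is one way to aggregate cuisine labels into region labels. The names `REGION_TO_CUISINES`, `CUISINE_TO_REGION`, and `to_region` are illustrative and not taken from the repository.

```python
# Region mapping copied from the table above.
REGION_TO_CUISINES = {
    "Central Europe": ["british", "french", "greek", "irish", "italian", "russian", "spanish"],
    "North America": ["cajun_creole", "southern_us"],
    "Asia": ["chinese", "filipino", "indian", "japanese", "korean", "thai", "vietnamese"],
    "Middle East": ["moroccan"],
    "Latin America": ["mexican", "jamaican", "brazilian"],
}

# Invert the mapping so each cuisine points to its region.
CUISINE_TO_REGION = {
    cuisine: region
    for region, cuisines in REGION_TO_CUISINES.items()
    for cuisine in cuisines
}


def to_region(cuisine):
    """Map a cuisine label to its region label; raises KeyError for unknown cuisines."""
    return CUISINE_TO_REGION[cuisine]


print(to_region("thai"))
print(len(CUISINE_TO_REGION))
```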
---
## 🧪 Performance
### Model Comparison
| Metric | Stratified Baseline | Logistic Regression | XGBoost |
|-------|----------------------|---------------------|---------|
| **Precision (20 cuisines)** | 0.05 | 0.65 | **0.75** |
| **Recall (20 cuisines)** | 0.05 | **0.69** | 0.66 |
| **Macro F1 (20 cuisines)** | 0.05 | 0.67 | **0.69** |
| **Accuracy (20 cuisines)** | 0.10 | 0.75 | **0.77** |
| **Accuracy (5 regions)** | 0.27 | **0.89** | **0.89** |
**Conclusion:**
For the 20-class cuisine task, XGBoost achieves the best precision (0.75), macro F1 (0.69), and accuracy (0.77), and clearly outperforms the stratified baseline; Logistic Regression has slightly higher recall (0.69 vs. 0.66).
For the 5-region setting, Logistic Regression and XGBoost both reach 0.89 accuracy. However, XGBoost provides more consistent results across classes.
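
As a reminder of how the metrics in the comparison table are defined, here is a minimal sketch that computes macro-averaged precision, recall, F1, and accuracy from label lists. It assumes the precision and recall rows are macro-averaged like the F1 row, which the table does not state; the `macro_scores` function and the toy predictions are hypothetical.

```python
def macro_scores(y_true, y_pred):
    """Compute accuracy plus macro-averaged precision, recall, and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Guard against division by zero for classes never predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / n,  # unweighted mean over classes
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example (not the model's real predictions).
y_true = ["italian", "italian", "italian", "thai", "thai", "mexican"]
y_pred = ["italian", "italian", "thai", "thai", "thai", "italian"]
scores = macro_scores(y_true, y_pred)
print(scores)
```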
---
### Per-Region Metrics (5 Classes)
| Region | Precision (XGB) | Recall (XGB) | F1 (XGB) |
|-----------------|------------------|--------------|----------|
| Asia | 0.94 | 0.92 | 0.93 |
| Central Europe | 0.85 | **0.93** | 0.89 |
| Latin America | 0.92 | 0.88 | 0.90 |
| Middle East | **0.88** | 0.74 | 0.81 |
| North America | **0.87** | 0.76 | 0.81 |
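
The F1 column can be sanity-checked against the precision and recall columns, since F1 is their harmonic mean. The sketch below recomputes it from the rounded values in the table; differences of up to about 0.01 are expected because the inputs are already rounded to two decimals.

```python
# (precision, recall, reported F1) per region, copied from the table above.
reported = {
    "Asia": (0.94, 0.92, 0.93),
    "Central Europe": (0.85, 0.93, 0.89),
    "Latin America": (0.92, 0.88, 0.90),
    "Middle East": (0.88, 0.74, 0.81),
    "North America": (0.87, 0.76, 0.81),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for region, (p, r, f1) in reported.items():
    recomputed = f1_score(p, r)
    print(f"{region:15s} reported={f1:.2f} recomputed={recomputed:.3f}")
```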
---
## 🚀 How to Use
```python
from huggingface_hub import hf_hub_download
import joblib


class CuisineClassifier:
    def __init__(self, classifier="region"):
        print("Initializing CuisineClassifier...")
        components = ["cuisine_pipeline", "label_encoder"]
        # Choose the repo subfolder: "cuisine" selects the 20-class model,
        # anything else falls back to the default 5-class region model.
        folder = "cuisine_classifier" if classifier == "cuisine" else "region_classifier"
        paths = {}
        print("Downloading files from Hugging Face Hub...")
        for name in components:
            print(f"Downloading {name}.joblib ...")
            try:
                paths[name] = hf_hub_download(
                    repo_id="NoahMeissner/CuisineClassifier",
                    filename=f"{folder}/{name}.joblib",
                )
                print(f"{name} downloaded.")
            except Exception as e:
                print(f"Failed to download {name}: {e}")
                raise
        print("Loading model components with joblib...")
        try:
            self.model = joblib.load(paths["cuisine_pipeline"])
            print("Model loaded.")
            self.label_encoder = joblib.load(paths["label_encoder"])
            print("Label encoder loaded.")
        except Exception as e:
            print(f"Failed to load components: {e}")
            raise
        print("All components loaded successfully.")

    def classify(self, text_input):
        # Join the ingredient list into a single space-separated string
        # before passing it to the pipeline.
        data = " ".join(text_input)
        predicted_class = self.model.predict([data])
        # Convert the numeric class id back to its string label.
        predicted_label = self.label_encoder.inverse_transform(predicted_class)
        return predicted_label