	---
	language:
	- en
	license: mit
	library_name: xgboost
	pipeline_tag: text-classification
	tags:
	- xgboost
	- multiclass
	- cuisine
	- region-classification
	- kaggle
	metrics:
	- accuracy
	- f1
	model-index:
	- name: CuisineClassifier
	  results:
	  - task:
	      type: text-classification
	      name: Cuisine (20 classes)
	    dataset:
	      name: What's Cooking? (Kaggle)
	      type: whats-
	      url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
	      split: test
	    metrics:
	    - type: accuracy
	      value: 0.77
	    - type: f1
	      value: 0.69
	  - task:
	      type: text-classification
	      name: Region (5 classes)
	    dataset:
	      name: What's Cooking? (Kaggle), aggregated to regions
	      type: whats-cooking
	      url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
	      split: test
	    metrics:
	    - type: accuracy
	      value: 0.89
	---

	# 🍽 Cuisine Classifier (XGBoost)
	This model classifies dishes from their ingredient lists, predicting either a cuisine (20 classes) or a region (5 classes).
	It uses an XGBoost classifier trained on normalized ingredient data.

	---

	## 📊 Model Overview

	- Task: Multiclass Classification (Cuisines & Regions)
	- Input: List of ingredients (`["salt", "flour", "sugar", ...]`)
	- Output: Cuisine class (e.g. `"italian"`) or Region (e.g. `"Central Europe"`)
	- Algorithm: [XGBoost](https://xgboost.ai/)
	- Training Data: Kaggle [What’s Cooking?](https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset) dataset, with ingredients normalized using the AllRecipes dataset
	- Train/Test Split: 80 / 20, stratified
	- Cross Validation: 5-fold CV with `random_state=42`
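
The stratified 80 / 20 split listed above keeps each class's share the same in the training and test sets. The snippet below is a minimal, dependency-free sketch of that idea, not the code used to train this model; the function name `stratified_split` and the toy labels are hypothetical. In practice, scikit-learn's `train_test_split(..., stratify=labels)` does the same job.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) so each class is split at ~test_fraction."""
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels for illustration only (not the real dataset).
labels = ["italian"] * 10 + ["mexican"] * 5 + ["thai"] * 5
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting per class rather than over the whole dataset matters here because a plain random split could leave rare classes with few or no test samples.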

	### 🌍 Region Mapping
	| Region | Cuisines |
	|-----------------|-----------------------------------------------------------|
	| Central Europe | british, french, greek, irish, italian, russian, spanish |
	| North America | cajun_creole, southern_us |
	| Asia | chinese, filipino, indian, japanese, korean, thai, vietnamese |
	| Middle East | moroccan |
	| Latin America | mexican, jamaican, brazilian |
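
The mapping in the table can be written as a lookup for relabeling cuisine labels as regions. This is a sketch built directly from the table; the names `REGION_TO_CUISINES`, `CUISINE_TO_REGION`, and `to_region` are illustrative and may not match the project's own code.

```python
# Region -> cuisines, copied from the Region Mapping table.
REGION_TO_CUISINES = {
    "Central Europe": ["british", "french", "greek", "irish", "italian", "russian", "spanish"],
    "North America": ["cajun_creole", "southern_us"],
    "Asia": ["chinese", "filipino", "indian", "japanese", "korean", "thai", "vietnamese"],
    "Middle East": ["moroccan"],
    "Latin America": ["mexican", "jamaican", "brazilian"],
}

# Invert the table into a cuisine -> region lookup.
CUISINE_TO_REGION = {
    cuisine: region
    for region, cuisines in REGION_TO_CUISINES.items()
    for cuisine in cuisines
}


def to_region(cuisine):
    """Map a cuisine label to its region; raises KeyError for unknown labels."""
    return CUISINE_TO_REGION[cuisine]


print(to_region("thai"))
```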



	---

	## 🧪 Performance

	### Model Comparison

	| Metric | Stratified Baseline | Logistic Regression | XGBoost |
	|-------|----------------------|---------------------|---------|
	| Precision (20 cuisines) | 0.05 | 0.65 | 0.75 |
	| Recall (20 cuisines) | 0.05 | 0.69 | 0.66 |
	| Macro F1 (20 cuisines) | 0.05 | 0.67 | 0.69 |
	| Accuracy (20 cuisines) | 0.10 | 0.75 | 0.77 |
	| Accuracy (5 regions) | 0.27 | 0.89 | 0.89 |

	✅ Conclusion:
	XGBoost achieves the best precision, macro F1, and accuracy for the 20-class cuisine classification (Logistic Regression has slightly higher recall, 0.69 vs. 0.66), and both models clearly outperform the stratified baseline.
	For the 5-region setting, Logistic Regression and XGBoost reach the same accuracy (0.89); however, XGBoost provides more consistent results across classes.
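
The baseline's low scores follow from how a stratified random guesser behaves. Assuming "Stratified Baseline" means a classifier that draws each prediction at random from the training label distribution (as scikit-learn's `DummyClassifier(strategy="stratified")` does; this README does not name the implementation), its expected accuracy is the sum of the squared class proportions. The class proportions below are illustrative and are not taken from the dataset.

```python
def expected_stratified_accuracy(class_priors):
    """Expected accuracy of guessing labels at random according to class priors.

    A sample of class i (probability p_i) is guessed correctly with
    probability p_i, so the expected accuracy is sum(p_i ** 2).
    """
    total = sum(class_priors)
    probs = [p / total for p in class_priors]  # normalize in case of raw counts
    return sum(p * p for p in probs)


# Illustrative priors only.
uniform_20 = [1 / 20] * 20
skewed_4 = [0.4, 0.3, 0.2, 0.1]

print(round(expected_stratified_accuracy(uniform_20), 3))
print(round(expected_stratified_accuracy(skewed_4), 3))
```

For K classes this quantity is at least 1/K and grows as the classes become more imbalanced.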

	---

	### Per-Region Metrics (5 Classes)

	| Region | Precision (XGB) | Recall (XGB) | F1 (XGB) |
	|-----------------|------------------|--------------|----------|
	| Asia | 0.94 | 0.92 | 0.93 |
	| Central Europe | 0.85 | 0.93 | 0.89 |
	| Latin America | 0.92 | 0.88 | 0.90 |
	| Middle East | 0.88 | 0.74 | 0.81 |
	| North America | 0.87 | 0.76 | 0.81 |
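
Each F1 value in the table is the harmonic mean of that row's precision and recall. The sketch below recomputes F1 from the rounded table values as a sanity check; the helper name `f1_from` is illustrative. Because the precision and recall shown are already rounded to two decimals, a recomputed value can differ from the reported one in the last digit.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from the per-region table.
REGION_METRICS = {
    "Asia": (0.94, 0.92, 0.93),
    "Central Europe": (0.85, 0.93, 0.89),
    "Latin America": (0.92, 0.88, 0.90),
    "Middle East": (0.88, 0.74, 0.81),
    "North America": (0.87, 0.76, 0.81),
}

for region, (precision, recall, reported) in REGION_METRICS.items():
    recomputed = f1_from(precision, recall)
    print(f"{region:15s} recomputed F1 = {recomputed:.3f} (reported {reported:.2f})")
```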

	---

	## 🚀 How to Use

	```python
	from huggingface_hub import hf_hub_download
	import joblib


	class CuisineClassifier:
	    """Loads the region or cuisine pipeline from the Hugging Face Hub."""

	    def __init__(self, classifier="region"):
	        print("Initializing CuisineClassifier...")

	        # "cuisine" selects the 20-class model; any other value falls back
	        # to the 5-class region model.
	        folder = "cuisine_classifier" if classifier == "cuisine" else "region_classifier"
	        components = ["cuisine_pipeline", "label_encoder"]
	        paths = {}

	        print("Downloading files from Hugging Face Hub...")
	        for name in components:
	            print(f"Downloading {name}.joblib ...")
	            try:
	                paths[name] = hf_hub_download(
	                    repo_id="NoahMeissner/CuisineClassifier",
	                    filename=f"{folder}/{name}.joblib",
	                )
	                print(f"{name} downloaded.")
	            except Exception as e:
	                print(f"Failed to download {name}: {e}")
	                raise

	        print("Loading model components with joblib...")
	        try:
	            self.model = joblib.load(paths["cuisine_pipeline"])
	            print("Model loaded.")
	            self.label_encoder = joblib.load(paths["label_encoder"])
	            print("Label encoder loaded.")
	        except Exception as e:
	            print(f"Failed to load components: {e}")
	            raise

	        print("All components loaded successfully.")

	    def classify(self, text_input):
	        # Join the ingredient list into one space-separated string,
	        # which is passed to the pipeline as a single sample.
	        data = " ".join(text_input)
	        predicted_class = self.model.predict([data])
	        # Map the encoded class back to its label (one entry per sample).
	        predicted_label = self.label_encoder.inverse_transform(predicted_class)
	        return predicted_label