	# 🍷 Wine Type Classifier

	A GradientBoostingClassifier that predicts whether a wine is red or white based on its chemical properties.

	## Model Details

	- Model type: scikit-learn GradientBoostingClassifier
	- Task: Binary classification (red vs white wine)
	- Dataset: [mstz/wine](https://huggingface.co/datasets/mstz/wine)
	- Training samples: 5,197
	- Test samples: 1,300
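The sample counts above are consistent with an 80/20 split of the 6,497 rows, although the card does not state the split ratio. A minimal sketch of that arithmetic, assuming a test fraction of 0.2 with the test share rounded up (`split_sizes` is a hypothetical helper, not part of the repository):

```python
import math


def split_sizes(n_samples: int, test_fraction: float = 0.2) -> tuple[int, int]:
    """Return (n_train, n_test), rounding the test share up."""
    n_test = math.ceil(n_samples * test_fraction)
    return n_samples - n_test, n_test


# Total rows = training samples + test samples from the list above.
n_train, n_test = split_sizes(5197 + 1300)
print(n_train, n_test)
```

Under that assumption the helper reproduces the 5,197 / 1,300 counts reported here.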

	## Performance

	| Metric | Score |
	|--------|-------|
	| Test Accuracy | 99.23% |
	| Test F1 Score | 99.49% |
	| Train Accuracy | 100.0% |

	### Per-class Performance (Test Set)

	| Class | Precision | Recall | F1 |
	|-------|-----------|--------|-----|
	| Red Wine | 0.98 | 0.99 | 0.98 |
	| White Wine | 1.00 | 0.99 | 0.99 |
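F1 is the harmonic mean of precision and recall, so the F1 column can be approximately re-derived from the other two. A small sketch using the rounded values from the table, which means the results are only good to about two decimals (`f1_from` is a hypothetical helper):

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded per-class (precision, recall) values copied from the table above.
per_class = {"Red Wine": (0.98, 0.99), "White Wine": (1.00, 0.99)}
for name, (p, r) in per_class.items():
    print(f"{name}: F1 ~ {f1_from(p, r):.2f}")
```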

	## Features

	The model takes 12 input features: 11 physicochemical measurements plus the `quality` score. The table lists them by importance, highest first:

	| Feature | Importance |
	|---------|-----------|
	| `total_sulfur_dioxide` | 58.06% |
	| `chlorides` | 31.25% |
	| `density` | 3.40% |
	| `volatile_acidity` | 2.27% |
	| `sulphates` | 1.38% |
	| `fixed_acidity` | 0.85% |
	| `residual_sugar` | 0.81% |
	| `free_sulfur_dioxide` | 0.76% |
	| `citric_acid` | 0.57% |
	| `pH` | 0.34% |
	| `alcohol` | 0.22% |
	| `quality` | 0.10% |
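The importances are concentrated in a few features. As a quick check on the table, the sketch below accumulates the shares of the five highest-ranked features, with values copied from the table. If the table was produced from the fitted estimator's `feature_importances_` attribute (an assumption; the card does not say), the same loop would apply to those values.

```python
from itertools import accumulate

# Top five importances (percent), copied from the table above.
importance_pct = {
    "total_sulfur_dioxide": 58.06,
    "chlorides": 31.25,
    "density": 3.40,
    "volatile_acidity": 2.27,
    "sulphates": 1.38,
}

# Sort descending, then build a running total of the shares.
ranked = sorted(importance_pct.items(), key=lambda kv: kv[1], reverse=True)
cumulative = list(accumulate(value for _, value in ranked))

for (name, _), total in zip(ranked, cumulative):
    print(f"{name:<22} {total:6.2f}%")
```

The first two features together account for roughly 89% of the total importance.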

	## Usage

	```python
	import pickle
	import numpy as np
	from huggingface_hub import hf_hub_download

	# Download and load model (pickle can run code on load, so only use files you trust)
	model_path = hf_hub_download("victor/wine-type-classifier", "model.pkl")
	with open(model_path, "rb") as f:
	    model = pickle.load(f)

	# Labels: 0 = Red Wine, 1 = White Wine
	labels = {0: "Red Wine", 1: "White Wine"}

	# Input features (in order):
	# fixed_acidity, volatile_acidity, citric_acid, residual_sugar,
	# chlorides, free_sulfur_dioxide, total_sulfur_dioxide,
	# density, pH, sulphates, alcohol, quality

	# Example: predict a red wine
	sample = np.array([[7.4, 0.7, 0.0, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4, 5]])
	prediction = model.predict(sample)[0]
	probabilities = model.predict_proba(sample)[0]

	print(f"Prediction: {labels[prediction]}")
	print(f"Confidence: {max(probabilities):.2%}")
	```
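Because the model expects its features in a fixed positional order, it can be safer to build the input row from named values. A minimal sketch, where `FEATURE_ORDER` mirrors the order listed in the comments above and `to_feature_row` is a hypothetical helper, not part of the repository:

```python
FEATURE_ORDER = [
    "fixed_acidity", "volatile_acidity", "citric_acid", "residual_sugar",
    "chlorides", "free_sulfur_dioxide", "total_sulfur_dioxide",
    "density", "pH", "sulphates", "alcohol", "quality",
]


def to_feature_row(wine: dict) -> list[float]:
    """Order named measurements the way the model expects them."""
    missing = [name for name in FEATURE_ORDER if name not in wine]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(wine[name]) for name in FEATURE_ORDER]


# Same red-wine example as above, given in arbitrary key order.
wine = {
    "alcohol": 9.4, "quality": 5, "pH": 3.51, "sulphates": 0.56,
    "density": 0.9978, "total_sulfur_dioxide": 34.0,
    "free_sulfur_dioxide": 11.0, "chlorides": 0.076,
    "residual_sugar": 1.9, "citric_acid": 0.0,
    "volatile_acidity": 0.7, "fixed_acidity": 7.4,
}
row = to_feature_row(wine)
print(row)
```

The result can then be passed as `model.predict([row])` in place of `sample`.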

	## Label Mapping

	> ⚠️ Note: The `is_red` column in the source dataset is inverted relative to its name:
	> - `is_red=0` → Red Wine (1,599 samples; high volatile acidity, low sulfur dioxide)
	> - `is_red=1` → White Wine (4,898 samples; low volatile acidity, high sulfur dioxide)
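The inversion is easy to get wrong when reading the raw column. The sketch below makes the mapping explicit and, from the class counts in the note, works out the accuracy a classifier would reach on the full dataset by always predicting the majority class (`label_from_is_red` is a hypothetical helper):

```python
# Mapping as documented above: inverted relative to the column name.
IS_RED_TO_LABEL = {0: "Red Wine", 1: "White Wine"}
CLASS_COUNTS = {"Red Wine": 1599, "White Wine": 4898}


def label_from_is_red(is_red: int) -> str:
    """Translate a raw `is_red` value into the actual wine type."""
    return IS_RED_TO_LABEL[int(is_red)]


# Share of the largest class = accuracy of always guessing that class.
majority_baseline = max(CLASS_COUNTS.values()) / sum(CLASS_COUNTS.values())

print(label_from_is_red(1))
print(f"Majority-class baseline: {majority_baseline:.2%}")
```

That baseline of roughly 75% is a useful reference point when reading the reported accuracy.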

	## Training

	```bash
	pip install scikit-learn datasets huggingface_hub
	python train_wine.py
	```

	## License

	MIT