# 🍷 Wine Type Classifier

A **GradientBoostingClassifier** that predicts whether a wine is **red** or **white** based on its chemical properties.

## Model Details

- **Model type**: scikit-learn GradientBoostingClassifier
- **Task**: Binary classification (red vs white wine)
- **Dataset**: [mstz/wine](https://huggingface.co/datasets/mstz/wine)
- **Training samples**: 5,197
- **Test samples**: 1,300

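The sample counts above are consistent with an 80/20 split of the 6,497 rows in the dataset. The sketch below only reproduces that arithmetic. It assumes the rounding convention of scikit-learn's `train_test_split` (test size rounded up, remainder used for training); the actual split settings of the training script are not stated here, and `split_sizes` is a hypothetical helper name.

```python
import math


def split_sizes(n_samples: int, test_fraction: float) -> tuple[int, int]:
    """Return (n_train, n_test), rounding the test size up."""
    n_test = math.ceil(n_samples * test_fraction)
    return n_samples - n_test, n_test


# 6,497 = 1,599 red + 4,898 white samples
n_train, n_test = split_sizes(6497, 0.2)
print(f"train={n_train}, test={n_test}")
```
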
## Performance

| Metric | Score |
|--------|-------|
| **Test Accuracy** | 99.23% |
| **Test F1 Score** | 99.49% |
| **Train Accuracy** | 100.0% |

### Per-class Performance (Test Set)

| Class | Precision | Recall | F1 |
|-------|-----------|--------|-----|
| Red Wine | 0.98 | 0.99 | 0.98 |
| White Wine | 1.00 | 0.99 | 0.99 |

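For readers who want to recompute a per-class table like the one above, here is a minimal sketch of the standard precision, recall, and F1 definitions. The labels are toy values for illustration only, not the real test set, and `per_class_metrics` is a hypothetical helper; `sklearn.metrics.classification_report` reports the same quantities.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1 when `positive` is treated as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels for illustration only (0 = Red Wine, 1 = White Wine)
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]

for cls, name in [(0, "Red Wine"), (1, "White Wine")]:
    p, r, f1 = per_class_metrics(y_true, y_pred, positive=cls)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```
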
## Features

The model uses 12 input features (11 chemical measurements plus the `quality` score), listed here by importance:

| Feature | Importance |
|---------|-----------|
| `total_sulfur_dioxide` | 58.06% |
| `chlorides` | 31.25% |
| `density` | 3.40% |
| `volatile_acidity` | 2.27% |
| `sulphates` | 1.38% |
| `fixed_acidity` | 0.85% |
| `residual_sugar` | 0.81% |
| `free_sulfur_dioxide` | 0.76% |
| `citric_acid` | 0.57% |
| `pH` | 0.34% |
| `alcohol` | 0.22% |
| `quality` | 0.10% |

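The importances are concentrated in two features. The sketch below ranks the values copied from the table and accumulates their share. It assumes nothing about how they were computed; on a fitted scikit-learn model, comparable numbers would typically come from the `feature_importances_` attribute.

```python
# Importance values (in percent) copied from the table above
importances = {
    "total_sulfur_dioxide": 58.06,
    "chlorides": 31.25,
    "density": 3.40,
    "volatile_acidity": 2.27,
    "sulphates": 1.38,
    "fixed_acidity": 0.85,
    "residual_sugar": 0.81,
    "free_sulfur_dioxide": 0.76,
    "citric_acid": 0.57,
    "pH": 0.34,
    "alcohol": 0.22,
    "quality": 0.10,
}

# Rank features from most to least important and track the running total
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
cumulative = 0.0
for name, pct in ranked:
    cumulative += pct
    print(f"{name:<22} {pct:6.2f}%  cumulative {cumulative:6.2f}%")

# Combined share of the two strongest features
top_two = sum(pct for _, pct in ranked[:2])
print(f"Top two features: {top_two:.2f}%")
```

Because the table values are rounded, their total can differ from 100% by a small rounding error.
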
## Usage

```python
import pickle
import numpy as np
from huggingface_hub import hf_hub_download

# Download and load model
model_path = hf_hub_download("victor/wine-type-classifier", "model.pkl")
with open(model_path, "rb") as f:
    model = pickle.load(f)

# Labels: 0 = Red Wine, 1 = White Wine
labels = {0: "Red Wine", 1: "White Wine"}

# Input features (in order):
# fixed_acidity, volatile_acidity, citric_acid, residual_sugar,
# chlorides, free_sulfur_dioxide, total_sulfur_dioxide,
# density, pH, sulphates, alcohol, quality

# Example: predict a red wine
sample = np.array([[7.4, 0.7, 0.0, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4, 5]])
prediction = model.predict(sample)[0]
probabilities = model.predict_proba(sample)[0]

print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {max(probabilities):.2%}")
```

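Since the model expects its 12 features in a fixed order, it can help to build the input row from named values rather than by position. This is a minimal sketch; `FEATURE_ORDER` and `to_feature_row` are hypothetical names, not part of the published model, and the order is taken from the comment in the usage example.

```python
# Feature order as listed in the usage example
FEATURE_ORDER = [
    "fixed_acidity", "volatile_acidity", "citric_acid", "residual_sugar",
    "chlorides", "free_sulfur_dioxide", "total_sulfur_dioxide",
    "density", "pH", "sulphates", "alcohol", "quality",
]


def to_feature_row(wine: dict) -> list:
    """Return the feature values as floats in the order the model expects."""
    missing = [name for name in FEATURE_ORDER if name not in wine]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(wine[name]) for name in FEATURE_ORDER]


# Same values as the red wine sample in the usage example, keys in arbitrary order
wine = {
    "quality": 5, "alcohol": 9.4, "pH": 3.51, "density": 0.9978,
    "sulphates": 0.56, "chlorides": 0.076, "citric_acid": 0.0,
    "fixed_acidity": 7.4, "volatile_acidity": 0.7, "residual_sugar": 1.9,
    "free_sulfur_dioxide": 11.0, "total_sulfur_dioxide": 34.0,
}
row = to_feature_row(wine)
print(row)
```

The resulting row can be wrapped in an outer list (`[row]`) to form the single-row 2D input that `model.predict` expects.
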
## Label Mapping

> ⚠️ **Note**: The `is_red` column in the source dataset is inverted relative to its name:
> - `is_red=0` → **Red Wine** (1,599 samples; high volatile acidity, low sulfur dioxide)
> - `is_red=1` → **White Wine** (4,898 samples; low volatile acidity, high sulfur dioxide)

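To avoid mixing up the inverted column when inspecting raw rows, the values can be decoded through an explicit mapping. A small sketch, where `decode_is_red` is a hypothetical helper and the rows are made up for illustration:

```python
# Mapping taken from the note above: the column name is misleading
LABELS = {0: "Red Wine", 1: "White Wine"}


def decode_is_red(value) -> str:
    """Translate a raw `is_red` value into a readable wine type."""
    return LABELS[int(value)]


# Illustrative rows only; real rows would come from the mstz/wine dataset
rows = [{"is_red": 0}, {"is_red": 1}, {"is_red": 1}]
names = [decode_is_red(r["is_red"]) for r in rows]
print(names)

# Class counts from the note, keyed by decoded label
class_counts = {decode_is_red(0): 1599, decode_is_red(1): 4898}
print(class_counts, "total:", sum(class_counts.values()))
```
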
## Training

```bash
pip install scikit-learn datasets huggingface_hub
python train_wine.py
```

## License

MIT