# 🍷 Wine Type Classifier

A **GradientBoostingClassifier** that predicts whether a wine is **red** or **white** based on its chemical properties.

## Model Details

- **Model type**: scikit-learn GradientBoostingClassifier
- **Task**: Binary classification (red vs white wine)
- **Dataset**: [mstz/wine](https://huggingface.co/datasets/mstz/wine)
- **Training samples**: 5,197
- **Test samples**: 1,300

## Performance

| Metric | Score |
|--------|-------|
| **Test Accuracy** | 99.23% |
| **Test F1 Score** | 99.49% |
| **Train Accuracy** | 100.0% |

### Per-class Performance (Test Set)

| Class | Precision | Recall | F1 |
|-------|-----------|--------|-----|
| Red Wine | 0.98 | 0.99 | 0.98 |
| White Wine | 1.00 | 0.99 | 0.99 |

## Features

The model uses 12 chemical properties as input features:

| Feature | Importance |
|---------|-----------|
| `total_sulfur_dioxide` | 58.06% |
| `chlorides` | 31.25% |
| `density` | 3.40% |
| `volatile_acidity` | 2.27% |
| `sulphates` | 1.38% |
| `fixed_acidity` | 0.85% |
| `residual_sugar` | 0.81% |
| `free_sulfur_dioxide` | 0.76% |
| `citric_acid` | 0.57% |
| `pH` | 0.34% |
| `alcohol` | 0.22% |
| `quality` | 0.10% |

## Usage

```python
import pickle
import numpy as np
from huggingface_hub import hf_hub_download

# Download and load model
model_path = hf_hub_download("victor/wine-type-classifier", "model.pkl")
with open(model_path, "rb") as f:
    model = pickle.load(f)

# Labels: 0 = Red Wine, 1 = White Wine
labels = {0: "Red Wine", 1: "White Wine"}

# Input features (in order):
# fixed_acidity, volatile_acidity, citric_acid, residual_sugar,
# chlorides, free_sulfur_dioxide, total_sulfur_dioxide,
# density, pH, sulphates, alcohol, quality

# Example: predict a red wine
sample = np.array([[7.4, 0.7, 0.0, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4, 5]])
prediction = model.predict(sample)[0]
probabilities = model.predict_proba(sample)[0]

print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {max(probabilities):.2%}")
```

## Label Mapping

> ⚠️ **Note**: The `is_red` column in the source dataset is inverted relative to its name:
> - `is_red=0` → **Red Wine** (1,599 samples; high volatile acidity, low sulfur dioxide)
> - `is_red=1` → **White Wine** (4,898 samples; low volatile acidity, high sulfur dioxide)

## Training

```bash
pip install scikit-learn datasets huggingface_hub
python train_wine.py
```

## License

MIT
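
## Building Inputs by Feature Name

The usage example passes the 12 features as a positional array, so a swapped column would silently produce a wrong prediction. The sketch below shows one way to build that row from named values instead. `FEATURE_ORDER`, `LABELS`, and `to_feature_row` are hypothetical names introduced here for illustration and are not part of the published model; the order itself is copied from the usage comment above.

```python
# Feature order expected by the model, as listed in the usage example.
FEATURE_ORDER = [
    "fixed_acidity", "volatile_acidity", "citric_acid", "residual_sugar",
    "chlorides", "free_sulfur_dioxide", "total_sulfur_dioxide",
    "density", "pH", "sulphates", "alcohol", "quality",
]

# Label mapping as documented: 0 = Red Wine, 1 = White Wine.
LABELS = {0: "Red Wine", 1: "White Wine"}


def to_feature_row(sample):
    """Turn a {feature_name: value} dict into a 1 x 12 nested list.

    Hypothetical helper: raises KeyError if any expected feature is missing,
    and ignores the order in which the caller supplied the keys.
    """
    missing = [name for name in FEATURE_ORDER if name not in sample]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [[float(sample[name]) for name in FEATURE_ORDER]]


# Same red-wine sample as in the usage example, given by name.
red_sample = {
    "alcohol": 9.4, "quality": 5, "pH": 3.51, "density": 0.9978,
    "fixed_acidity": 7.4, "volatile_acidity": 0.7, "citric_acid": 0.0,
    "residual_sugar": 1.9, "chlorides": 0.076, "sulphates": 0.56,
    "free_sulfur_dioxide": 11.0, "total_sulfur_dioxide": 34.0,
}
row = to_feature_row(red_sample)
```

The resulting `row` is a plain nested list, which scikit-learn estimators accept as array-like input, so it can be passed to `model.predict(row)` in place of the NumPy array, with `LABELS[int(prediction)]` giving the readable class name.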