# 🍷 Wine Type Classifier

A **GradientBoostingClassifier** that predicts whether a wine is **red** or **white** based on its chemical properties.

## Model Details

- **Model type**: scikit-learn GradientBoostingClassifier
- **Task**: Binary classification (red vs white wine)
- **Dataset**: [mstz/wine](https://huggingface.co/datasets/mstz/wine)
- **Training samples**: 5,197
- **Test samples**: 1,300

## Performance

| Metric | Score |
|--------|-------|
| **Test Accuracy** | 99.23% |
| **Test F1 Score** | 99.49% |
| **Train Accuracy** | 100.0% |

### Per-class Performance (Test Set)

| Class | Precision | Recall | F1 |
|-------|-----------|--------|-----|
| Red Wine | 0.98 | 0.99 | 0.98 |
| White Wine | 1.00 | 0.99 | 0.99 |
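
For readers who want to see how the per-class columns relate to each other, here is a minimal sketch of the standard precision, recall, and F1 formulas. The helper name `precision_recall_f1` and the counts passed to it are hypothetical and chosen only for illustration; they are not this model's confusion matrix.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one class from its confusion counts."""
    # Precision: of everything predicted as this class, how much was correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything that truly is this class, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (NOT taken from this model).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because F1 is a harmonic mean, it sits closer to the lower of the two values, which is why a class can show high precision and still have a slightly lower F1.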

## Features

The model uses 12 input features: 11 physicochemical measurements plus the `quality` score. They are listed here by importance, highest first:

| Feature | Importance |
|---------|-----------|
| `total_sulfur_dioxide` | 58.06% |
| `chlorides` | 31.25% |
| `density` | 3.40% |
| `volatile_acidity` | 2.27% |
| `sulphates` | 1.38% |
| `fixed_acidity` | 0.85% |
| `residual_sugar` | 0.81% |
| `free_sulfur_dioxide` | 0.76% |
| `citric_acid` | 0.57% |
| `pH` | 0.34% |
| `alcohol` | 0.22% |
| `quality` | 0.10% |
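
As a quick way to read the table, the sketch below ranks the features and accumulates their importance. The percentages are copied from the table above; the variable names are illustrative. It shows that the top two features, `total_sulfur_dioxide` and `chlorides`, together account for roughly 89% of the total.

```python
# Importances (in percent) copied from the table above.
importances = {
    "total_sulfur_dioxide": 58.06,
    "chlorides": 31.25,
    "density": 3.40,
    "volatile_acidity": 2.27,
    "sulphates": 1.38,
    "fixed_acidity": 0.85,
    "residual_sugar": 0.81,
    "free_sulfur_dioxide": 0.76,
    "citric_acid": 0.57,
    "pH": 0.34,
    "alcohol": 0.22,
    "quality": 0.10,
}

# Sort from most to least important and print a running total.
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
cumulative = 0.0
for name, pct in ranked:
    cumulative += pct
    print(f"{name:<22} {pct:6.2f}%  cumulative {cumulative:6.2f}%")

# Combined share of the two most important features.
top_two = sum(pct for _, pct in ranked[:2])
print(f"Top two features: {top_two:.2f}%")
```

The running total ends slightly above 100% because each table entry is rounded to two decimals.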

## Usage

```python
import pickle
import numpy as np
from huggingface_hub import hf_hub_download

# Download and load model
model_path = hf_hub_download("victor/wine-type-classifier", "model.pkl")
with open(model_path, "rb") as f:
    model = pickle.load(f)

# Labels: 0 = Red Wine, 1 = White Wine
labels = {0: "Red Wine", 1: "White Wine"}

# Input features (in order):
# fixed_acidity, volatile_acidity, citric_acid, residual_sugar,
# chlorides, free_sulfur_dioxide, total_sulfur_dioxide,
# density, pH, sulphates, alcohol, quality

# Example: predict a red wine
sample = np.array([[7.4, 0.7, 0.0, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4, 5]])
prediction = int(model.predict(sample)[0])  # cast the NumPy integer to a plain int
probabilities = model.predict_proba(sample)[0]  # one probability per class, ordered as in model.classes_

print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {probabilities.max():.2%}")  # probability of the predicted class
```
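
Since the model expects the 12 features in a fixed order, it can help to build the input row from named values rather than by position. The sketch below is one way to do that; `FEATURE_ORDER` and `to_feature_row` are hypothetical helper names and not part of the published model or its training script.

```python
# Feature order as documented in the usage example above.
FEATURE_ORDER = [
    "fixed_acidity", "volatile_acidity", "citric_acid", "residual_sugar",
    "chlorides", "free_sulfur_dioxide", "total_sulfur_dioxide",
    "density", "pH", "sulphates", "alcohol", "quality",
]


def to_feature_row(measurements: dict) -> list:
    """Return the measurements as a list in the order the model expects."""
    missing = [name for name in FEATURE_ORDER if name not in measurements]
    unexpected = [name for name in measurements if name not in FEATURE_ORDER]
    if missing or unexpected:
        raise ValueError(f"missing={missing}, unexpected={unexpected}")
    return [float(measurements[name]) for name in FEATURE_ORDER]


# Same values as the red wine sample above, given in arbitrary key order.
wine = {
    "alcohol": 9.4, "quality": 5, "pH": 3.51, "density": 0.9978,
    "sulphates": 0.56, "chlorides": 0.076, "citric_acid": 0.0,
    "fixed_acidity": 7.4, "volatile_acidity": 0.7, "residual_sugar": 1.9,
    "free_sulfur_dioxide": 11.0, "total_sulfur_dioxide": 34.0,
}
row = to_feature_row(wine)
print(row)
```

The result can then be passed as a single-row batch, for example `model.predict([row])`, because scikit-learn estimators expect 2D input.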

## Label Mapping

> ⚠️ **Note**: The `is_red` column in the source dataset is inverted relative to its name, and the model's output labels follow the same encoding:
> - `is_red=0` → **Red Wine** (1,599 samples; high volatile acidity, low sulfur dioxide)
> - `is_red=1` → **White Wine** (4,898 samples; low volatile acidity, high sulfur dioxide)
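
To keep the inverted encoding from leaking into downstream code, one option is to decode raw `is_red` values through a single mapping. This is a minimal sketch: `IS_RED_TO_NAME` and `decode_is_red` are hypothetical names, and the class counts are the ones listed in the note above.

```python
# Mapping from the dataset's raw `is_red` value to the actual wine type.
IS_RED_TO_NAME = {0: "Red Wine", 1: "White Wine"}

# Class counts as stated in the note above.
CLASS_COUNTS = {"Red Wine": 1599, "White Wine": 4898}


def decode_is_red(values):
    """Translate raw `is_red` values (0 or 1) into wine type names."""
    return [IS_RED_TO_NAME[int(v)] for v in values]


decoded = decode_is_red([0, 1, 1])
print(decoded)

# Share of the majority class: the accuracy of always predicting "White Wine".
total = sum(CLASS_COUNTS.values())
majority_share = max(CLASS_COUNTS.values()) / total
print(f"Majority-class share: {majority_share:.2%}")
```

The majority-class share gives a useful baseline for the accuracy figures, since the two classes are not balanced.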

## Training

```bash
pip install scikit-learn datasets huggingface_hub
python train_wine.py
```

## License

MIT