victor (HF Staff) committed
Commit 14a6827 · verified
1 Parent(s): e7b4c54

Add comprehensive README with usage instructions

Files changed (1)
  1. README.md +91 -0
README.md ADDED
@@ -0,0 +1,91 @@
+ # 🍷 Wine Type Classifier
+
+ A **GradientBoostingClassifier** that predicts whether a wine is **red** or **white** based on its chemical properties.
+
+ ## Model Details
+
+ - **Model type**: scikit-learn GradientBoostingClassifier
+ - **Task**: Binary classification (red vs white wine)
+ - **Dataset**: [mstz/wine](https://huggingface.co/datasets/mstz/wine)
+ - **Training samples**: 5,197
+ - **Test samples**: 1,300
+
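The two sample counts add up to the full dataset, and they match what scikit-learn's `train_test_split` would produce for a 20% test fraction. The README does not state the split ratio, so the `0.2` below is an assumption inferred from the numbers; the snippet is only a quick consistency check.

```python
import math

n_train, n_test = 5197, 1300            # counts from the Model Details list
n_total = n_train + n_test

# Assumption: an 80/20 split. train_test_split rounds the test size up
# and gives the remaining rows to the training set.
assumed_test_fraction = 0.2
expected_test = math.ceil(assumed_test_fraction * n_total)
expected_train = n_total - expected_test

print(n_total, expected_train, expected_test)
```
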
+ ## Performance
+
+ | Metric | Score |
+ |--------|-------|
+ | **Test Accuracy** | 99.23% |
+ | **Test F1 Score** | 99.49% |
+ | **Train Accuracy** | 100.0% |
+
+ ### Per-class Performance (Test Set)
+
+ | Class | Precision | Recall | F1 |
+ |-------|-----------|--------|-----|
+ | Red Wine | 0.98 | 0.99 | 0.98 |
+ | White Wine | 1.00 | 0.99 | 0.99 |
+
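F1 is the harmonic mean of precision and recall, so the per-class F1 column can be rechecked from the other two columns. The sketch below uses the rounded values from the table, which makes it a sanity check rather than an exact reproduction; the helper name `f1_from` is illustrative only.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded (precision, recall) pairs copied from the per-class table.
per_class = {
    "Red Wine": (0.98, 0.99),
    "White Wine": (1.00, 0.99),
}

for name, (p, r) in per_class.items():
    print(f"{name}: F1 = {f1_from(p, r):.2f}")
```

Both recomputed values round to the figures reported in the table (0.98 and 0.99).
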
+ ## Features
+
+ The model uses 12 chemical properties as input features:
+
+ | Feature | Importance |
+ |---------|-----------|
+ | `total_sulfur_dioxide` | 58.06% |
+ | `chlorides` | 31.25% |
+ | `density` | 3.40% |
+ | `volatile_acidity` | 2.27% |
+ | `sulphates` | 1.38% |
+ | `fixed_acidity` | 0.85% |
+ | `residual_sugar` | 0.81% |
+ | `free_sulfur_dioxide` | 0.76% |
+ | `citric_acid` | 0.57% |
+ | `pH` | 0.34% |
+ | `alcohol` | 0.22% |
+ | `quality` | 0.10% |
+
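The importances are heavily concentrated in the first two rows. A small sketch that accumulates the percentages from the table makes this concrete; the values are copied as printed, so the total lands slightly off 100% because of rounding.

```python
# Importances (in percent) copied from the feature table.
importances = {
    "total_sulfur_dioxide": 58.06,
    "chlorides": 31.25,
    "density": 3.40,
    "volatile_acidity": 2.27,
    "sulphates": 1.38,
    "fixed_acidity": 0.85,
    "residual_sugar": 0.81,
    "free_sulfur_dioxide": 0.76,
    "citric_acid": 0.57,
    "pH": 0.34,
    "alcohol": 0.22,
    "quality": 0.10,
}

# Rank features from most to least important and track the running total.
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)

cumulative = 0.0
for name, pct in ranked:
    cumulative += pct
    print(f"{name:<22} {pct:6.2f}%  cumulative {cumulative:6.2f}%")

# Share of total importance carried by the two leading features.
top_two = sum(pct for _, pct in ranked[:2])
```

Here `top_two` comes to about 89%, which suggests that `total_sulfur_dioxide` and `chlorides` carry most of the signal in this model.
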
+ ## Usage
+
+ ```python
+ import pickle
+ import numpy as np
+ from huggingface_hub import hf_hub_download
+
+ # Download and load model
+ model_path = hf_hub_download("victor/wine-type-classifier", "model.pkl")
+ with open(model_path, "rb") as f:
+     model = pickle.load(f)
+
+ # Labels: 0 = Red Wine, 1 = White Wine
+ labels = {0: "Red Wine", 1: "White Wine"}
+
+ # Input features (in order):
+ # fixed_acidity, volatile_acidity, citric_acid, residual_sugar,
+ # chlorides, free_sulfur_dioxide, total_sulfur_dioxide,
+ # density, pH, sulphates, alcohol, quality
+
+ # Example: predict a red wine
+ sample = np.array([[7.4, 0.7, 0.0, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4, 5]])
+ prediction = model.predict(sample)[0]
+ probabilities = model.predict_proba(sample)[0]
+
+ print(f"Prediction: {labels[prediction]}")
+ print(f"Confidence: {max(probabilities):.2%}")
+ ```
+
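Because the model takes a bare array, the column order listed in the Usage comments is easy to get wrong. One possible guard is a helper that builds the row from named measurements. `FEATURE_ORDER` and `to_feature_row` are hypothetical names introduced for this sketch; they are not part of the repository.

```python
# Column order copied from the comments in the Usage snippet.
FEATURE_ORDER = [
    "fixed_acidity", "volatile_acidity", "citric_acid", "residual_sugar",
    "chlorides", "free_sulfur_dioxide", "total_sulfur_dioxide",
    "density", "pH", "sulphates", "alcohol", "quality",
]


def to_feature_row(measurements: dict) -> list:
    """Return the values ordered as FEATURE_ORDER; raise on missing or unknown keys."""
    missing = [name for name in FEATURE_ORDER if name not in measurements]
    unknown = [name for name in measurements if name not in FEATURE_ORDER]
    if missing or unknown:
        raise ValueError(f"missing={missing}, unknown={unknown}")
    return [float(measurements[name]) for name in FEATURE_ORDER]


# Same red-wine example as in the Usage snippet, written as named values.
wine = {
    "alcohol": 9.4, "pH": 3.51, "quality": 5, "density": 0.9978,
    "chlorides": 0.076, "sulphates": 0.56, "citric_acid": 0.0,
    "fixed_acidity": 7.4, "volatile_acidity": 0.7, "residual_sugar": 1.9,
    "free_sulfur_dioxide": 11.0, "total_sulfur_dioxide": 34.0,
}
row = to_feature_row(wine)
print(row)
```

With the model loaded as in the Usage snippet, `model.predict([row])` should accept this row directly, since scikit-learn estimators take a list of lists as input.
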
+ ## Label Mapping
+
+ > ⚠️ **Note**: The `is_red` column in the source dataset is inverted relative to its name:
+ > - `is_red=0` → **Red Wine** (1,599 samples; high volatile acidity, low sulfur dioxide)
+ > - `is_red=1` → **White Wine** (4,898 samples; low volatile acidity, high sulfur dioxide)
+
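The sample counts in the note also show that the classes are imbalanced, roughly three white wines for every red. A short sketch of the class shares and the accuracy of a trivial majority-class predictor gives some context for the reported test accuracy; it uses only the counts stated in the note.

```python
# Sample counts per `is_red` value, as stated in the note.
counts = {0: 1599, 1: 4898}
names = {0: "Red Wine", 1: "White Wine"}   # inverted relative to the column name

total = sum(counts.values())
shares = {names[k]: n / total for k, n in counts.items()}

# Accuracy of always predicting the most common class.
majority_baseline = max(counts.values()) / total

for name, share in shares.items():
    print(f"{name}: {share:.2%}")
print(f"Majority-class baseline: {majority_baseline:.2%}")
```

A classifier that always answers "White Wine" would already be right about 75% of the time on the full dataset, so the per-class table is the more informative view of the model's quality.
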
+ ## Training
+
+ ```bash
+ pip install scikit-learn datasets huggingface_hub
+ python train_wine.py
+ ```
+
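The commit does not include `train_wine.py`, so the following is only a rough sketch of what a training run of this kind might look like. It is not the author's script: the data is a synthetic stand-in generated with `make_classification`, and the split ratio, hyperparameters, and variable names are all assumptions made for illustration.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the wine data: 12 numeric features, binary target.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, random_state=0
)

# Assumed 80/20 split, stratified so both classes appear in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Illustrative hyperparameters, not the ones used for the published model.
model = GradientBoostingClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(f"Test accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f"Test F1:       {f1_score(y_test, y_pred):.4f}")

# Serialize with pickle, the format the Usage snippet loads.
# Writing these bytes to a file named model.pkl would produce the artifact.
model_bytes = pickle.dumps(model)
```
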
+ ## License
+
+ MIT