Commit aa57cfa by mboullier · verified
1 Parent(s): 4592ae7

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ wine_clusters.png filter=lfs diff=lfs merge=lfs -text
.ipynb_checkpoints/README-checkpoint.md ADDED
@@ -0,0 +1,35 @@
+
+ # Model Card
+
+ ## Model Card Authors
+ Mathew
+
+ ## Model Description
+ This is a KMeans clustering model trained on the UCI Wine dataset. The model groups wines into clusters based on 13 chemical analysis features such as alcohol, flavanoids, color intensity, and proline. The dataset has three ground truth classes (wine cultivars; simply called ``classes`` in the dataset), which were used to evaluate clustering performance but not during training. K was set to 3 to match the three classes.
+
+ ## Intended Uses & Limitations
+ This clustering model is for educational purposes only. It is not suitable for production use because the dataset is small (178 samples) and well-structured, which makes clustering easier than on more complex, real-world datasets. Results should not be generalized beyond this dataset.
+
+ ## Training Data
+ Data source: UCI Wine dataset (https://archive.ics.uci.edu/dataset/109/wine). The dataset contains 178 wines described by 13 continuous chemical features. Ground truth labels (three ``classes``) were used only for evaluation.
+
+ ## Evaluation Metrics
+ - Adjusted Rand Index (ARI): 0.849
+ - Normalized Mutual Information (NMI): 0.82
+
+ ## Ethical Considerations
+ Clustering models can reveal structure in data but should not be used for decision-making without careful validation. In domains like healthcare or finance, misinterpreting clusters as ground truth categories could lead to harmful conclusions. Here, the Wine dataset is safe for educational use, but the same methods applied to sensitive data would require rigorous ethical review.
+
+ ## Audit Questions
+ - How stable are the clusters across different random seeds or initialization methods?
+ - Do the clusters correspond meaningfully to the known wine cultivars?
+ - How do ARI and NMI compare to supervised classification accuracy?
+ - Are there features that dominate clustering outcomes (e.g., alcohol, flavanoids)?
+
+
+ ## Plots
+ ### Clusters vs Ground Truth (e.g., PCA projection)
+ ![Clusters vs Ground Truth](clusters_vs_ground_truth.png)
+
+ ### Silhouette Plot
+ ![Silhouette Plot](silhouette_plot.png)
README.md ADDED
@@ -0,0 +1,35 @@
+
+ # Model Card
+
+ ## Model Card Authors
+ Mathew
+
+ ## Model Description
+ This is a KMeans clustering model trained on the UCI Wine dataset. The model groups wines into clusters based on 13 chemical analysis features such as alcohol, flavanoids, color intensity, and proline. The dataset has three ground truth classes (wine cultivars; simply called ``classes`` in the dataset), which were used to evaluate clustering performance but not during training. K was set to 3 to match the three classes.
+
+ ## Intended Uses & Limitations
+ This clustering model is for educational purposes only. It is not suitable for production use because the dataset is small (178 samples) and well-structured, which makes clustering easier than on more complex, real-world datasets. Results should not be generalized beyond this dataset.
+
+ ## Training Data
+ Data source: UCI Wine dataset (https://archive.ics.uci.edu/dataset/109/wine). The dataset contains 178 wines described by 13 continuous chemical features. Ground truth labels (three ``classes``) were used only for evaluation.
+
+ ## Evaluation Metrics
+ - Adjusted Rand Index (ARI): 0.849
+ - Normalized Mutual Information (NMI): 0.82
+
+ ## Ethical Considerations
+ Clustering models can reveal structure in data but should not be used for decision-making without careful validation. In domains like healthcare or finance, misinterpreting clusters as ground truth categories could lead to harmful conclusions. Here, the Wine dataset is safe for educational use, but the same methods applied to sensitive data would require rigorous ethical review.
+
+ ## Audit Questions
+ - How stable are the clusters across different random seeds or initialization methods?
+ - Do the clusters correspond meaningfully to the known wine cultivars?
+ - How do ARI and NMI compare to supervised classification accuracy?
+ - Are there features that dominate clustering outcomes (e.g., alcohol, flavanoids)?
+
+
+ ## Plots
+ ### Clusters vs Ground Truth (e.g., PCA projection)
+ ![Clusters vs Ground Truth](clusters_vs_ground_truth.png)
+
+ ### Silhouette Plot
+ ![Silhouette Plot](silhouette_plot.png)
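Review note: the model card describes KMeans with K=3 on the 13 Wine features, scored against the cultivar labels with ARI and NMI. A minimal sketch of that workflow might look like the following. It is not the author's training script: `load_wine` is scikit-learn's bundled copy of the dataset, and the `StandardScaler` step, `n_init`, and `random_state` are assumptions, since the card does not say how the features were preprocessed or how the model was seeded. The printed scores are therefore not expected to match the reported 0.849 and 0.82.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from sklearn.preprocessing import StandardScaler

# scikit-learn's bundled Wine data: 178 samples, 13 features, 3 classes.
X, y_true = load_wine(return_X_y=True)

# Assumption: standardize the features. The model card does not state
# whether the committed model was trained on scaled or raw values.
X_scaled = StandardScaler().fit_transform(X)

# K=3 comes from the model card; n_init and random_state are illustrative.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# The ground truth labels are used only here, for evaluation.
ari = adjusted_rand_score(y_true, labels)
nmi = normalized_mutual_info_score(y_true, labels)
print(f"ARI={ari:.3f} NMI={nmi:.3f}")
```

Re-running the same block with several `random_state` values is one simple way to probe the first audit question about cluster stability.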
config.json ADDED
@@ -0,0 +1,93 @@
+ {
+     "sklearn": {
+         "columns": [
+             "Alcohol",
+             "Malic acid",
+             "Ash",
+             "Alcalinity of ash",
+             "Magnesium",
+             "Total phenols",
+             "Flavanoids",
+             "Nonflavanoid phenols",
+             "Proanthocyanins",
+             "Color intensity",
+             "Hue",
+             "OD280/OD315 of diluted wines",
+             "Proline"
+         ],
+         "environment": [
+             "scikit-learn=1.0.2"
+         ],
+         "example_input": {
+             "Alcohol": [
+                 14.23,
+                 13.2,
+                 13.16
+             ],
+             "Malic acid": [
+                 1.71,
+                 1.78,
+                 2.36
+             ],
+             "Ash": [
+                 2.43,
+                 2.14,
+                 2.67
+             ],
+             "Alcalinity of ash": [
+                 15.6,
+                 11.2,
+                 18.6
+             ],
+             "Magnesium": [
+                 127,
+                 100,
+                 101
+             ],
+             "Total phenols": [
+                 2.8,
+                 2.65,
+                 2.8
+             ],
+             "Flavanoids": [
+                 3.06,
+                 2.76,
+                 3.24
+             ],
+             "Nonflavanoid phenols": [
+                 0.28,
+                 0.26,
+                 0.3
+             ],
+             "Proanthocyanins": [
+                 2.29,
+                 1.28,
+                 2.81
+             ],
+             "Color intensity": [
+                 5.64,
+                 4.38,
+                 5.68
+             ],
+             "Hue": [
+                 1.04,
+                 1.05,
+                 1.03
+             ],
+             "OD280/OD315 of diluted wines": [
+                 3.92,
+                 3.4,
+                 3.17
+             ],
+             "Proline": [
+                 1065,
+                 1050,
+                 1185
+             ]
+         },
+         "model": {
+             "file": "model.pkl"
+         },
+         "task": "tabular-classification"
+     }
+ }
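Review note: `example_input` is stored column by column, while a scikit-learn estimator expects one row per sample in the order given by `columns`. The sketch below shows one way to transpose it using only the standard library. `CONFIG_TEXT` is an abbreviated, hypothetical copy of the config with three of the 13 columns, and `example_rows` is an illustrative helper, not part of any library.

```python
import json

# Abbreviated copy of config.json: only three of the 13 columns are kept
# to keep the sketch short. Values are copied from the diff.
CONFIG_TEXT = """
{
    "sklearn": {
        "columns": ["Alcohol", "Malic acid", "Ash"],
        "example_input": {
            "Alcohol": [14.23, 13.2, 13.16],
            "Malic acid": [1.71, 1.78, 2.36],
            "Ash": [2.43, 2.14, 2.67]
        },
        "model": {"file": "model.pkl"},
        "task": "tabular-classification"
    }
}
"""


def example_rows(config):
    """Turn the column-oriented example_input into row-oriented samples."""
    section = config["sklearn"]
    columns = section["columns"]
    example = section["example_input"]
    missing = [name for name in columns if name not in example]
    if missing:
        raise ValueError(f"example_input lacks columns: {missing}")
    # zip(*...) transposes the per-column lists into per-sample rows,
    # keeping the feature order defined by "columns".
    return [list(row) for row in zip(*(example[name] for name in columns))]


config = json.loads(CONFIG_TEXT)
rows = example_rows(config)
print(rows[0])
```

Ordering by `columns` rather than by the dictionary's own key order keeps the rows aligned with the feature order the model was presumably fitted on.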
model.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2d0a4a10b7eafe961bc278152391e0bef067307b37489acfc492d3eb033d8ea8
+ size 9688
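Review note: the three lines added for `model.pkl` are a Git LFS pointer, not the pickle itself; each line is a key followed by a space and a value. A small sketch of reading those fields is shown below. `parse_lfs_pointer` is a hypothetical helper written for illustration, and it handles only the three keys that appear in this diff.

```python
# Pointer contents copied from the model.pkl hunk.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2d0a4a10b7eafe961bc278152391e0bef067307b37489acfc492d3eb033d8ea8
size 9688
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algorithm"], pointer["size_bytes"])
```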
wine_clusters.png ADDED

Git LFS Details

  • SHA256: 50f88eabc96c7ffe8a151e15b2e8cd8d0371ed9438bfe348fd745d2938b71f92
  • Pointer size: 131 Bytes
  • Size of remote file: 146 kB
wine_testing.csv ADDED
@@ -0,0 +1,37 @@
+ ,Alcohol,Malicacid,Ash,Alcalinity_of_ash,Magnesium,Total_phenols,Flavanoids,Nonflavanoid_phenols,Proanthocyanins,Color_intensity,Hue,0D280_0D315_of_diluted_wines,Proline,class
+ 13,14.75,1.73,2.39,11.4,91,3.1,3.69,0.43,2.81,5.4,1.25,2.73,1150,1
+ 113,11.41,0.74,2.5,21.0,88,2.48,2.01,0.42,1.44,3.08,1.1,2.31,434,2
+ 21,12.93,3.8,2.65,18.6,102,2.41,2.41,0.25,1.98,4.5,1.03,3.52,770,1
+ 143,13.62,4.95,2.35,20.0,92,2.0,0.8,0.47,1.02,4.4,0.91,2.05,550,3
+ 173,13.71,5.65,2.45,20.5,95,1.68,0.61,0.52,1.06,7.7,0.64,1.74,740,3
+ 9,13.86,1.35,2.27,16.0,98,2.98,3.15,0.22,1.85,7.22,1.01,3.55,1045,1
+ 52,13.82,1.75,2.42,14.0,111,3.88,3.74,0.32,1.87,7.05,1.01,3.26,1190,1
+ 17,13.83,1.57,2.62,20.0,115,2.95,3.4,0.4,1.72,6.6,1.13,2.57,1130,1
+ 131,12.88,2.99,2.4,20.0,104,1.3,1.22,0.24,0.83,5.4,0.74,1.42,530,3
+ 2,13.16,2.36,2.67,18.6,101,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185,1
+ 125,12.07,2.16,2.17,21.0,85,2.6,2.65,0.37,1.35,2.76,0.86,3.28,378,2
+ 85,12.67,0.98,2.24,18.0,99,2.2,1.94,0.3,1.46,2.62,1.23,3.16,450,2
+ 150,13.5,3.12,2.62,24.0,123,1.4,1.57,0.22,1.25,8.6,0.59,1.3,500,3
+ 20,14.06,1.63,2.28,16.0,126,3.0,3.17,0.24,2.1,5.65,1.09,3.71,780,1
+ 54,13.74,1.67,2.25,16.4,118,2.6,2.9,0.21,1.62,5.85,0.92,3.2,1060,1
+ 164,13.78,2.76,2.3,22.0,90,1.35,0.68,0.41,1.03,9.58,0.7,1.68,615,3
+ 144,12.25,3.88,2.2,18.5,112,1.38,0.78,0.29,1.14,8.21,0.65,2.0,855,3
+ 19,13.64,3.1,2.56,15.2,116,2.7,3.03,0.17,1.66,5.1,0.96,3.36,845,1
+ 170,12.2,3.03,2.32,19.0,96,1.25,0.49,0.4,0.73,5.5,0.66,1.83,510,3
+ 45,14.21,4.04,2.44,18.9,111,2.85,2.65,0.3,1.25,5.24,0.87,3.33,1080,1
+ 42,13.88,1.89,2.59,15.0,101,3.25,3.56,0.17,1.7,5.43,0.88,3.56,1095,1
+ 154,12.58,1.29,2.1,20.0,103,1.48,0.58,0.53,1.4,7.6,0.58,1.55,640,3
+ 157,12.45,3.03,2.64,27.0,97,1.9,0.58,0.63,1.14,7.5,0.67,1.73,880,3
+ 114,12.08,1.39,2.5,22.5,84,2.56,2.29,0.43,1.04,2.9,0.93,3.19,385,2
+ 75,11.66,1.88,1.92,16.0,97,1.61,1.57,0.34,1.15,3.8,1.23,2.14,428,2
+ 101,12.6,1.34,1.9,18.5,88,1.45,1.36,0.29,1.35,2.45,1.04,2.77,562,2
+ 6,14.39,1.87,2.45,14.6,96,2.5,2.52,0.3,1.98,5.25,1.02,3.58,1290,1
+ 22,13.71,1.86,2.36,16.6,101,2.61,2.88,0.27,1.69,3.8,1.11,4.0,1035,1
+ 60,12.33,1.1,2.28,16.0,101,2.05,1.09,0.63,0.41,3.27,1.25,1.67,680,2
+ 40,13.56,1.71,2.31,16.2,117,3.15,3.29,0.34,2.34,6.13,0.95,3.38,795,1
+ 62,13.67,1.25,1.92,18.0,94,2.1,1.79,0.32,0.73,3.8,1.23,2.46,630,2
+ 68,13.34,0.94,2.36,17.0,110,2.53,1.3,0.55,0.42,3.17,1.02,1.93,750,2
+ 149,13.08,3.9,2.36,21.5,113,1.41,1.39,0.34,1.14,9.4,0.57,1.33,550,3
+ 106,12.25,1.73,2.12,19.0,80,1.65,2.03,0.37,1.63,3.4,1.0,3.17,510,2
+ 26,13.39,1.77,2.62,16.1,93,2.85,2.94,0.34,1.45,4.8,0.92,3.22,1195,1
+ 162,12.85,3.27,2.58,22.0,106,1.65,0.6,0.6,0.96,5.58,0.87,2.11,570,3
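Review note: the header of `wine_testing.csv` does not use the same feature names as `config.json` (for example `Malicacid` versus `Malic acid`, and `0D280_0D315_of_diluted_wines` versus `OD280/OD315 of diluted wines`), and the first column is an unnamed row index. The sketch below shows one way to read the file and map its headers onto the config names. `CSV_TEXT` holds only the header and the first two rows, and `SPECIAL_NAMES`, `config_name`, and `load_testing_rows` are hypothetical names introduced for illustration.

```python
import csv
import io

# Header plus the first two data rows, copied from the diff.
CSV_TEXT = """\
,Alcohol,Malicacid,Ash,Alcalinity_of_ash,Magnesium,Total_phenols,Flavanoids,Nonflavanoid_phenols,Proanthocyanins,Color_intensity,Hue,0D280_0D315_of_diluted_wines,Proline,class
13,14.75,1.73,2.39,11.4,91,3.1,3.69,0.43,2.81,5.4,1.25,2.73,1150,1
113,11.41,0.74,2.5,21.0,88,2.48,2.01,0.42,1.44,3.08,1.1,2.31,434,2
"""

# Headers that cannot be recovered by swapping underscores for spaces.
SPECIAL_NAMES = {
    "Malicacid": "Malic acid",
    "0D280_0D315_of_diluted_wines": "OD280/OD315 of diluted wines",
}


def config_name(csv_name):
    """Map a CSV header to the matching column name in config.json."""
    return SPECIAL_NAMES.get(csv_name, csv_name.replace("_", " "))


def load_testing_rows(text):
    """Return the feature names and one dict per sample."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Skip the unnamed index column (first) and the class column (last).
    feature_names = [config_name(name) for name in header[1:-1]]
    samples = []
    for record in reader:
        values = [float(value) for value in record[1:-1]]
        samples.append({
            "index": int(record[0]),
            "features": dict(zip(feature_names, values)),
            "class": int(record[-1]),
        })
    return feature_names, samples


feature_names, samples = load_testing_rows(CSV_TEXT)
print(len(feature_names), samples[0]["class"])
```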