JashuXo committed on
Commit 9f07359
0 Parent(s)

Duplicate from JashuXo/smart-knn

Files changed (2):
  1. .gitattributes +35 -0
  2. README.md +169 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,169 @@
---
license: mit
language:
- en
metrics:
- r_squared
- accuracy
- mae
- mse
- f1
- recall
tags:
- machine-learning
- algorithms
- tabular-data
- knn
- python
- weighted-knn
- data-science
- preprocessing
---

SmartKNN is a weighted and interpretable extension of classical K-Nearest Neighbours (KNN), designed for real-world tabular machine learning. It automatically learns feature importance, filters weak features, handles missing values, normalizes inputs internally, and consistently achieves higher accuracy and robustness than classical KNN, while maintaining a simple scikit-learn-style API.

# Model Details

Model Description
SmartKNN improves classical KNN by learning feature weights and applying a weighted Euclidean distance for neighbour selection. It performs normalization, NaN/Inf cleaning, median imputation, outlier clipping, and feature filtering internally, and it exposes the learned feature importances for transparency and explainability.

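To illustrate the core idea (this is a minimal sketch, not the library's internal implementation), a weighted Euclidean distance scales each feature's squared difference by its learned importance:

```python
import numpy as np

def weighted_euclidean(a, b, w):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (a_i - b_i)^2)."""
    a, b, w = np.asarray(a, float), np.asarray(b, float), np.asarray(w, float)
    return float(np.sqrt(np.sum(w * (a - b) ** 2)))

# With uniform weights this reduces to the classical Euclidean distance;
# a weight of 0 removes that feature from the distance entirely.
```

With all weights equal to 1 this is ordinary KNN distance; down-weighting noisy features is what lets neighbour selection ignore them.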
Developed by: Jashwanth Thatipamula
Model type: Weighted KNN for tabular ML
License: MIT
Language(s): Not language-dependent (numerical tabular ML)
Finetuned from model: Not applicable (original algorithm)

Model Sources
Repository: https://github.com/thatipamula-jashwanth/smart-knn
Paper (DOI): https://doi.org/10.5281/zenodo.17713746
Demo: Coming soon

# Uses

Direct Use
• Regression on tabular datasets
• Classification on tabular datasets
• Interpretable ML where feature importance matters
• Real-world ML pipelines with missing values and noisy features

Downstream Use
• Research on distance-metric learning
• Explainable ML baselines
• AutoML components for tabular data

Out-of-Scope Use
• NLP, image, or audio modelling
• Deep learning / GPU models
• Raw categorical datasets without encoding

# Bias, Risks, and Limitations

• Instance-based prediction can be slower than tree-based models on large datasets
• Performs poorly on categorical-only datasets unless features are encoded
• Requires storing the full training set for inference

Recommendations
Users should numerically encode categorical features before fitting SmartKNN.

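For example, one-hot encoding with pandas (a standard approach; SmartKNN itself does not encode categoricals) turns string columns into numeric ones:

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": [1.0, 2.0, 3.0],
})

# One-hot encode the categorical column; numeric columns pass through unchanged.
encoded = pd.get_dummies(df, columns=["color"], dtype=float)
print(list(encoded.columns))  # ['size', 'color_blue', 'color_red']
```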
# How to Get Started with the Model

```shell
pip install smart-knn
```

```python
import pandas as pd
from smart_knn import SmartKNN

# Load a tabular dataset with a "target" column
df = pd.read_csv("data.csv")
X = df.drop("target", axis=1)
y = df["target"]

# Fit SmartKNN with k = 5 neighbours
model = SmartKNN(k=5)
model.fit(X, y)

# Predict for a single sample
sample = X.iloc[0]
pred = model.predict(sample)
print(pred)
```

# Training Details

Training Data
SmartKNN is not pretrained and does not ship with training data; users train on their own dataset.

Preprocessing
Performed automatically:
• Normalization
• NaN / Inf cleaning
• Median imputation
• Outlier clipping
• Feature filtering via learned weights

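The first four steps can be sketched in plain NumPy (illustrative code only, assuming median imputation, sigma-based clipping, and min-max scaling; SmartKNN's actual internals may differ):

```python
import numpy as np

def clean_tabular(X, clip_sigma=3.0):
    """Median-impute NaN/Inf, clip outliers, then min-max normalize per column."""
    X = np.asarray(X, dtype=float).copy()
    X[~np.isfinite(X)] = np.nan                      # treat Inf like missing
    med = np.nanmedian(X, axis=0)
    X = np.where(np.isnan(X), med, X)                # median imputation
    mu, sd = X.mean(axis=0), X.std(axis=0)
    X = np.clip(X, mu - clip_sigma * sd, mu + clip_sigma * sd)  # outlier clipping
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)           # avoid divide-by-zero
    return (X - lo) / span                           # min-max normalization
```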
Training Hyperparameters
• k: number of neighbours used for prediction
• weight_threshold: features whose learned importance falls below this value are dropped

# Evaluation

Testing Data
Evaluated on 35 public regression datasets and 20 public classification tabular datasets.

# Metrics
Regression: R², MSE
Classification: Accuracy

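These metrics follow the standard definitions and can be computed directly from predictions (a plain NumPy sketch, independent of any particular library):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def accuracy(y_true, y_pred):
    """Fraction of exactly matching labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```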
# Results
• Regression: SmartKNN outperformed classical KNN on more than 90% of datasets
• Classification: SmartKNN beat classical KNN on 60% of datasets

# Summary
SmartKNN delivers higher accuracy, greater robustness to noise, and better interpretability than classical KNN while preserving its simplicity.

# Environmental Impact

SmartKNN requires no GPU and has minimal energy usage.
Hardware Type: CPU
Hours used: Minimal
Carbon Emitted: Negligible

# Technical Specifications

Model Architecture and Objective
• Instance-based learner
• Weighted Euclidean distance metric
• Learned feature weights (combining MSE, mutual information, and Random Forest importance)

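As an illustration of how such a blend might work (a hypothetical sketch, not SmartKNN's exact weighting procedure), mutual information scores and Random Forest importances can be combined and normalized into per-feature weights:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import mutual_info_regression

def learn_feature_weights(X, y, random_state=0):
    """Blend MI and RF importances into non-negative weights that sum to 1."""
    mi = mutual_info_regression(X, y, random_state=random_state)
    rf = RandomForestRegressor(n_estimators=50, random_state=random_state).fit(X, y)
    # rf.feature_importances_ already sums to 1; normalize MI the same way.
    raw = mi / (mi.sum() + 1e-12) + rf.feature_importances_
    return raw / raw.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)  # only feature 0 matters
w = learn_feature_weights(X, y)
```

A `weight_threshold` would then simply drop the columns whose entry in `w` falls below it.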
Compute Infrastructure
• Runs efficiently on CPU systems
• Implemented using NumPy

# Citation

```bibtex
@software{smartknn2025,
  author    = {Jashwanth Thatipamula},
  title     = {SmartKNN: An Interpretable Weighted Distance Framework for K-Nearest Neighbours},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17713746},
  url       = {https://doi.org/10.5281/zenodo.17713746}
}
```

# Model Card Authors

Jashwanth Thatipamula

Model Card Contact
Contact via GitHub issues: https://github.com/thatipamula-jashwanth/smart-knn