---
license: mit
language:
- en
metrics:
- r_squared
- accuracy
- mae
- mse
- f1
- recall
tags:
- machine-learning
- algorithms
- tabular-data
- knn
- python
- weighted-knn
- data-science
- preprocessing
---


SmartKNN is a weighted, interpretable extension of classical K-Nearest Neighbours (KNN), designed for real-world tabular machine learning. It automatically learns feature importance, filters weak features, handles missing values, and normalizes inputs internally, and in benchmark evaluations it achieves higher accuracy and robustness than classical KNN on most datasets — all while maintaining a simple scikit-learn-style API.


# Model Details


Model Description
SmartKNN improves classical KNN by learning feature weights and applying a weighted Euclidean distance for neighbour selection. It performs normalization, NaN/Inf cleaning, median imputation, outlier clipping, and feature filtering internally. It exposes feature importance for transparency and explainability.
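The weighted Euclidean distance at the core of neighbour selection can be illustrated with a minimal NumPy sketch. The `weighted_euclidean` helper below is illustrative only, not the library's actual API:

```python
import numpy as np

def weighted_euclidean(x, z, w):
    """d(x, z) = sqrt(sum_i w_i * (x_i - z_i)^2) -- classical Euclidean when all w_i = 1."""
    x, z, w = np.asarray(x, float), np.asarray(z, float), np.asarray(w, float)
    return float(np.sqrt(np.sum(w * (x - z) ** 2)))

# Uniform weights reduce to the classical Euclidean distance.
print(weighted_euclidean([0, 0], [3, 4], [1, 1]))  # 5.0
# Down-weighting the second feature removes its contribution.
print(weighted_euclidean([0, 0], [3, 4], [1, 0]))  # 3.0
```

Features with larger learned weights dominate the distance, so neighbours are chosen mainly along informative dimensions.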

Developed by: Jashwanth Thatipamula  
Model type: Weighted KNN for tabular ML  
License: MIT  
Language(s): Not language-dependent (numerical tabular ML)  
Finetuned from model: Not applicable (original algorithm)

Model Sources
Repository: https://github.com/thatipamula-jashwanth/smart-knn  
Paper (DOI): https://doi.org/10.5281/zenodo.17713746  
Demo: Coming soon


# Uses


Direct Use
• Regression on tabular datasets  
• Classification on tabular datasets  
• Interpretable ML where feature importance matters  
• Real-world ML pipelines with missing values and noisy features

Downstream Use
• Research on distance-metric learning  
• Explainable ML baselines  
• AutoML components for tabular data

Out-of-Scope Use
• NLP, image or audio modelling  
• Deep learning / GPU models  
• Raw categorical datasets without encoding


# Bias, Risks, and Limitations

• Instance-based prediction can be slower than tree-based models on large datasets  
• Low performance on categorical-only datasets without encoding  
• Requires storing full training set for inference

Recommendations
Users should numerically encode categorical features before fitting SmartKNN.


# How to Get Started with the Model


Install from PyPI:

```bash
pip install smart-knn
```

```python
import pandas as pd
from smart_knn import SmartKNN

# Load a tabular dataset with a numeric target column.
df = pd.read_csv("data.csv")
X = df.drop("target", axis=1)
y = df["target"]

model = SmartKNN(k=5)
model.fit(X, y)

# Predict for the first row, passed as a single-row frame
# (consistent with the scikit-learn-style API).
sample = X.iloc[[0]]
pred = model.predict(sample)
print(pred)
```


# Training Details


Training Data
SmartKNN is not pretrained and does not ship with training data; users train on their own dataset.

Preprocessing
Performed automatically:
• Normalization
• NaN / Inf cleaning
• Median imputation
• Outlier clipping
• Feature filtering via learned weights
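A pure-NumPy sketch of what the imputation, clipping, and normalization steps might look like internally. The function name, quantile bounds, and min-max scaling choice are assumptions for illustration, not SmartKNN's actual implementation:

```python
import numpy as np

def clean_features(X, clip_q=(0.01, 0.99)):
    """Hypothetical sketch of the automatic preprocessing steps listed above."""
    X = np.array(X, dtype=float)               # copy so the caller's data is untouched
    # 1. Treat infinities as missing values.
    X[~np.isfinite(X)] = np.nan
    # 2. Median imputation, column by column.
    med = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = np.take(med, cols)
    # 3. Outlier clipping to per-column quantiles.
    lo = np.quantile(X, clip_q[0], axis=0)
    hi = np.quantile(X, clip_q[1], axis=0)
    X = np.clip(X, lo, hi)
    # 4. Min-max normalization to [0, 1] (constant columns map to 0).
    rng = X.max(axis=0) - X.min(axis=0)
    rng[rng == 0] = 1.0
    return (X - X.min(axis=0)) / rng

Z = clean_features([[1.0, float("inf")], [float("nan"), 2.0], [3.0, 4.0]])
print(Z.min(), Z.max())  # all values finite and within [0, 1]
```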

Training Hyperparameters
• k = number of neighbors  
• weight_threshold = drop features below learned importance
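The `weight_threshold` idea can be sketched as a simple column mask over the learned importance vector. The `filter_features` helper and the example weights below are hypothetical, not the library's API:

```python
import numpy as np

def filter_features(X, weights, weight_threshold=0.05):
    """Keep only columns whose learned importance clears the threshold."""
    weights = np.asarray(weights, dtype=float)
    keep = weights >= weight_threshold
    return np.asarray(X)[:, keep], weights[keep]

X = np.arange(12).reshape(4, 3)
# The middle feature's near-zero weight marks it as noise, so it is dropped.
Xf, wf = filter_features(X, weights=[0.60, 0.01, 0.39], weight_threshold=0.05)
print(Xf.shape)  # (4, 2)
```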


# Evaluation


Testing Data
Evaluated on 35 public regression datasets and 20 public classification tabular datasets.

Metrics
Regression: R², MSE  
Classification: Accuracy

Results
• Regression: SmartKNN outperformed classical KNN on over 90% of datasets  
• Classification: SmartKNN beat classical KNN on 60% of datasets

Summary
SmartKNN delivers higher accuracy, greater robustness to noise, and better interpretability than classical KNN while preserving its simplicity.


# Environmental Impact


SmartKNN requires no GPU and has minimal energy usage.
Hardware Type: CPU  
Hours used: Minimal  
Carbon Emitted: Negligible

# Technical Specifications


Model Architecture and Objective
• Instance-based learner  
• Weighted Euclidean distance metric  
• Learned feature weights (MSE + MI + Random Forest)
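One plausible way to blend the three importance sources into a single weight vector is to normalize each score vector and average them. This is a sketch under that assumption; the per-feature scores below are made-up inputs, and SmartKNN's actual combination rule may differ:

```python
import numpy as np

def combine_importances(*score_vectors):
    """Normalize each score vector to sum to 1, then average across sources."""
    normed = []
    for s in score_vectors:
        s = np.asarray(s, dtype=float)
        total = s.sum()
        normed.append(s / total if total > 0 else np.full(len(s), 1.0 / len(s)))
    return np.mean(normed, axis=0)

# Hypothetical per-feature scores from the three sources named above.
mse_gain      = [0.5, 0.3, 0.2]  # error reduction attributable to each feature
mutual_info   = [0.2, 0.2, 0.6]  # mutual information with the target
rf_importance = [0.4, 0.4, 0.2]  # Random Forest importances

w = combine_importances(mse_gain, mutual_info, rf_importance)
print(w)  # combined weights, summing to 1
```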

Compute Infrastructure
• Runs efficiently on CPU systems  
• Implemented using NumPy


# Citation


```bibtex
@software{smartknn2025,
  author       = {Jashwanth Thatipamula},
  title        = {SmartKNN: An Interpretable Weighted Distance Framework for K-Nearest Neighbours},
  year         = {2025},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.17713746},
  url          = {https://doi.org/10.5281/zenodo.17713746}
}
```


# Model Card Authors

Jashwanth Thatipamula

Model Card Contact
Contact via GitHub issues: https://github.com/thatipamula-jashwanth/smart-knn