---
license: mit
language:
- en
metrics:
- r_squared
- accuracy
- mae
- mse
- f1
- recall
tags:
- machine-learning
- algorithms
- tabular-data
- knn
- python
- weighted-knn
- data-science
- preprocessing
---
SmartKNN is a weighted, interpretable extension of classical K-Nearest Neighbours (KNN) designed for real-world tabular machine learning. It learns feature importance automatically, filters weak features, handles missing values, and normalizes inputs internally, all behind a simple scikit-learn-style API. In the author's benchmarks it outperformed classical KNN on most of the datasets tested (see Evaluation).
# Model Details
Model Description
SmartKNN improves classical KNN by learning feature weights and applying a weighted Euclidean distance for neighbour selection. It performs normalization, NaN/Inf cleaning, median imputation, outlier clipping, and feature filtering internally. It exposes feature importance for transparency and explainability.
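As a sketch of the distance this description refers to, one common form of a weighted Euclidean distance between a query $x$ and a training point $x'$ is shown below. The symbols $w_j$ (a non-negative learned weight for feature $j$) and $d$ (the number of features) are notation introduced here for illustration. The exact form SmartKNN uses, for example whether weights are applied inside or outside the square, is not stated in this card.

$$
d_w(x, x') = \sqrt{\sum_{j=1}^{d} w_j \,(x_j - x'_j)^2}
$$

Setting every $w_j = 1$ recovers the ordinary Euclidean distance of classical KNN, and $w_j = 0$ removes feature $j$ from the neighbour search.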
Developed by: Jashwanth Thatipamula
Model type: Weighted KNN for tabular ML
License: MIT
Language(s): Not language-dependent (numerical tabular ML)
Finetuned from model: Not applicable (original algorithm)
Model Sources
Repository: https://github.com/thatipamula-jashwanth/smart-knn
Paper (DOI): https://doi.org/10.5281/zenodo.17713746
Demo: Coming soon
# Uses
Direct Use
• Regression on tabular datasets
• Classification on tabular datasets
• Interpretable ML where feature importance matters
• Real-world ML pipelines with missing values and noisy features
Downstream Use
• Research on distance-metric learning
• Explainable ML baselines
• AutoML components for tabular data
Out-of-Scope Use
• NLP, image or audio modelling
• Deep learning / GPU models
• Raw categorical datasets without encoding
# Bias, Risks, and Limitations
• Instance-based prediction can be slower than tree-based models on large datasets
• Low performance on categorical-only datasets without encoding
• Requires storing full training set for inference
Recommendations
Users should numerically encode categorical features before fitting SmartKNN.
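One possible way to follow this recommendation is to one-hot encode categorical columns with pandas before calling `fit`. This is a minimal sketch: the toy data and the column names `city`, `rooms`, and `target` are hypothetical, and other encodings (ordinal, target encoding) may suit a distance-based model better depending on the data.

```python
import pandas as pd

# Hypothetical toy data: "city" is categorical, "rooms" is already numeric.
df = pd.DataFrame({
    "city": ["Paris", "Lyon", "Paris", "Nice"],
    "rooms": [2, 3, 1, 4],
    "target": [310.0, 220.0, 180.0, 400.0],
})

X = df.drop("target", axis=1)
y = df["target"]

# One-hot encode the "city" column; "rooms" passes through unchanged.
X_encoded = pd.get_dummies(X, columns=["city"], dtype=float)

print(X_encoded.columns.tolist())
```

The encoded frame `X_encoded` contains only numeric columns and can then be passed to the model in place of `X`.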
# How to Get Started with the Model
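```bash
pip install smart-knn
```

```python
import pandas as pd
from smart_knn import SmartKNN

# Load a tabular dataset; "data.csv" and the "target" column are placeholders.
df = pd.read_csv("data.csv")
X = df.drop("target", axis=1)
y = df["target"]

# Fit SmartKNN with k = 5 neighbours.
model = SmartKNN(k=5)
model.fit(X, y)

# Predict for a single row of the feature table.
sample = X.iloc[0]
pred = model.predict(sample)
print(pred)
```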
# Training Details
Training Data
SmartKNN is not pretrained and does not ship with training data; users train on their own dataset.
Preprocessing
Performed automatically:
• Normalization
• NaN / Inf cleaning
• Median imputation
• Outlier clipping
• Feature filtering via learned weights
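The first four steps in the list above can be sketched in NumPy as follows. This is an illustration, not SmartKNN's actual implementation: the function name `preprocess`, the `clip_z` parameter, the choice of z-score normalization, and the order of the steps are all assumptions made for the example.

```python
import numpy as np

def preprocess(X, clip_z=3.0):
    """Illustrative cleaning pipeline (hypothetical, not SmartKNN's code)."""
    X = np.asarray(X, dtype=float).copy()

    # NaN / Inf cleaning: treat +Inf and -Inf like missing values.
    X[~np.isfinite(X)] = np.nan

    # Median imputation: fill each gap with its column median (NaNs ignored).
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]

    # Normalization: z-score each column, guarding against zero variance.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - mean) / std

    # Outlier clipping: cap values more than clip_z standard deviations out.
    return np.clip(Z, -clip_z, clip_z)

X_raw = [[1.0, 10.0], [2.0, np.nan], [3.0, np.inf], [4.0, 40.0]]
X_clean = preprocess(X_raw)
print(X_clean)
```

The sketch assumes every column has at least one finite value; a column that is entirely missing would stay NaN after imputation and would need separate handling.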
Training Hyperparameters
• k = number of neighbours used for each prediction
• weight_threshold = minimum learned importance; features whose weight falls below it are dropped
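To illustrate what a weight threshold does, the sketch below drops columns whose importance is below a cut-off. The helper `filter_features`, the example weights, the default of 0.05, and the inclusive comparison are hypothetical; the card does not specify how SmartKNN scales its weights or treats values exactly at the threshold.

```python
import numpy as np

def filter_features(X, weights, weight_threshold=0.05):
    """Keep only the columns whose weight reaches the threshold (illustrative)."""
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    keep = weights >= weight_threshold  # boolean mask over the features
    return X[:, keep], weights[keep], keep

X = np.array([[1.0, 7.0, 0.5],
              [2.0, 8.0, 0.1]])
weights = np.array([0.60, 0.02, 0.38])  # hypothetical learned importances

X_kept, w_kept, mask = filter_features(X, weights, weight_threshold=0.05)
print(mask, X_kept.shape)
```

A higher threshold removes more features, which speeds up distance computation but risks discarding weak yet useful signals.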
# Evaluation
Testing Data
Evaluated across 35 regression and 20 classification public tabular datasets.
# Metrics
Regression: R², MSE
Classification: Accuracy
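For reference, the three metrics named here follow their standard definitions, sketched below in NumPy. The function names and the sample arrays are illustrative and are not taken from the SmartKNN benchmark code.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # assumes y_true is not constant
    return float(1.0 - ss_res / ss_tot)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

reg_mse = mse([3, 5, 7, 9], [2, 5, 8, 9])
reg_r2 = r_squared([3, 5, 7, 9], [2, 5, 8, 9])
clf_acc = accuracy([0, 1, 1, 0], [0, 1, 0, 0])
print(reg_mse, reg_r2, clf_acc)
```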
# Results
• Regression: SmartKNN outperformed classical KNN on more than 90% of the datasets
• Classification: SmartKNN outperformed classical KNN on 60% of the datasets
# Summary
In these benchmarks, SmartKNN was more accurate than classical KNN on most datasets, while offering greater robustness to noise and better interpretability and preserving KNN's simplicity.
# Environmental Impact
SmartKNN requires no GPU and has minimal energy usage.
Hardware Type: CPU
Hours used: Minimal
Carbon Emitted: Negligible
# Technical Specifications
Model Architecture and Objective
• Instance-based learner
• Weighted Euclidean distance metric
• Learned feature weights, combining MSE-based, mutual information (MI), and Random Forest importance signals
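The following is a minimal sketch of how an instance-based learner with a weighted Euclidean metric can produce a prediction, assuming the feature weights have already been learned. The function `weighted_knn_predict` and its parameters are hypothetical and do not reflect SmartKNN's internal code. In particular, the sketch averages neighbours uniformly for regression and uses a simple majority vote for classification, which may differ from the library.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_query, weights, k=5, classify=False):
    """Predict for one query row using a feature-weighted Euclidean distance."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x_query = np.asarray(x_query, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Weighted squared distance from the query to every stored training row.
    # The square root is skipped because it does not change the ranking.
    d2 = ((X_train - x_query) ** 2 * w).sum(axis=1)

    # Indices of the k closest training rows.
    nearest = np.argsort(d2, kind="stable")[:k]
    neighbours = y_train[nearest]

    if classify:
        # Majority vote; ties go to the label that sorts first.
        labels, counts = np.unique(neighbours, return_counts=True)
        return labels[np.argmax(counts)]
    # Regression: plain mean of the neighbours' targets.
    return float(neighbours.mean())

X_train = [[0, 0], [0, 10], [1, 0], [1, 10]]
y_train = [0.0, 0.0, 1.0, 1.0]

# Only the first feature matters under these weights.
pred_a = weighted_knn_predict(X_train, y_train, [0.9, 10], weights=[1, 0], k=2)
# Only the second feature matters under these weights.
pred_b = weighted_knn_predict(X_train, y_train, [0.9, 10], weights=[0, 1], k=2)
print(pred_a, pred_b)
```

The same query yields different neighbours under the two weight vectors, which is why the quality of the learned weights largely determines how much a weighted variant can improve on classical KNN.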
Compute Infrastructure
• Runs efficiently on CPU systems
• Implemented using NumPy
# Citation
@software{smartknn2025,
author = {Jashwanth Thatipamula},
title = {SmartKNN: An Interpretable Weighted Distance Framework for K-Nearest Neighbours},
year = {2025},
publisher = {Zenodo},
doi = {10.5281/zenodo.17713746},
url = {https://doi.org/10.5281/zenodo.17713746}
}
# Model Card Authors
Jashwanth Thatipamula
Model Card Contact
Contact via GitHub issues: https://github.com/thatipamula-jashwanth/smart-knn